Google has proved its prowess in AI and Machine Learning time and again with an industry-leading range of features, including email suggestions and computational photography. App stores are inundated with recording apps, but when Google brings one out, it has to be a sure-shot winner!

As you can see, among the big players Google fills the top slot, with at least one new product per year. 2014 was Google’s best year, with 5 new concepts coming out of its research unit. Apple is a close competitor, but it still has miles to cover before it catches up.

The recently launched Google voice Recorder app is powered by a Machine Learning algorithm that deciphers audio with a splendid accuracy rate. The new ML superstar follows several attempts by Google to infuse different components of Artificial Intelligence into its offerings. Some attempts, such as Google Clips, did not really take off, while others, like the Pixel phone’s camera app, achieved colossal success. Google’s flagship phone emerged triumphant and featured as one of the best smartphone cameras available in the market. It used machine learning algorithms to process each picture after the click. Google voice Recorder is yet another attempt by the mighty Google to raise its benchmark using Machine Learning and AI.

Let’s dig deeper and see its main functionalities and how Google went about building it.

But, before that, let us understand a few things about Machine Learning. 

What is Machine Learning?

Learning and adapting are the two pillars of Machine Learning. It is a concept wherein a computer program learns and adapts by itself, without the need for human intervention. It is a branch of Artificial Intelligence in which the program works through huge amounts of data and makes sense of it on its own, based on the algorithms it is given.

Many industries and sectors have huge amounts of data at their disposal, known in technological parlance as Big Data. With the progressive use of technology, Big Data is now easily accessible, and these sectors can derive huge benefits from analyzing it, but many of them lack the resources to do so. Artificial Intelligence is widely used to gather the data, process it and interpret it. The resulting information is then shared with the relevant department, which takes action. Machine Learning is one of the increasingly used methods of processing such data.

Let’s also see some facts and forecasts to shed light on the importance of Machine Learning.

  • The global Machine Learning market is estimated to reach a whopping $20.83B in 2024, up from $1.58B in 2017, a CAGR of 44.06% between 2017 and 2024.
  • AI software revenues will climb from $10.1B in 2018 to $126B by 2025, according to a prediction by Tractica.

Source- MRFR Analysis
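
As a rough sanity check on the first forecast, the compound annual growth rate can be recomputed from its two endpoints. The sketch below assumes seven compounding years (2017 to 2024); the function name cagr is ours and does not come from the cited analysis. It gives a rate of about 44.5%, close to the quoted 44.06%. The small gap may come from rounding or from a different base period in the original report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of dollars, taken from the forecast above.
# Assumption: 2017 -> 2024 is treated as 7 compounding years.
rate = cagr(1.58, 20.83, 2024 - 2017)
print(f"Implied CAGR: {rate:.2%}")
```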

Well, that is going to be huge! Other studies also suggest that Machine Learning has left a big footprint and is set to grow bigger and better.

What exactly is the Google voice recorder?

Google voice Recorder is a real-time transcription app that records audio, transcribes it and converts it into editable text that anyone can read. It also works offline.

You would then ask yourself, “How is it different from other voice recording apps?”

We say there is a difference, and it is this difference that makes it a winner!

Firstly, none of the other apps can function offline, whereas the Google voice recorder can. Secondly, even if they do transcribe like the Google voice Recorder, they can’t do it without the internet! In fact, you do not even have to give a separate command asking the app to transcribe. Even that is automatic!

Google voice recorder is a dazzling tool for students and journalists, who will find it useful for recording lectures and interviews. Initially it was released only for the Pixel 4, but it is now being tested on other Android devices too.

6 things you must know about the Google voice recorder

Embracing the Edge-first model design:

We are all aware of Mobile-first design, in which companies develop their application for an enriched mobile experience first and release the desktop version afterwards. The same thought process can be applied to AI-based apps, which is how the Edge-first model design came into being. Normally, machine learning-based apps run on cloud platforms. But if a company wants to leverage machine learning to create powerful apps with high longevity, it has to think beyond the cloud. Relying on the cloud makes the app slower and also raises serious problems when it comes to user privacy.

Google voice recorder uses the RNN transducer (RNN-T) model, which can be housed on the phone itself and is the reason behind its robust transcription framework. It does not follow the conventional pipeline approach of separate components. Instead, a single neural network handles the decoding.

Image Source- Wishdesk

How RNN-T is used:

Image Source: AnalyticsMagIndia

The illustration above captures the working principle of RNN-T. The RNN-T recognizer outputs characters one at a time. Through a feedback loop, the symbols the model has already predicted are fed back into it to help predict the symbol that follows.
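
The feedback loop can be sketched with a toy greedy decoder. This is not Google’s model: toy_joint is a hand-written, hypothetical stand-in for the trained network, and each toy "frame" is simply the character the audio is assumed to support. What the sketch does show is the control flow, in which every emitted symbol is appended to a history that is passed back in on the next step.

```python
from typing import Callable, List, Sequence

BLANK = "<blank>"  # special symbol: emit nothing and move on to the next frame


def toy_joint(frame: str, history: Sequence[str]) -> str:
    """Hypothetical stand-in for the trained network.

    Each toy frame is the character the audio supports ("" means silence).
    The rule emits that character unless it was the last symbol emitted,
    so the decision depends on both the frame and the fed-back history.
    """
    if not frame:
        return BLANK
    if history and history[-1] == frame:
        return BLANK
    return frame


def greedy_decode(
    frames: Sequence[str],
    step: Callable[[str, Sequence[str]], str] = toy_joint,
    max_symbols_per_frame: int = 5,
) -> str:
    """Greedy transducer-style decoding: emit symbols until the step returns BLANK."""
    history: List[str] = []
    for frame in frames:
        for _ in range(max_symbols_per_frame):
            symbol = step(frame, history)
            if symbol == BLANK:
                break
            history.append(symbol)  # feedback: this symbol conditions the next step
    return "".join(history)


print(greedy_decode(["h", "h", "", "i"]))  # prints "hi"
```

The repeated "h" frame and the silent frame both produce a blank, which is how a transducer can see many audio frames yet emit only a few characters.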

Technology stack directly proportional to performance:

The app has been built using the open-source Swift language with TensorFlow. It is a deadly combination indeed, because it provides the dual benefit of faster development time and superior performance. Both Swift and TensorFlow have great potential for ML-based applications, especially as ML moves toward commercial applications.

Transcribing or Interpretation:

The app generates real-time transcriptions of audio recordings, and this transcribing functionality is the feature that will outshine all other voice recording apps. The transcribed text is searchable, so the user can quickly find a particular word in a conversation without listening to the whole recording.

The on-device speech recognition model used by Google allows the app to transcribe long audio files of up to a few hours. The recognized words are mapped to the timeline of the recording. When the user taps a particular word in the transcription, the audio starts playing from that point onwards.
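
Mapping words to the timeline is what makes search and tap-to-play possible. Below is a minimal sketch of that idea, assuming each transcribed word is stored with its start offset. The TimedWord type and the find_word function are hypothetical names for illustration, not part of the app.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    text: str
    start_ms: int  # offset of the word from the start of the recording


def find_word(transcript: List[TimedWord], query: str) -> List[int]:
    """Return the start offset (ms) of every occurrence of `query`, ignoring case."""
    wanted = query.lower()
    return [
        word.start_ms
        for word in transcript
        if word.text.lower().strip(".,!?") == wanted
    ]


transcript = [
    TimedWord("The", 0),
    TimedWord("deadline", 400),
    TimedWord("is", 900),
    TimedWord("Friday.", 1100),
    TimedWord("Friday", 2500),
    TimedWord("works", 3000),
]

# A player could seek to any of these offsets to start playback at the word.
print(find_word(transcript, "friday"))  # prints [1100, 2500]
```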

Recognizing the sounds:

Google has further explained that it uses a convolutional neural network to connect diverse sound patterns to different colors. This is similar to the model that Google uses for Android 10’s Live Caption. The on-device Machine Learning models help users to identify different sounds, like an animal’s cry or a piece of music played on an instrument. Depending on the kind of sound detected, the app assigns a color to that part of the audio waveform. With this, it becomes easier for users to understand the sounds by merely looking at the waveform.
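
The last step, turning a classifier’s output into a waveform color, can be sketched as a simple lookup. The class names, colors and scores below are invented for illustration, and the real app’s categories and palette may differ.

```python
from typing import Dict

# Hypothetical sound classes and colors, chosen only for illustration.
CLASS_COLORS: Dict[str, str] = {
    "speech": "blue",
    "music": "orange",
    "animal": "green",
}
DEFAULT_COLOR = "gray"  # used when nothing is recognized


def color_for_segment(scores: Dict[str, float]) -> str:
    """Return the color of the highest-scoring sound class for one waveform segment."""
    if not scores:
        return DEFAULT_COLOR
    top_class = max(scores, key=scores.get)
    return CLASS_COLORS.get(top_class, DEFAULT_COLOR)


print(color_for_segment({"speech": 0.1, "music": 0.8, "animal": 0.05}))  # prints "orange"
```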

The Google voice Recorder also detects different sound profiles, such as speech. It does so with a sliding window that processes partially overlapping 960 ms audio frames at 50 ms intervals and outputs a vector of sigmoid scores. Working at this granularity helps pin down the exact start and end time of each sound, so that errors are few. The app then applies a further processing step to the sigmoid scores, together with a thresholding mechanism, to heighten the accuracy of the system and report the correct classification of the sound.
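
The windowing and thresholding can be sketched as follows. The 960 ms window and 50 ms step come from the description above. The 0.5 threshold, the sample scores and the function names are assumptions made for illustration, and the sketch leaves out the extra processing step the app applies to the scores.

```python
import math
from typing import List

WINDOW_MS = 960  # length of each audio frame, from the text
HOP_MS = 50      # step between consecutive frames, from the text


def window_starts(duration_ms: int) -> List[int]:
    """Start offsets of every full 960 ms window that fits in the recording."""
    if duration_ms < WINDOW_MS:
        return []
    return list(range(0, duration_ms - WINDOW_MS + 1, HOP_MS))


def sigmoid(x: float) -> float:
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect(raw_scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Mark each class as detected when its sigmoid score reaches the threshold.

    The threshold value is an assumption for this sketch.
    """
    return [sigmoid(score) >= threshold for score in raw_scores]


starts = window_starts(1100)
print(starts)                     # prints [0, 50, 100]
print(detect([-2.0, 0.3, 4.0]))   # prints [False, True, True]
```

Consecutive windows share 910 ms of audio, which is why a sound’s start and end can be located to within about 50 ms.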

Recommending titles and tags:

When your recording ends, the app goes further by suggesting titles and tags for the audio and text. It counts term occurrences along with their grammatical role. Terms that are identified as entities are capitalized, and the model also tags parts of speech. The candidate terms are then scored and ranked for quality, and the final selected list of words is displayed as the title or tag suggestions.
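
The counting-and-ranking part of this can be sketched with plain term frequencies. This is a deliberately simplified stand-in: it does no part-of-speech tagging or entity detection, the stop-word list is a tiny hypothetical one, and suggest_tags is our own name rather than anything from the app.

```python
import re
from collections import Counter
from typing import List

# Tiny hypothetical stop-word list; a real system would rely on a language model.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "on", "for", "we", "it"}


def suggest_tags(transcript: str, top_k: int = 3) -> List[str]:
    """Rank the remaining terms by how often they occur and return the top few.

    Tags are capitalized for display only; no entity detection is attempted.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [term.capitalize() for term, _ in counts.most_common(top_k)]


text = (
    "The budget meeting is on Monday. "
    "In the meeting we review the budget and the marketing budget."
)
print(suggest_tags(text, top_k=2))  # prints ['Budget', 'Meeting']
```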

Look at the image below and see it for yourself!

Clearly, Google has left no stone unturned to make this smart application even smarter with its built-in mechanisms.

Protecting User Privacy:

Privacy is a concern not just for Google but for us as users too. Google voice recorder, the cutting-edge machine learning application by Google, has decided to tackle this issue head-on. AI and Machine Learning are all about drawing inferences from data, which means that enormous amounts of user data, including personal data, are at their disposal. Suppose you have recorded a confidential discussion with your lawyer about a crucial case: with a cloud-based app, that audio could be transferred to the cloud, and privacy has largely remained a stumbling block on cloud-based platforms.

Because it applies machine learning on the edge platform, that is, on the device itself, the new Google voice recording app provides the necessary security cover. Being able to use it offline is another reason why user privacy is protected: there is absolutely no need to transfer the data to the cloud. Users can heave a sigh of relief because their privacy is protected.

Final words

Research is headed in the right direction, and with ML-based applications like the Google voice recorder running on the edge platform, we can rest assured that privacy and security concerns are being addressed. Without a doubt, speech is one of the most important forms of communication, and with this unique offering from the house of Google we have just seen how ML can be embedded in as simple a thing as a voice recorder.

It is high time we see Machine Learning and AI as tools that help humans evolve and race ahead, rather than viewing them through the lens of AI vs. the human race!

“Artificial intelligence will reach human levels by around 2029. Follow that out further to, say, 2045, we will have multiplied the intelligence, the human biological machine intelligence of our civilization a billion-fold.” ~Ray Kurzweil
