Adding the Voice Search Feature to Your Web Apps | by Zafar Saleem | Mar, 2022

Implement the voice search feature using vanilla JavaScript


After living in the West for almost ten years, I returned to my home country a couple of years ago and got stuck here because of the pandemic. Once the lockdowns ended and everything reopened, I started to mingle with my relatives and friends again. We exchanged social media handles, phone numbers, WhatsApp numbers, and so on to stay in touch in the virtual world.

After communicating online with all of them, I realised that almost every single person in my contact list preferred sending audio and voice messages rather than typing. When I was with them in person at gatherings, I noticed that whenever they searched for something online, whether on Google, YouTube, or elsewhere, they used voice commands rather than typing. I was the only one who preferred typing.

Personally, I disagree with using voice commands all the time and would rather users kept typing. This trend worries me: we might lose our typing skills at some point in the future, just as most of Gen X have already lost their writing skills.

I presume the next generation (whatever we end up calling it) will most likely become one that neither writes nor types.

On the other hand, voice commands have real benefits too. First, they save time in this fast-paced world. Second, they make multitasking easier, even for men like me (who are considered worse at multitasking): we can prepare meals while talking to Siri, Google Home, Alexa, etc., and get certain jobs done. Voice features can also earn extra user loyalty by helping people cope with their routines, and finally, they are a powerful source of data for user behavior analytics.

According to an Adobe Digital Insights survey, the use of voice commands is trending upward.

Household penetration of smart speakers in the US, 2015–2020: 1.2 million (2015), 3.2 million (2016), 7.4 million (2017), 12.7 million (2018), 17.6 million (2019), 21.4 million (2020).

It doesn’t matter whether I am for or against this trend. I am a software engineer, and I have to adapt to any technology that becomes widely used and is helpful to both consumers and vendors. So I will implement voice recognition in web apps whenever there is a need for it, and there clearly is a need for it in today’s world.

You guessed it. Today, I am going to show you how to implement a minimal voice-based search system using vanilla JavaScript, and then I will convert it into a React project. First things first, let me explain a bit about the Web Speech API in JavaScript.

The Web Speech API has two parts: speech synthesis (text-to-speech) and speech recognition (speech-to-text). In this article, I focus on the speech recognition part. It receives speech from the user through the device’s microphone, which the recognition service can check against a list of grammars (the vocabulary you want recognized) before finally returning the result as a text string.
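To make the recognition side concrete, here is a minimal sketch of configuring a recognition instance. The properties `lang`, `continuous`, `interimResults`, and `maxAlternatives` belong to the Web Speech API's SpeechRecognition interface; the `configureRecognition` helper and its default values are my own assumptions, not part of the original project. Because the helper only sets properties, a plain object can stand in for the real instance outside the browser.

```javascript
// Hypothetical helper (not from the original project): applies common
// SpeechRecognition settings to an existing recognition instance.
function configureRecognition(recognition, options = {}) {
  // BCP 47 language tag the recognition service should listen for.
  recognition.lang = options.lang ?? "en-US";
  // false: return a single result per start() call, then stop.
  recognition.continuous = options.continuous ?? false;
  // false: deliver only final results, not partial guesses while speaking.
  recognition.interimResults = options.interimResults ?? false;
  // Number of alternative transcripts returned for each result.
  recognition.maxAlternatives = options.maxAlternatives ?? 1;
  return recognition;
}

// In Chrome: configureRecognition(new webkitSpeechRecognition());
// Outside the browser, a plain object stands in for the instance:
const fake = configureRecognition({}, { lang: "en-GB" });
console.log(fake.lang, fake.continuous, fake.interimResults, fake.maxAlternatives); // en-GB false false 1
```

These defaults match the behavior the tutorial relies on: one utterance per press of the start button, with only the final transcript delivered.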

Let’s get into the implementation. First, I will implement a voice search feature using vanilla JavaScript.

Web Speech API using vanilla JavaScript

For this version, I used my own vanilla JavaScript boilerplate, hut, which is available on my GitHub profile. Please go ahead and clone that repo with the following command:

git clone https://github.com/zafar-saleem/hut.git

Then cd into the hut folder and run yarn to install all the dependencies.

Now paste the below code inside the src/index.html file:

It is a simple, self-explanatory HTML file. The most important part is inside div#search_container: a label with the text Search that wraps a text field, followed by two buttons, one to start recording and the other to stop it.

Now that we are done with the HTML part, let’s move into the CSS part. Paste the following CSS into the src/index.css file.

That is really simple CSS. The whole page uses a common font family, and there are two utility CSS classes: .show to show one of the buttons and .hide to hide the other.

Time for the JavaScript. Paste the following JavaScript code inside the src/index.js file.

In this file, the first thing I do is import the index.css file. Then I create a new instance of webkitSpeechRecognition, which is available in the global scope, i.e., window. (This is the WebKit-prefixed name under which Chrome exposes the SpeechRecognition interface.) Then I cache the start and stop buttons in their respective variables.
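Because the constructor is prefixed, a small feature check avoids a ReferenceError in browsers that expose neither name. This is a minimal sketch, not the original index.js: `getRecognitionConstructor` is a hypothetical helper that prefers the unprefixed `SpeechRecognition` name and falls back to `webkitSpeechRecognition`, returning null when neither exists.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object (window in the browser), or null if unsupported.
// The unprefixed name is checked first; Chrome provides the prefixed one.
function getRecognitionConstructor(globalScope) {
  return globalScope.SpeechRecognition || globalScope.webkitSpeechRecognition || null;
}

// Browser usage (sketch):
// const RecognitionCtor = getRecognitionConstructor(window);
// const recognition = RecognitionCtor ? new RecognitionCtor() : null;

// Outside the browser, fake global objects show both outcomes:
class FakeRecognition {}
console.log(getRecognitionConstructor({ webkitSpeechRecognition: FakeRecognition }) === FakeRecognition); // true
console.log(getRecognitionConstructor({})); // null
```

Passing the global object in, instead of reading `window` directly, is only there so the check can be exercised without a browser.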

Then I add a click event listener to startButton. Inside its callback, I check whether startButton has the show CSS class. If it does, the start button is currently visible, which means users can press it to start speaking.

Since this condition is true, I hide startButton, show stopButton, and call the startRecording function, which is declared later in the code. It starts the Web Speech API by calling the .start() method on the recognition instance.

Next, I add a click event listener to stopButton and check whether it has the show CSS class. If it does, users are currently seeing the stop button, which means the Web Speech API is active and recording their voice.

In that case, I hide stopButton, show startButton, and call the stopRecording function, which stops the Web Speech API by calling the .stop() method on the recognition instance.
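The two click handlers can be sketched as follows, assuming the buttons are toggled with the .show and .hide classes from index.css. `wireSearchButtons` and `swapVisibility` are hypothetical names, not taken from the original repository; in the browser you would pass the cached DOM buttons and the recognition instance.

```javascript
// Hypothetical helper: makes one element visible and the other hidden
// using the .show/.hide utility classes.
function swapVisibility(toShow, toHide) {
  toHide.classList.remove("show");
  toHide.classList.add("hide");
  toShow.classList.remove("hide");
  toShow.classList.add("show");
}

// Hypothetical wiring of the start/stop buttons described above.
function wireSearchButtons(recognition, startButton, stopButton) {
  startButton.addEventListener("click", () => {
    // Only act while the start button is the visible one.
    if (startButton.classList.contains("show")) {
      swapVisibility(stopButton, startButton);
      recognition.start(); // begin listening through the microphone
    }
  });

  stopButton.addEventListener("click", () => {
    // A visible stop button means recognition is currently running.
    if (stopButton.classList.contains("show")) {
      swapVisibility(startButton, stopButton);
      recognition.stop(); // stop listening; a final result may still arrive
    }
  });
}
```

In the page, this would be called once after the buttons are cached, e.g. `wireSearchButtons(recognition, startButton, stopButton)`.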

Then come the startRecording and stopRecording functions, which I have already explained.

Next, I assign an onresult handler on the recognition instance. It is called when the recognition service returns a result; with the default settings, that happens once the user stops speaking. Inside this handler, I declare a local variable, saidText, loop through the returned results, append each transcript to saidText, and finally render it in the speechText input element.
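Here is a sketch of that loop. In a SpeechRecognitionEvent, `event.results` is array-like (it has a `length` and numeric indexes), each result is itself array-like, and its `[0]` entry is the most likely alternative, whose `transcript` holds the text. `collectTranscript` is a hypothetical name, and the fake event below only mimics that shape so the function can run outside the browser.

```javascript
// Hypothetical helper for the onresult handler: concatenates the top
// transcript of every result. It loops from index 0 so the full text is
// rebuilt each time, which suits assigning it to the input's value.
function collectTranscript(event) {
  let saidText = "";
  for (let i = 0; i < event.results.length; i++) {
    saidText += event.results[i][0].transcript;
  }
  return saidText;
}

// Browser wiring (sketch; speechText is assumed to be the input element):
// recognition.onresult = (event) => { speechText.value = collectTranscript(event); };

const fakeEvent = {
  results: [
    [{ transcript: "voice search", confidence: 0.9 }],
    [{ transcript: " in JavaScript", confidence: 0.8 }],
  ],
};
console.log(collectTranscript(fakeEvent)); // voice search in JavaScript
```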

At the end of the file, I assign an onend handler on the recognition instance. It is called when the recognition service disconnects, for example after we stop speaking. Inside this handler, I hide the stop button and show the start button.
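Because onend fires however the session finishes (the user stopped speaking, stop() was called, or, typically, after an error), it is a natural single place to reset state. The sketch below is an assumption-based variation, not the original code: `createRecognitionSession` is a hypothetical helper that also tracks a listening flag, since calling start() on a recognition instance that has already started throws an InvalidStateError.

```javascript
// Hypothetical helper: tracks whether recognition is running and reports
// changes through onListeningChange (e.g. to swap the start/stop buttons).
function createRecognitionSession(recognition, onListeningChange) {
  let listening = false;

  // onend marks the session as finished, whatever ended it.
  recognition.onend = () => {
    listening = false;
    onListeningChange(false); // e.g. hide stop button, show start button
  };

  return {
    start() {
      // Ignore repeated requests instead of letting start() throw.
      if (listening) return false;
      listening = true;
      recognition.start();
      onListeningChange(true);
      return true;
    },
    stop() {
      if (!listening) return false;
      recognition.stop(); // onend fires afterwards and resets the state
      return true;
    },
    isListening: () => listening,
  };
}
```

With this shape, the click handlers only call `session.start()` and `session.stop()`, and the UI update lives in one callback.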

Now, run the following command in your terminal to start the project:

yarn serve

Then go to http://localhost:8080 in Chrome and allow microphone access when the browser asks. You should be able to record your speech after pressing the start button, and once you stop speaking, your speech is automatically converted to text and rendered in the input field.
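If nothing is recorded, for example because microphone permission was denied, the recognition instance fires an error event whose `error` property holds a code such as "not-allowed", "no-speech", "audio-capture", or "network". A minimal sketch of turning those codes into readable messages; the message wording and the `describeRecognitionError` name are my own assumptions.

```javascript
// Hypothetical mapping from SpeechRecognitionErrorEvent.error codes to
// user-facing messages. A Map avoids accidental hits on inherited
// Object properties such as "constructor".
const ERROR_MESSAGES = new Map([
  ["no-speech", "No speech was detected. Please try again."],
  ["audio-capture", "No microphone was found."],
  ["not-allowed", "Microphone access was blocked. Please allow it in the browser settings."],
  ["network", "A network error interrupted speech recognition."],
]);

function describeRecognitionError(errorCode) {
  return ERROR_MESSAGES.get(errorCode) ?? `Speech recognition failed (${errorCode}).`;
}

// Browser wiring (sketch; showMessage is a hypothetical UI function):
// recognition.onerror = (event) => showMessage(describeRecognitionError(event.error));
```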

That is how we can implement a voice search feature using vanilla JavaScript. You can find the entire code using the following link:

https://github.com/zafar-saleem/voice-search

Web Speech API using React

Now that we are done with vanilla JavaScript, let’s reimplement the above feature using a modern library of our choice, which is React in this case. Let’s get started. Run the following command to create a React project:

npx create-react-app voice-speech-react

Give it a few minutes. Once it finishes, it will have created a React project on your machine. cd into the voice-speech-react folder and open it in your favorite editor.

First, let’s add some minimal CSS to the src/App.css file. Paste the following code into this file:

The CSS part is exactly the same as in the vanilla JavaScript section. All we need is two utility CSS classes to show and hide buttons.

Now, let’s get into the fun part, which is React. Paste the following code inside your src/App.js file:

In this file, first and foremost, I import the useState hook from React and import App.css so that its styles apply to the page.

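Then I have a functional component named App. Inside it, I create a state variable, recognition, whose initial value is a new webkitSpeechRecognition instance from the Web Speech API. (Passing new webkitSpeechRecognition() directly to useState evaluates it on every render even though React keeps only the first value; the lazy form useState(() => new webkitSpeechRecognition()) creates the instance just once.) Next, I declare a boolean state variable, isShow, which I use later to show and hide the two buttons.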

Next, I have a saidText state variable that holds the recognized text.

Then I define a startRecording function that updates the isShow state variable and then calls recognition.start() to start listening.

The stopRecording function is next. Again, I update the isShow state variable and then call recognition.stop() to stop recording the user’s speech.

Again, I use the onresult property on the Web Speech API instance, i.e., recognition. Inside this handler, I declare a local variable, spokenText, loop through the speech results, append each transcript to spokenText, and then update the saidText state variable with it.

Next is the onend property on the recognition instance, which is called when users stop speaking into the mic. Inside this handler, I update the isShow state variable and then stop the recording.
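As an alternative to separate useState calls, the same transitions could be collected in one reducer for React's useReducer hook. This is a hypothetical sketch, not how the original App.js is written: the action names and the `isListening` field are assumptions. The reducer stays pure, so the side effects (`recognition.start()` and `recognition.stop()`) remain in the event handlers.

```javascript
// Hypothetical state shape and reducer for useReducer. Usage inside the
// component (sketch):
//   const [state, dispatch] = useReducer(voiceSearchReducer, initialVoiceState);
//   const startRecording = () => { dispatch({ type: "start" }); recognition.start(); };
//   recognition.onresult = (event) => dispatch({ type: "result", text: spokenText });
const initialVoiceState = { isListening: false, saidText: "" };

function voiceSearchReducer(state, action) {
  switch (action.type) {
    case "start": // startRecording was called
      return { ...state, isListening: true };
    case "stop": // stopRecording was called
    case "end": // recognition.onend: the service disconnected
      return { ...state, isListening: false };
    case "result": // recognition.onresult: store the latest transcript
      return { ...state, saidText: action.text };
    default:
      return state; // unknown actions leave the state unchanged
  }
}
```

Returning new objects instead of mutating `state` is what lets React detect the change and re-render.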

Then I return the JSX. It contains a text input field whose defaultValue is saidText, plus Start and Stop buttons that are shown and hidden based on the value of isShow. They call their respective functions, i.e., startRecording and stopRecording.

That’s all for the React version. Now run the project with the following command:

yarn start

You should be able to record your speech when you press the start button, and when you stop speaking, whatever you said will be rendered in the input field as text.

You can find the code for this section in the GitHub Repository.

That is it for today’s article.

