After living in the West for almost ten years, I returned to my home country a couple of years ago and got stuck here due to the pandemic. After the lockdowns ended and everything reopened, I started to mingle with my relatives and friends again. We exchanged social media handles, phone numbers, WhatsApp numbers, and so on, to stay in touch in the virtual world.
Once I started communicating online with all of them, I realised that almost every single person in my contact list preferred sending audio and voice messages rather than typing. When I was among them in physical gatherings, I noticed that when they searched for something online, whether via Google or YouTube, they preferred voice commands over typing. I was the only one who preferred to type.
I somewhat disagree with overusing voice commands and would rather users chose typing, because this trend makes me worry that we might lose our typing and written-language skills at some point in the future, just as most of Gen X has already lost the skill of handwriting.
I presume Gen Y (if we name them accordingly in the future) will most likely become a writeless and typeless generation.
On the other hand, voice commands have certain benefits too. First, they are less time-consuming in this fast-paced world. Second, they are more multitasking-friendly, even for men like me (who are considered to be poorer multitaskers): we can prepare meals while talking to Siri, Google Home, Alexa, and the like, and still get certain jobs done. Voice also earns extra loyalty from users by helping them cope with their routines, and finally, it is more powerful for user behavior analytics.
The use of voice commands is trending upwards, according to the Adobe Digital Insights Survey.
It doesn't matter whether I am for or against this new trend. The fact is that I am a software engineer, and I have to adapt to any technology that is widely used and helpful to both consumers and vendors. Having said that, I will use and implement voice recognition technology in web apps whenever there is a need for it, and in today's world, there is.
In this article's context, I will focus on the Speech Recognition part of this API. It receives speech from the user through the device's microphone, and we can perform certain operations on it, such as checking it against a list of grammars, until it is finally returned as a result in the form of a string.
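As a minimal illustration of that flow, the core of the API can be sketched in a few lines. This is browser-only code: webkitSpeechRecognition exists in Chrome but not in Node, and this sketch is not the article's final code.

```javascript
// Browser-only sketch: start listening and log the top transcript alternative.
const recognition = new webkitSpeechRecognition();

recognition.onresult = (event) => {
  // event.results[i] is a SpeechRecognitionResult; [0] is its best alternative.
  console.log(event.results[0][0].transcript);
};

recognition.start(); // prompts for microphone access in Chrome
```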
git clone https://github.com/zafar-saleem/hut.git
Then cd into that hut folder and run the yarn command to install all the dependencies.
Now paste the code below into the project's HTML file. It is a simple, self-explanatory HTML file. The most important part is inside div#search_container: I am creating a label with the text Search, and inside it there is a text field. Then I have two buttons: one to start recording speech and a second one to stop recording.
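The markup described above could look like the sketch below. The ids and class names (search_container, speechText, startButton, stopButton, show, hide) are the ones the JavaScript relies on later, but treat the exact structure as an approximation of the repository's file rather than a verbatim copy.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Web Speech API</title>
  </head>
  <body>
    <div id="search_container">
      <label>
        Search
        <input type="text" id="speechText" />
      </label>
      <!-- Only one of these buttons is visible at a time (see .show/.hide in the CSS) -->
      <button id="startButton" class="show">Start</button>
      <button id="stopButton" class="hide">Stop</button>
    </div>
  </body>
</html>
```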
Now that we are done with the HTML part, let's move on to the CSS. Paste the following CSS into the index.css file. It is really simple: the entire HTML shares a common font family, and there are two utility classes, .show to show one of the buttons and .hide to hide the other.
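A minimal version of that stylesheet, implementing just the shared font and the two utility classes, might look like this (the repository's actual file may differ):

```css
/* Shared font family for the whole page */
html {
  font-family: sans-serif;
}

/* Utility classes toggled from JavaScript to swap the two buttons */
.show {
  display: inline-block;
}

.hide {
  display: none;
}
```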
In this file, the first thing I do is import the index.css file. Then I create a new instance of webkitSpeechRecognition, which is available in the global scope, i.e., window. Then I cache the start and stop buttons into their respective variables.
Then I add a click event to startButton. Inside its callback function, I check whether startButton has the show CSS class. If so, the button is currently visible to users, which indicates that they can press it to start speaking.
Since this condition is true, I hide startButton, show stopButton, and then call the startRecording function, which is declared later in the code. It starts the Web Speech API by calling the .start() function on the recognition instance.
Next, I add a click event listener to stopButton and check whether it has the show CSS class. If it does, users are currently seeing the stop button, which also means the Web Speech API is currently active and recording their voice.
Since this condition is true, I simply hide stopButton, show startButton, and then call the stopRecording function, which stops the Web Speech API by calling the .stop() function on the recognition instance.
Then come the startRecording and stopRecording functions I already mentioned above.
Next, I use the onresult handler on the Web Speech API recognition instance. It gets called when the speech stops and we receive the final result of the user's speech. Inside this function, I declare a local variable, saidText. Then I loop through the results and append each transcript to the saidText variable, which I eventually render in the speechText input element.
At the end of the file, I use the onend handler on the Web Speech API's recognition instance. It gets called when we stop speaking. Inside this function, I hide the stop button and show the start button.
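Putting the pieces above together, the whole script might look like the sketch below. The element ids and the show/hide class names follow the markup described earlier; treat them as assumptions if your HTML differs, and note that the index.css import from the original setup is omitted here. The transcript-building loop is pulled into a pure helper so it can be exercised outside the browser.

```javascript
// Pure helper: concatenates transcripts from a SpeechRecognitionResultList-like
// object (anything with .length and [i][0].transcript). Testable outside a browser.
function transcriptFrom(results) {
  let saidText = '';
  for (let i = 0; i < results.length; i += 1) {
    saidText += results[i][0].transcript;
  }
  return saidText;
}

// Browser-only wiring, guarded so this file can also be loaded in Node.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  const recognition = new window.webkitSpeechRecognition();
  const startButton = document.querySelector('#startButton');
  const stopButton = document.querySelector('#stopButton');
  const speechText = document.querySelector('#speechText');

  const startRecording = () => recognition.start();
  const stopRecording = () => recognition.stop();

  startButton.addEventListener('click', () => {
    if (startButton.classList.contains('show')) {
      // Swap the buttons, then start listening to the microphone.
      startButton.classList.replace('show', 'hide');
      stopButton.classList.replace('hide', 'show');
      startRecording();
    }
  });

  stopButton.addEventListener('click', () => {
    if (stopButton.classList.contains('show')) {
      // Swap the buttons back, then stop listening.
      stopButton.classList.replace('show', 'hide');
      startButton.classList.replace('hide', 'show');
      stopRecording();
    }
  });

  // Called with the accumulated results once speech is recognised.
  recognition.onresult = (event) => {
    speechText.value = transcriptFrom(event.results);
  };

  // Called when speech ends; restore the start button.
  recognition.onend = () => {
    stopButton.classList.replace('show', 'hide');
    startButton.classList.replace('hide', 'show');
  };
}
```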
Now, inside your terminal, run the following command to run this project:
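The exact command depends on the scripts defined in the repository's package.json; assuming it follows the usual dev-server convention, it would be something like:

```shell
yarn start
```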
Then go to http://localhost:8080 in your Chrome browser. You should be able to record your speech after pressing the start button, and once you stop speaking, it will automatically be turned into text and rendered inside the input text field.
Web Speech API using React
npx create-react-app voice-speech-react
Give it a few minutes. Once it finishes, it will create a React project on your machine's file system. cd into the voice-speech-react folder and open it in your favorite editor.
First, let's add minimal CSS inside the src/App.css file. Paste the following code into this file:
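The article does not show this stylesheet's contents, so here is a hypothetical minimal src/App.css along the lines of the vanilla version:

```css
/* Hypothetical minimal styles; the actual App.css may differ */
.App {
  font-family: sans-serif;
  text-align: center;
}
```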
Now, let's get into the fun part, which is React. Paste the following code inside your src/App.js file.
In this file, first and foremost, I import the useState hook from React and import App.css so the styles affect the page.
Then I have a functional component named App. Inside it, I create a local state variable, recognition, whose initial value is a new instance of the webkitSpeechRecognition Web Speech API. Next, I declare a boolean state variable that I will use later to show and hide the two buttons.
Next, I have a saidText local state variable.
Then I wrote a function called startRecording, in which I update the isShow local state variable and then call the recognition.start() function to start the speech recognition.
The stopRecording function is next. Again, I update the isShow local state variable and then call the recognition.stop() function to stop recording the user's speech.
Again, I use the onresult property on the Web Speech API instance, i.e., recognition. Inside this function, I declare a spokenText local variable, loop through the speech results, appending each transcript to spokenText, and then update the saidText local state variable.
Next is the onend property on the speech recognition instance, which is called when the user stops speaking into the mic. Inside this function, I update the isShow local state variable and then stop the recording.
Then, I return the JSX. In here, I have a text input field whose value is bound to saidText. Then I have the Start and Stop buttons, which are shown and hidden based on the value of the isShow state variable, and they call their respective functions, i.e., startRecording and stopRecording.
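Assembled into one component, a sketch of src/App.js could look like the following. It mirrors the approach described above, including attaching onresult and onend in the component body as the article does (moving them into a useEffect would be more idiomatic React); the lazy useState initializer and the readOnly input are my own small assumptions.

```jsx
import { useState } from 'react';
import './App.css';

function App() {
  // One recognition instance for the component's lifetime.
  const [recognition] = useState(() => new window.webkitSpeechRecognition());
  // true while recording; controls which button is visible.
  const [isShow, setIsShow] = useState(false);
  const [saidText, setSaidText] = useState('');

  const startRecording = () => {
    setIsShow(true);
    recognition.start();
  };

  const stopRecording = () => {
    setIsShow(false);
    recognition.stop();
  };

  // Build the transcript from the accumulated results and store it in state.
  recognition.onresult = (event) => {
    let spokenText = '';
    for (let i = 0; i < event.results.length; i += 1) {
      spokenText += event.results[i][0].transcript;
    }
    setSaidText(spokenText);
  };

  // When speech ends, flip back to the start button.
  recognition.onend = () => {
    setIsShow(false);
  };

  return (
    <div className="App">
      <label>
        Search
        <input type="text" value={saidText} readOnly />
      </label>
      {isShow ? (
        <button onClick={stopRecording}>Stop</button>
      ) : (
        <button onClick={startRecording}>Start</button>
      )}
    </div>
  );
}

export default App;
```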
That is all for the React version of this article. Now run this project using the following command:
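create-react-app wires up a start script by default, so:

```shell
yarn start
```

(create-react-app serves on http://localhost:3000 by default.)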
You should be able to record your speech when you press the start button, and when you stop speaking, whatever you said will be rendered inside the input field as text.
You can find the code for this section in the GitHub Repository.
That is it for today’s article.
Want to Connect? LinkedIn | GitHub | GitLab | Instagram | Website