Convert Text to Speech Using Web Speech API in JavaScript

By Gourav Kajal, April 2022

Let’s listen to those words!

Photo by Daiga Ellaby on Unsplash

I spend a lot of time on Medium, sometimes writing, but mostly reading about the experiences other developers are willing to share with the community.

Recently, I noticed that Medium has a play button on every story. At first, I thought this feature was reserved for a few stories or writers, but I soon realized it is available to all readers. This means we can listen to the stories on Medium. Awesome!

Then, like a typical developer, I wondered how they did it. I knew that the JavaScript world has a web API for this, the Web Speech API, but I had never used or learned about it.

So today, let’s learn about this web API together and build a working example.

The Web Speech API lets web apps work with voice data. In this article, we’ll create a simple webpage that implements text-to-speech with it.

For this demo, let’s create a new directory containing two files: index.html and text-to-speech.js.

In the HTML file, let’s set up the following elements:

  • A select menu with no options. Using JavaScript, we’ll fill the empty select menu with a list of possible voices
  • Range sliders for volume, pitch, and rate
  • A textarea to type in
  • Control buttons for the speech

In this demo, we are going to use Bootstrap 5 for the styling. Here’s some code:

This is how it will look in the browser:

Output

In the JavaScript file, we are mainly going to use two interfaces, SpeechSynthesis and SpeechSynthesisUtterance, plus the window.speechSynthesis property, which gives us access to a SpeechSynthesis object. Let’s look at each of them briefly.

JavaScript SpeechSynthesis Interface

This is the controller interface for the speech service: it produces speech from the text we provide. We use it to start, pause, resume, and cancel speech, as well as to get the voices the device supports.

The methods provided by this interface are as follows:

  • speak(): To add an utterance (a SpeechSynthesisUtterance object) to the utterance queue; it is spoken once every utterance queued before it has been spoken. This is the method we will use to start speech
  • pause(): To pause the speech that is currently playing
  • resume(): To resume paused speech
  • cancel(): To remove all utterances from the queue, stopping the current one immediately if it is being spoken
  • getVoices(): To get the list of voices available on the current device

JavaScript window.speechSynthesis Property

This property of the window object returns the SpeechSynthesis controller object. It is the object on which we call speak() and the other methods listed above.

We will understand this more when we jump into the code.
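Because window.speechSynthesis is an ordinary property, a page can check for it before wiring up any buttons. Here is a minimal sketch of such a check. getSynth is a hypothetical name, and its globalObj parameter exists only so the function can also run outside a browser, where it simply returns null.

```javascript
// Hypothetical helper: returns the SpeechSynthesis controller that browsers
// expose as window.speechSynthesis, or null when text-to-speech is missing.
// globalObj defaults to globalThis, which is `window` in a browser page.
function getSynth(globalObj = globalThis) {
  const supported =
    globalObj != null &&
    "speechSynthesis" in globalObj &&
    typeof globalObj.SpeechSynthesisUtterance === "function";
  return supported ? globalObj.speechSynthesis : null;
}

const synth = getSynth();
if (synth === null) {
  // For example in Node.js, or in a browser without speech synthesis
  console.log("Speech synthesis is not supported in this environment.");
}
```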

JavaScript SpeechSynthesisUtterance Interface

This interface represents a speech request: it holds the text to be spoken along with settings such as the language, volume, pitch, rate, and voice. After creating an instance of it, we pass it to the speak() method of the SpeechSynthesis object to play the speech.
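The snippets in this article assign to a variable named speech without showing where it comes from; presumably the demo creates one shared instance up front with const speech = new SpeechSynthesisUtterance();. The sketch below wraps that step in a hypothetical createUtterance factory that copies optional settings onto the new object. The Ctor parameter is an assumption added only so the function can be tried outside a browser.

```javascript
// Hypothetical factory: builds a SpeechSynthesisUtterance and copies any
// provided settings onto it. Ctor defaults to the browser's constructor and
// is a parameter only so the function can be exercised with a stand-in.
function createUtterance(text = "", options = {}, Ctor = globalThis.SpeechSynthesisUtterance) {
  if (typeof Ctor !== "function") {
    throw new Error("SpeechSynthesisUtterance is not available here");
  }
  // The constructor accepts the text to speak as its optional argument.
  const utterance = new Ctor(text);
  // Only the properties discussed in this article are copied.
  for (const key of ["lang", "volume", "rate", "pitch", "voice"]) {
    if (options[key] !== undefined) {
      utterance[key] = options[key];
    }
  }
  return utterance;
}

// Browser usage (assumption): one shared instance for every listener.
// const speech = createUtterance("", { lang: "en" });
// window.speechSynthesis.speak(speech); // after speech.text has been set
```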

There are six properties on the SpeechSynthesisUtterance interface that we can tweak. They are as follows:

Language:

The lang property gets and sets the language of the utterance. If it is unset, the lang attribute of the page’s <html> element (for example, <html lang="en">) is used, or the user agent’s default if that attribute is not set either.

speech.lang = "en";
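Language tags such as "en", "en-US", and "en-GB" follow the BCP 47 format, and each available voice reports its own tag in its lang property. When picking a voice that matches the utterance’s language, a prefix match is usually what you want. A minimal sketch; voicesForLang is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: keeps only the voices whose language tag matches the
// requested one, so "en" matches "en-US" and "en-GB" but not "es-ES".
// voiceList is an array of SpeechSynthesisVoice-like objects with a lang field.
function voicesForLang(voiceList, lang) {
  const wanted = lang.toLowerCase();
  return voiceList.filter((voice) => {
    // Tolerate tags written with underscores, e.g. "en_AU".
    const tag = voice.lang.toLowerCase().replace(/_/g, "-");
    return tag === wanted || tag.startsWith(wanted + "-");
  });
}

// Browser usage (assumption):
// const english = voicesForLang(window.speechSynthesis.getVoices(), speech.lang);
// if (english.length > 0) speech.voice = english[0];
```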

Text:

The text property gets and sets the text that will be synthesized when the utterance is spoken. In our example, we pass it plain text, and it has to be set when the start button is pressed.

Let’s give the start button a click listener that reads the value of the textarea and assigns it to this property.

document.querySelector("#start").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
});
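If the textarea is empty, there is nothing useful to speak. A small guard can trim the input and skip blank text before it reaches speech.text. prepareText is a hypothetical helper name, and skipping blank input is a design choice of this sketch, not something the API requires.

```javascript
// Hypothetical helper: trims the raw textarea value and returns null when
// nothing but whitespace is left, so the caller can skip speaking.
function prepareText(raw) {
  const text = String(raw ?? "").trim();
  return text.length > 0 ? text : null;
}

// Browser usage (assumption), extending the click listener above:
// document.querySelector("#start").addEventListener("click", () => {
//   const text = prepareText(document.querySelector("textarea").value);
//   if (text === null) return; // blank input: nothing to say
//   speech.text = text;
// });
```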

Volume:

The volume property gets and sets the volume of the utterance. It’s a float ranging from 0 (lowest) to 1 (highest). If this property is not set, the default value is 1.

Add an input listener to the volume range slider and update the volume property whenever the slider value changes. The slider’s min, max, and default values have already been specified in the HTML tag.

Next to the range slider, we’ll add a <span> that displays the volume’s value on the webpage.

document.querySelector("#volume").addEventListener("input", () => {
  // Get volume value from the input (slider values are strings)
  const volume = parseFloat(document.querySelector("#volume").value);

  // Set volume property of the SpeechSynthesisUtterance instance
  speech.volume = volume;

  // Update the volume label
  document.querySelector("#volume-label").textContent = volume;
});

Rate:

The rate property returns and sets the utterance’s rate. It’s a float that represents the rate value, which can range from 0.1 (lowest) to 10 (highest). If this property is not set, the default value is 1.

Let’s do the same for rate as we did for volume.

document.querySelector("#rate").addEventListener("input", () => {
  // Get rate value from the input (slider values are strings)
  const rate = parseFloat(document.querySelector("#rate").value);

  // Set rate property of the SpeechSynthesisUtterance instance
  speech.rate = rate;

  // Update the rate label
  document.querySelector("#rate-label").textContent = rate;
});

Pitch:

The pitch property gets and sets the pitch at which the utterance is spoken. It’s a float ranging from 0 (lowest) to 2 (highest). If this property is not set, the default value is 1.

Let’s do the same for pitch as we did for rate and volume.

document.querySelector("#pitch").addEventListener("input", () => {
  // Get pitch value from the input (slider values are strings)
  const pitch = parseFloat(document.querySelector("#pitch").value);

  // Set pitch property of the SpeechSynthesisUtterance instance
  speech.pitch = pitch;

  // Update the pitch label
  document.querySelector("#pitch-label").textContent = pitch;
});
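The three slider listeners for volume, rate, and pitch differ only in the element IDs and the property they set. Since the slider IDs (#volume, #rate, #pitch) match the utterance property names, they can be folded into one function. A sketch under that assumption: bindSlider is a hypothetical name, and the doc parameter exists only so the function can be tested without a real page.

```javascript
// Hypothetical refactor of the volume, rate, and pitch listeners: each slider
// ID doubles as the utterance property name, and its label is "#<id>-label".
// doc defaults to the page's document; it is a parameter only for testing.
function bindSlider(name, utterance, doc = globalThis.document) {
  const slider = doc.querySelector(`#${name}`);
  const label = doc.querySelector(`#${name}-label`);
  slider.addEventListener("input", () => {
    const value = parseFloat(slider.value); // slider values are strings
    utterance[name] = value;
    label.textContent = String(value);
  });
}

// Browser usage (assumption), replacing the three separate listeners:
// for (const name of ["volume", "rate", "pitch"]) bindSlider(name, speech);
```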

Voice:

The voice property gets and sets the voice that will be used to speak the utterance. It must be one of the SpeechSynthesisVoice objects provided by the browser. If it isn’t set, the most suitable default voice for the utterance’s lang setting is used.

To set the voice of the utterance, we first need the list of available voices. The voices may not be available right away when the page loads, because the browser loads them asynchronously. When the list is ready (or changes), a voiceschanged event fires on window.speechSynthesis, so we can assign a function to run at that point.

window.speechSynthesis.onvoiceschanged = () => {
  // On voices loaded
};
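Browsers may differ in how they deliver voices: some may already return a populated list from getVoices() before any event fires, while others only do so after voiceschanged, so it can be safer to handle both cases. The sketch below wraps them in a Promise. loadVoices and its 2-second fallback timeout are assumptions of this sketch; the synth argument is the window.speechSynthesis object (or a stand-in for testing).

```javascript
// Hypothetical helper: resolves with the voice list whether the browser has
// it ready immediately or only after the voiceschanged event fires. The
// timeout (2000 ms by default, an arbitrary choice) guards against the event
// never arriving; in that case whatever getVoices() returns is used.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const ready = synth.getVoices();
    if (ready.length > 0) {
      resolve(ready); // the list was already loaded
      return;
    }
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    synth.addEventListener(
      "voiceschanged",
      () => {
        clearTimeout(timer);
        resolve(synth.getVoices());
      },
      { once: true }
    );
  });
}

// Browser usage (assumption):
// loadVoices(window.speechSynthesis).then((list) => console.log(list.length, "voices"));
```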

Using window.speechSynthesis.getVoices(), we can retrieve the list of voices: it returns an array of the SpeechSynthesisVoice objects available on the device. Let’s save the list in a global array and use it to fill the page’s select menu with the available voices.
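The code for this step is not shown here, so what follows is only a sketch of one way to do it that matches the description: a global voices array (the one the change listener below reads from), a #voices select menu, and each option’s value set to its voice’s index. buildVoiceOptions and fillVoiceMenu are hypothetical names, and the "name (lang)" label format is a choice of this sketch.

```javascript
// Global list of voices; each option's value is an index into this array.
let voices = [];

// Hypothetical helper: turns the voices into plain option data, using the
// index as the value and "name (lang)" as the visible label.
function buildVoiceOptions(voiceList) {
  return voiceList.map((voice, index) => ({
    value: String(index),
    label: `${voice.name} (${voice.lang})`,
  }));
}

// Hypothetical helper: refills the #voices select menu from scratch.
function fillVoiceMenu(voiceList, doc = globalThis.document) {
  const select = doc.querySelector("#voices");
  select.innerHTML = ""; // drop options left over from an earlier call
  for (const { value, label } of buildVoiceOptions(voiceList)) {
    select.add(new Option(label, value));
  }
}

// Browser-only wiring, skipped when there is no window (e.g. Node.js).
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const refreshVoices = () => {
    voices = window.speechSynthesis.getVoices();
    fillVoiceMenu(voices);
  };
  window.speechSynthesis.onvoiceschanged = refreshVoices;
  refreshVoices(); // covers browsers that already have the list
}
```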

Once the voice menu is populated, we can add a change event listener to it that updates the voice of the SpeechSynthesisUtterance instance. Each option’s value is the index of its voice, so when the user picks a voice, we use that index to look it up in the global array of voices.

document.querySelector("#voices").addEventListener("change", () => {
  speech.voice = voices[document.querySelector("#voices").value];
});

Controls

If you remember, our index.html has a few control buttons: start, pause, resume, and cancel. Let’s make them work using the SpeechSynthesis interface and its methods.

Start:

The SpeechSynthesisUtterance instance should be passed to the window.speechSynthesis.speak() when the start button is pressed. This will begin the process of transforming the text into speech.

Before calling this method, the text property must be set.

If you call speak() while another utterance is still being spoken, the new utterance is added to the queue and plays after the current one finishes.

document.querySelector("#start").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
  window.speechSynthesis.speak(speech);
});
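A SpeechSynthesisUtterance also fires events as it plays, including start, end, and error. As an optional extra, the sketch below uses them to disable the start button while an utterance is being spoken, which avoids queueing the same text again by accident. trackSpeakingState is a hypothetical helper, and disabling the button is a design choice of this sketch.

```javascript
// Hypothetical helper: keeps the start button disabled while the utterance
// is being spoken, using the utterance's start, end, and error events.
function trackSpeakingState(utterance, startButton) {
  const setBusy = (busy) => {
    startButton.disabled = busy;
  };
  utterance.addEventListener("start", () => setBusy(true));
  utterance.addEventListener("end", () => setBusy(false));
  utterance.addEventListener("error", (event) => {
    setBusy(false);
    console.error("Speech synthesis failed:", event.error);
  });
}

// Browser usage (assumption):
// trackSpeakingState(speech, document.querySelector("#start"));
```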

Pause:

To pause the utterance that is currently being spoken, we can use window.speechSynthesis.pause().

document.querySelector("#pause").addEventListener("click", () => {
  window.speechSynthesis.pause();
});

Resume:

To resume a paused utterance, we can use window.speechSynthesis.resume().

document.querySelector("#resume").addEventListener("click", () => {
  window.speechSynthesis.resume();
});
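Pause and resume can also share a single button. SpeechSynthesis exposes read-only paused and speaking flags; since speaking stays true while speech is paused, paused has to be checked first. A minimal sketch: togglePause is a hypothetical name, and its string return values are only for illustration.

```javascript
// Hypothetical helper for a combined Pause/Resume button, driven by the
// controller's read-only `paused` and `speaking` flags.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is playing, so there is nothing to pause
}

// Browser usage (assumption):
// document.querySelector("#pause").addEventListener("click", () => {
//   togglePause(window.speechSynthesis);
// });
```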

Cancel:

We can stop the utterance that’s currently being spoken, and remove any others still waiting in the queue, using window.speechSynthesis.cancel().

document.querySelector("#cancel").addEventListener("click", () => {
  window.speechSynthesis.cancel();
});
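Because cancel() empties the entire queue, it can also be paired with speak() so that Start interrupts whatever is playing instead of queueing behind it. A sketch of that variant: speakNow is a hypothetical helper, and whether restarting or queueing is preferable is up to the page.

```javascript
// Hypothetical variant of the start handler: cancel() stops the current
// utterance and empties the queue, then the new text is queued as the only
// utterance.
function speakNow(synth, utterance, text) {
  synth.cancel();
  utterance.text = text;
  synth.speak(utterance);
}

// Browser usage (assumption):
// document.querySelector("#start").addEventListener("click", () => {
//   speakNow(window.speechSynthesis, speech, document.querySelector("textarea").value);
// });
```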

Now all the controls work and the required properties are set up. Here’s the final version of text-to-speech.js:

And here’s the final output on the browser screen.

Final Output

Now, simply enter some text in the textarea, click the Start button, and listen to the words you just wrote.
