How to Use the Google Cloud AI-Powered Text-to-Speech REST Service

Sandeep Pamidamarri
Feb 14, 2021

In this post, we will learn how to use the Google Cloud Text-to-Speech REST service, which converts text into natural-sounding speech using Google's AI technologies. It is useful in web and mobile applications that need to interact with customers in a lifelike, human-sounding voice, and it can also help turn documents such as PDFs into audiobooks.

The service accepts input either as plain text or as SSML (Speech Synthesis Markup Language). The REST API returns the synthesised audio as a base64-encoded string in the audioContent field of its response, and you can decode that string into an audio file (an MP3 in this example).
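For illustration, a request body can carry SSML in an ssml field instead of text. The sketch below writes such a body to a file named request-ssml.json (a file name chosen here for the example, not one used elsewhere in this post). The <break> tag inserts a short pause, and the voice settings simply mirror the plain-text example in Step 4.

```shell
# Sketch: write an SSML request body to request-ssml.json.
# The quoted heredoc delimiter ('EOF') stops the shell from expanding
# anything inside the JSON, so it is written exactly as shown.
cat > request-ssml.json <<'EOF'
{
  "input": {
    "ssml": "<speak>Google Cloud has many AI-powered API services.<break time=\"500ms\"/>These services help organisations provide a better customer experience.</speak>"
  },
  "voice": {"languageCode": "en-GB", "name": "en-GB-Standard-A"},
  "audioConfig": {"audioEncoding": "MP3"}
}
EOF
```

It would be sent in the same way as request.json, only with -d @request-ssml.json in the curl command.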

Prerequisites

  1. A Google Cloud account and a project with billing enabled

Step 1: Enable the Cloud Text-to-Speech API

In the Google Cloud Console, open APIs & Services > Library, search for "Cloud Text-to-Speech API", and click Enable.
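If you prefer the command line, the same step can be sketched with the gcloud CLI. The helper names enable_tts_api and run are hypothetical, and the project ID is a placeholder for your own. Setting DRY_RUN=1 only prints the command instead of running it, which is handy for checking what will happen first.

```shell
# run: execute a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# enable_tts_api PROJECT_ID: enable the Text-to-Speech API for one project.
enable_tts_api() {
  run gcloud services enable texttospeech.googleapis.com --project="$1"
}
```

For example, DRY_RUN=1 enable_tts_api my-project-id prints the gcloud command, and enable_tts_api my-project-id runs it (my-project-id is a placeholder).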

Step 2: Create a Google Service Account to access the API

In the Google Cloud Console, go to IAM & Admin > Service Accounts and create a service account. Skip the optional step of granting it a role; no role is required to access this service.

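Then generate a JSON key for the new service account: open the account, go to Keys > Add Key > Create new key, choose JSON, and save the downloaded file as key.json.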
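The service account and key can also be created with gcloud. The sketch below assumes a hypothetical helper name, create_tts_service_account, and placeholder project and account names; like the console flow above, it grants no role. DRY_RUN=1 prints the commands without executing them.

```shell
# run: execute a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_tts_service_account PROJECT_ID SA_NAME (hypothetical helper).
create_tts_service_account() {
  local project_id="$1" sa_name="$2"
  local sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"
  # No IAM role binding is added: this post notes none is needed for the API.
  run gcloud iam service-accounts create "$sa_name" \
    --project="$project_id" --display-name="Text-to-Speech demo"
  # Writes the private key to ./key.json; treat this file as a secret.
  run gcloud iam service-accounts keys create key.json \
    --iam-account="$sa_email" --project="$project_id"
}
```

For example, create_tts_service_account my-project-id tts-demo would create tts-demo@my-project-id.iam.gserviceaccount.com and write its key to key.json (both names are placeholders).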

Step 3: Open Google Cloud Shell to invoke the Text-to-Speech API

Click the Activate Cloud Shell icon in the top-right corner of the console. Cloud Shell is a Google-provided Linux environment that runs in the browser and comes with the gcloud CLI preinstalled.

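Once Cloud Shell opens, upload the generated key.json using the Upload option in the Cloud Shell menu (by default, uploaded files are placed in your home directory).

Then point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the key file, using an absolute path so that it still works after you change directories:

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/key.json"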
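Before calling the API, it can save time to confirm that the variable points at a real service account key. The hypothetical check_credentials function below is a minimal sketch: it relies on the fact that service account key files contain a "type": "service_account" entry, and it does not validate the key itself.

```shell
# check_credentials: sanity-check GOOGLE_APPLICATION_CREDENTIALS (hypothetical helper).
check_credentials() {
  local key_path="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -f "$key_path" ]; then
    echo "No key file at $key_path" >&2
    return 1
  fi
  # Service account key files declare "type": "service_account".
  if ! grep -q '"type": *"service_account"' "$key_path"; then
    echo "$key_path does not look like a service account key" >&2
    return 1
  fi
  echo "Using service account key: $key_path"
}
```

Running check_credentials right after the export should print the key path; an error message means the export or the upload needs another look.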

Step 4: Test the Google Text-to-Speech REST Service in Cloud Shell

Create a file named request.json with the following content. It asks the API to read the source text aloud with the en-GB-Standard-A voice and return the audio as MP3.

{
  "input": {
    "text": "Google Cloud has many AI-powered API services. These services help organisations provide a better customer experience."
  },
  "voice": {
    "languageCode": "en-GB",
    "name": "en-GB-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
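Editing JSON by hand makes it easy to break the quoting when you change the text. As a sketch, a hypothetical make_request helper can generate request.json for any text and let python3's json module handle the escaping of quotes and newlines; the voice and encoding match the request body above.

```shell
# make_request "TEXT": write request.json for arbitrary text (hypothetical helper).
# python3 reads the script from the heredoc; the text arrives as sys.argv[1].
make_request() {
  python3 - "$1" <<'PY'
import json
import sys

body = {
    "input": {"text": sys.argv[1]},
    "voice": {"languageCode": "en-GB", "name": "en-GB-Standard-A", "ssmlGender": "FEMALE"},
    "audioConfig": {"audioEncoding": "MP3"},
}
with open("request.json", "w") as f:
    json.dump(body, f, indent=2)
PY
}
```

For example, make_request 'She said "hello" to the audiobook.' produces a valid request.json even though the text contains double quotes.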

REST API Endpoint

POST https://texttospeech.googleapis.com/v1/text:synthesize

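Use curl to send the request to the endpoint and save the reply to response.json. The access token comes from the application default credentials, which pick up the key file referenced by GOOGLE_APPLICATION_CREDENTIALS.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  https://texttospeech.googleapis.com/v1/text:synthesize > response.json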

The response is a JSON object whose audioContent field contains the synthesised audio, base64-encoded.

Step 5: Decode the encoded content to an audio file

Copy the value of the audioContent field (without the surrounding quotes) into a new file named synthesize-output-base64.txt.

Then decode the text file to an MP3:

base64 --decode synthesize-output-base64.txt > demo-audio.mp3
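Copying a long base64 string by hand is error-prone. As an alternative sketch, the hypothetical extract_audio function below reads response.json, decodes the audioContent field, and writes the audio file directly. It assumes python3 is available (it is in Cloud Shell). If the response holds an error object instead of audio, it prints that error and exits with a non-zero status.

```shell
# extract_audio RESPONSE_JSON OUTPUT_FILE (hypothetical helper).
extract_audio() {
  python3 - "$1" "$2" <<'PY'
import base64
import json
import sys

response_path, output_path = sys.argv[1], sys.argv[2]
with open(response_path) as f:
    body = json.load(f)
if "audioContent" not in body:
    # Google APIs report failures as {"error": {"code": ..., "message": ...}}
    sys.exit(f"No audioContent in {response_path}: {body.get('error', body)}")
with open(output_path, "wb") as f:
    f.write(base64.b64decode(body["audioContent"]))
PY
}
```

With the files from Step 4, extract_audio response.json demo-audio.mp3 replaces the manual copy and the base64 command.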


Congratulations! You have now learned how to set up and call the Google Cloud Text-to-Speech API.


Written by Sandeep Pamidamarri

Digital Transformation Leader | Pega Lead Solution Architect | Pega Certified Data Scientist | Pega Customer Service | Pega Sales Automation | AWS Cloud