Text to Speech with the 3CX Call Flow Designer
Often we need to play audio that can’t be recorded in advance: a name, a place or a task description retrieved from a database, to name a few. In these cases Text to Speech comes to the rescue, letting us create WAV files on the fly so the CFD app can play them back to the caller.
The 3CX Call Flow Designer now includes a new type of prompt: Text to Speech Audio Prompt. You can select it in any place where you’re configuring prompts, like the Prompt Playback component, the Menu component, the User Input component, etc.
To use TTS, you need an Amazon Web Services account. 3CX uses Amazon Polly for TTS: after evaluating different engines, we found that Amazon Polly offers very good audio quality, broad language coverage and many voices at an affordable price, including a free tier during the first year of use. You also need 3CX Phone System 15.5 SP2 to use this feature.
The CFD app converts text to speech in real time, just before playing the message to the caller. It calls a web service to get the audio stream and saves it to a local WAV file. When the call ends, the WAV files are removed, which keeps the installation clean.
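The flow described above can be sketched outside the CFD as follows. This is only an illustration under stated assumptions, not the code the CFD generates: it assumes the AWS SDK for Python (boto3) and a Polly client created elsewhere, and the function names `synthesize_to_wav` and `cleanup` are hypothetical. It requests raw 16-bit PCM at 8000 Hz from Amazon Polly and wraps it in a WAV header using the standard `wave` module.

```python
import os
import tempfile
import wave


def synthesize_to_wav(polly_client, text, voice_id="Joanna", text_type="text"):
    """Ask Amazon Polly for speech and save it as a mono, 8000 Hz, 16-bit WAV file.

    polly_client is assumed to be a boto3 Polly client (boto3.client("polly")).
    Returns the path of the temporary WAV file that was written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        TextType=text_type,   # "text" or "ssml"
        VoiceId=voice_id,
        OutputFormat="pcm",   # raw signed 16-bit little-endian mono samples
        SampleRate="8000",
    )
    pcm_bytes = response["AudioStream"].read()

    # Write the raw samples into a WAV container.
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)     # mono
        wav_file.setsampwidth(2)     # 16 bits per sample
        wav_file.setframerate(8000)  # 8000 Hz
        wav_file.writeframes(pcm_bytes)
    return path


def cleanup(paths):
    """Remove the temporary WAV files once the call has ended."""
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
```

A caller would keep the returned paths for the duration of the call and pass them to `cleanup` when the call ends.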
In this article we’ll explain how to create the Amazon Web Services account, how to enable Amazon Polly, and how to use it with the new Text to Speech Audio Prompt to play a dynamically generated audio stream.
The project for this application is installed along with the 3CX Call Flow Designer, in the folder “Documents\3CX Call Flow Designer Demos”.
Step 1: Create an Amazon Web Services (AWS) account
Before we start working in our CFD project, we need an Amazon Web Services account. In order to create it, please follow this guide from Amazon.
Step 2: Create an Identity and Access Management (IAM) user
Once we have our AWS account, we need to create an IAM user. The CFD application will use this user’s credentials to access Amazon Web Services. Please follow this guide from Amazon to create the user. When asked, set the access type to “Programmatic access”. When configuring permissions, select “Attach existing policies to user directly”, search for “AmazonPollyFullAccess” and check it.
After creating the IAM user, go to the user’s settings, click on Security credentials, and then click on “Create access key”. Take note of the “Access key ID” and the “Secret access key”. This information will be required when configuring your CFD project to use TTS.
Step 3: Consider Amazon Polly limitations
Please be aware of the limits that apply when using Amazon Polly, as documented by Amazon. These limits should not cause problems for most CFD projects, but please keep them in mind.
Step 4: Create the project
Now that we have our Amazon Web Services account ready to work with Amazon Polly, we can go ahead and create our Call Flow Designer project. Open the CFD and go to “File > New > Project”, select the folder where you want to save it, and enter a name for the project. In this case we’ll name it “TextToSpeechDemo”.
Now go to the Project Explorer window and select the project node. The Properties window will then show the new settings we need to fill in for TTS to work:
- AmazonClientID: this is the “Access key ID” that we generated in Step 2.
- AmazonClientSecret: this is the “Secret access key” that we generated in Step 2.
- AmazonRegion: here we need to select the region closest to our location, to reduce latency. Available regions for Amazon Polly are listed here.
The settings entered here will be used for every Text To Speech Audio Prompt in this project.
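For readers who want to try the same credentials outside the CFD, the sketch below shows how these three settings might map onto the AWS SDK for Python (boto3). The `TtsSettings` class and `create_polly_client` function are hypothetical names used for illustration only, and the values in the usage example are placeholders, not real credentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsSettings:
    """Mirror of the three project-level properties described above."""
    amazon_client_id: str      # "Access key ID" from Step 2
    amazon_client_secret: str  # "Secret access key" from Step 2
    amazon_region: str         # an AWS region name such as "us-east-1"

    def client_kwargs(self):
        """Translate the settings into keyword arguments for boto3.client()."""
        if not (self.amazon_client_id and self.amazon_client_secret and self.amazon_region):
            raise ValueError(
                "AmazonClientID, AmazonClientSecret and AmazonRegion are all required"
            )
        return {
            "aws_access_key_id": self.amazon_client_id,
            "aws_secret_access_key": self.amazon_client_secret,
            "region_name": self.amazon_region,
        }


def create_polly_client(settings):
    """Create a Polly client from the settings.

    Assumption: boto3 is installed. It is imported here so that the rest of
    the module can be used without it.
    """
    import boto3
    return boto3.client("polly", **settings.client_kwargs())
```

For example, `create_polly_client(TtsSettings("AKIAEXAMPLE", "example-secret", "us-east-1"))` would return a client bound to that region. As in the CFD project, one set of credentials serves every prompt.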
Step 5: Add a “Prompt Playback” component
Usually we will use TTS to dynamically generate audio from data retrieved from a data source, such as a database or a web service. In this case, for the sake of simplicity, we’ll build the text to convert to speech by concatenating static text and a callflow variable. We will define a callflow variable named “AccountBalance”, set its value to 100, and then play a message like: “Your account balance is $100”.
In order to add the “Prompt Playback” component:
- Drag a “Prompt Playback” component from the toolbox, and drop it into the design surface of the “Main” callflow. Then select the component added, go to the Properties window and rename it to “playPrompt”.
- From the Properties window, open the Prompt Collection Editor by pressing the button to the right of the “Prompts” property.
- Press “Add” to add a new prompt to the collection, and change the type to “Text to Speech Audio Prompt”.
- Select the Voice to use. The drop-down list of voices is ordered by language, so you can easily find the options available for the language you need. The voices available for Amazon Polly are listed here. If Amazon releases a new voice that is not yet included in this drop-down list, you can enter the value from the “Name/ID” column to use it. If you want a specific voice to be pre-filled, you can select it in the Options configuration dialog, from Tools > Options > Component Templates > Text To Speech. For this demo we’ll use “Joanna (English - US, Female)”.
- Select the Type of the text. There are two options: Text and SSML (Speech Synthesis Markup Language). You will usually want Text. With type Text, the value of the Text property that follows is treated as plain text, and the TTS engine converts it to speech as is. With type SSML, the value of the Text property is treated as XML that follows the SSML specification. SSML lets you control aspects of speech such as pronunciation, volume, pitch and speech rate. For more information, see Using SSML. For this demo we’ll use “Text”.
- Enter an expression for the Text. Depending on the type selected in the previous step, the expression must return plain text to convert to speech, or XML according to the SSML specification. For this demo we’ll use the following expression:
CONCATENATE("Your account balance is $",callflow$.AccountBalance)
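To make the difference between the two text types concrete, here is a small Python sketch of what the expression produces, next to an SSML variant of the same sentence. It is not CFD expression syntax: the function names are hypothetical, and the 300 ms pause is an arbitrary value chosen for illustration. The `<speak>` and `<break>` elements are standard SSML tags that Amazon Polly accepts.

```python
from xml.sax.saxutils import escape


def balance_text(account_balance):
    """Plain-text prompt, equivalent to the CONCATENATE expression above."""
    return "Your account balance is $" + str(account_balance)


def balance_ssml(account_balance):
    """SSML prompt: the same sentence with a short pause before the amount."""
    # Escape &, < and > so dynamic values cannot break the XML document.
    amount = escape("$" + str(account_balance))
    return (
        "<speak>Your account balance is "
        '<break time="300ms"/>'
        f"{amount}</speak>"
    )


print(balance_text(100))  # Your account balance is $100
print(balance_ssml(100))  # <speak>Your account balance is <break time="300ms"/>$100</speak>
```

When the type is SSML, any dynamic value inserted into the markup should be XML-escaped as shown, otherwise a character such as `&` in the data would make the document invalid.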
Step 6: Build and Deploy to 3CX Phone System
The project is ready. We just need to build and upload it to our 3CX Phone System server. To do this:
- Go to “Build > Build All”. The CFD will create the file “TextToSpeechDemo.tcxvoiceapp”.
- Go to the “3CX Management Console > Call Queues”, create a new queue, configure it with name and extension, check the “Voice apps” option, and upload the file created by the CFD in the previous step.
- Save the changes to the queue. The voice app is ready to use. Make a call to the configured extension to test this app. Please note that the very first time you call this application, the text to speech conversion might be delayed by a few seconds. This is related to the authentication procedure, and only happens the first time you call the app.
Usually a project requires some static prompts, for example to welcome your users or offer an options menu, and some variable prompts, like playing the caller’s account balance. You will probably want to use the TTS service for variable prompts only, to avoid paying repeatedly to convert the same text to speech. You will also want the same voice for all your prompts. We therefore recommend that you generate the static prompts once using the Amazon Polly console, save them as WAV files in your project, and use these files as standard Audio File Prompts instead of converting them from text to speech on every call.
To do so, from the Amazon Polly console, select your language and region, then the voice you will use, enter the text of your prompt and press “Download MP3”. Please note that 3CX requires files to be WAV, Mono, 8000 Hz, 16 bits per sample, as described here. After downloading the MP3 files, you will need to follow the steps described in the previous article to convert the files to the proper format.
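After converting the files, it can be worth verifying that they really match the required format before adding them to the project. The following is a minimal sketch using Python’s standard `wave` module. The function name `check_3cx_wav` is hypothetical, and the check covers only the three properties named above (mono, 8000 Hz, 16 bits per sample).

```python
import wave


def check_3cx_wav(path):
    """Return a list of format problems; an empty list means the file matches.

    wave.open raises wave.Error if the file is not an uncompressed PCM WAV
    file (for example, an MP3 that was only renamed to .wav).
    """
    problems = []
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        rate = wav_file.getframerate()
        bits = wav_file.getsampwidth() * 8
    if channels != 1:
        problems.append("expected mono, found %d channels" % channels)
    if rate != 8000:
        problems.append("expected 8000 Hz, found %d Hz" % rate)
    if bits != 16:
        problems.append("expected 16 bits per sample, found %d" % bits)
    return problems
```

Calling `check_3cx_wav` on each static prompt and confirming that it returns an empty list is a quick way to catch files that were exported with the wrong sample rate or in stereo.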