TTS voice quality

Discussion in 'CRM / Helpdesk / App Integration' started by njfrost, Jun 5, 2015.

Thread Status:
Not open for further replies.
  1. njfrost

    Joined:
    Aug 13, 2012
    Messages:
    22
    Likes Received:
    0
    I have a VAD application working well, but my client is not satisfied with the voice quality.
    We are using an Ivona voice, which seems to be one of the better ones. The speech is very realistic, but there is a great deal of sibilance in the playback.
    Has anyone come across this issue, or can anyone recommend an alternative voice?
    Thanks
     
  2. VAD_Support

    VAD_Support Active Member

    Joined:
    Aug 6, 2009
    Messages:
    690
    Likes Received:
    0
    Hi there!

    There are many voices out there to try. For example, the voices from Cepstral or Loquendo:
    http://www.cepstral.com/
    http://www.nuance.com/for-business/by-solution/customer-service-solutions/solutions-services/inbound-solutions/loquendo-small-business-bundle/tts-demo/english/index.htm

    But you need to try them yourself and feel comfortable with the one you choose.

    Kind regards.
     
  3. njfrost

    Thanks for the reply. I am trying out alternative voices.
    But the issue I have is that the voice quality via 3CX TTS seems to be inferior to that of the voice when played through IVONA's own reader app. Is there any specification for the SAPI voice required by 3CX? (32/64-bit?)

    I had this comment from Ivona: "SAPI Voices are 22kHz and 3CX phone should probably use 8kHz so there is a downscale of the voice causing the quality issues."
    What is your opinion of this?
     
  4. VAD_Support

    VAD_Support Active Member

    Hi there,

    The VAD doesn't specify the format of the WAV file; it calls the method SpeechSynthesizer.SetOutputToWaveFile from the Microsoft Speech API with the default settings. It's quite possible that the default format is 8 kHz, in which case the sample rate is reduced and you may get lower-quality audio. In any case, 8 kHz is the sample rate used by the PSTN, so callers from outside the company will hear the audio at that quality, even if the original WAV file is better.

    If you want to create the WAV file at a higher quality, you can edit the "ConvertTTS.aspx" file that is created when you build the project, and change the call to SetOutputToWaveFile to pass a second parameter of type SpeechAudioFormatInfo.
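
    As a rough illustration of that second parameter, here is a minimal standalone console sketch. It is not the code the VAD generates: the class name, output file name and sample text are placeholders, and it assumes a Windows machine with a reference to the System.Speech assembly and at least one installed voice.

    ```csharp
    using System.Speech.AudioFormat;
    using System.Speech.Synthesis;

    static class TtsFormatSketch
    {
        static void Main()
        {
            // Explicit output format: 8000 samples per second, 16-bit, mono PCM.
            var format = new SpeechAudioFormatInfo(
                8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);

            using (var speaker = new SpeechSynthesizer())
            {
                // "TTS.wav" is a placeholder path; the default installed voice is used.
                speaker.SetOutputToWaveFile("TTS.wav", format);
                speaker.Speak("This is a text-to-speech test message");

                // Detach the synthesizer from the file so the WAV is closed.
                speaker.SetOutputToNull();
            }
        }
    }
    ```

    Changing the first constructor argument is enough to compare sample rates with the same voice and text.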

    Regarding the voice, the VAD uses the 64-bit versions.

    Kind regards.
     
  5. njfrost

    Thanks, that's really useful information.
    I'll try that out soon.
     
  6. njfrost

    Hi again - after a bit of a delay, I'm trying your suggestion!

    In ConvertTTS.aspx, I've edited the deployed source, replacing
    Code:
    speaker.SetOutputToWaveFile(destinationFileName);
    with
    Code:
    speaker.SetOutputToWaveFile(destinationFileName, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
    
    I've also tried "22000" as the sample rate, but either value gives me an error.
    Can you advise where I'm going wrong?
    Code:
    16:45:18.434|171372|(0):Subtag SBjsiEval. entering: 0x0000000000D3EF60, 'encodeURI('ConvertTTS.aspx?'+'PromptCount=1'+'&VoiceName0=IVONA 2 Amy&Volume0=100&Format0=Text&Text0='+'This is a text-to-speech test message')', 0x0000000003EFE8E0
    16:45:18.434|171372|(0):Subtag SBjsiEval. exiting: returned 0, 0x0000000003EFE8E0 (0x0000000001A97450)
    16:45:20.402|171372|(0):Error! Module '3CX.com.OSBinet'. Error ID 219. URLhttp://localhost:5000/ivr/(S(rcbhntb1org3kr2ibmm23pgt))/TestVoice_Debug_46/ConvertTTS.aspx?PromptCount=1&VoiceName0=IVONA%202%20Amy&Volume0=100&Format0=Text&Text0=This%20is%20a%20text-to-speech%20test%20messageMethodGETError-500
    16:45:20.402|171372|(0):Error! Module '3CX.com.OSBinet'. Error ID 204. rc2
    16:45:20.402|171372|(0):DocumentParser::FetchBuffer - could not open URL: ConvertTTS.aspx?PromptCount=1&VoiceName0=IVONA%20Amy&Volume0=100&Format0=Text&Text0=This                  -2s    0x0.000000p-1022text-to-speechtestmessage
    16:45:20.402|171372|(0):DocumentParser::FetchDocument - exiting with error result 2
    16:45:20.402|171372|(0):Error! Module '3CX.com.vxi'. Error ID 203. uriConvertTTS.aspx?PromptCount=1&VoiceName0=IVONA%202%20Amy&Volume0=100&Format0=Text&Text0=This%20is%20a%20text-to-speech%20test%20message
    
    This is the successful call using the originally deployed ConvertTTS:
    Code:
    16:42:01.915|139784|(0):Subtag SBjsiEval. entering: 0x0000000000D3EF70, 'encodeURI('ConvertTTS.aspx?'+'PromptCount=1'+'&VoiceName0=IVONA 2 Amy&Volume0=100&Format0=Text&Text0='+'This is a text-to-speech test message')', 0x00000000039CE6F0
    16:42:01.915|139784|(0):Subtag SBjsiEval. exiting: returned 0, 0x00000000039CE6F0 (0x0000000001A97510)
    
     
  7. VAD_Support

    VAD_Support Active Member

    Hi there,

    To find the cause of the error, check the log file "Errors_ConvertTTS.log" in the project folder (%ProgramData%\3CX\Data\Http\Interface\ivr\ProjectName_BuildNumber).

    You may need to add imports at the top of the document for the new classes and enums you're using, below this existing line:
    <%@ Import Namespace="System.Speech.Synthesis" %>

    For example, the class SpeechAudioFormatInfo and the enums AudioBitsPerSample and AudioChannel are in the namespace System.Speech.AudioFormat, which is not imported.
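
    Assuming ConvertTTS.aspx is an ordinary ASP.NET page that already carries the Synthesis import, the top of the file would then look roughly like this, with the second directive being the new one:

    ```aspx
    <%@ Import Namespace="System.Speech.Synthesis" %>
    <%@ Import Namespace="System.Speech.AudioFormat" %>
    ```

    With that directive in place, the short names SpeechAudioFormatInfo, AudioBitsPerSample and AudioChannel should resolve without any further qualification.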

    Or you can use the full class name, including the namespace, instead:
    Code:
    speaker.SetOutputToWaveFile(destinationFileName, new System.Speech.AudioFormat.SpeechAudioFormatInfo(8000, System.Speech.AudioFormat.AudioBitsPerSample.Sixteen, System.Speech.AudioFormat.AudioChannel.Mono));
    Kind regards.
     
  8. njfrost

    Thanks, should have thought of that!
    Code:
    speaker.SetOutputToWaveFile(destinationFileName, 
    new System.Speech.AudioFormat.SpeechAudioFormatInfo(8000, System.Speech.AudioFormat.AudioBitsPerSample.Sixteen,
    System.Speech.AudioFormat.AudioChannel.Mono));
    
    In my tests, the voice sounds best at 8000.
    Although the Ivona voice is stated as 22 kHz, when the rate is set to 22000 (or anything over 8000) the sibilance increases noticeably.

    The TTS.wav files produced at 22 kHz are noticeably higher quality when played directly from the PC over speakers, but the best quality over the 3CX IVR is achieved at 8 kHz.

    (When SetOutputToWaveFile is called without a format argument, the output appears to be 22 kHz.)
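
    For anyone who wants to check the sample rate of a generated file rather than judge it by ear, here is a small sketch that reads it from the WAV header. It is not part of the VAD; the class and method names are made up for illustration, and it assumes an ordinary RIFF/WAVE file passed as the first command-line argument.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    static class WavInfo
    {
        // Returns the samples-per-second value stored in the "fmt " chunk.
        public static int ReadSampleRate(string path)
        {
            using (var reader = new BinaryReader(File.OpenRead(path)))
            {
                string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
                reader.ReadInt32(); // overall RIFF size, not needed here
                string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
                if (riff != "RIFF" || wave != "WAVE")
                    throw new InvalidDataException("Not a RIFF/WAVE file: " + path);

                // Walk the chunk list until the "fmt " chunk is found.
                while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
                {
                    string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                    int chunkSize = reader.ReadInt32();
                    if (chunkId == "fmt ")
                    {
                        reader.ReadInt16();        // format tag (1 = PCM)
                        reader.ReadInt16();        // channel count
                        return reader.ReadInt32(); // samples per second
                    }
                    // Skip this chunk; chunks are padded to an even byte count.
                    reader.BaseStream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
                }
                throw new InvalidDataException("No fmt chunk found in: " + path);
            }
        }

        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: WavInfo <path-to-wav>");
                return;
            }
            Console.WriteLine(ReadSampleRate(args[0]) + " Hz");
        }
    }
    ```

    Running it against a TTS.wav produced with and without the explicit format argument shows which sample rate each call actually wrote.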
     
  9. VAD_Support

    VAD_Support Active Member

    Thanks for the feedback, we'll include these changes in future versions of the VAD.

    Kind regards.
     