Voxygen Cloud - text to speech

Voxygen Cloud is a service that transforms text into expressive, human-like speech and generates high-quality audio messages in a variety of languages and expressive styles.

With its easy-to-use API, Voxygen Cloud helps you enrich your voice interactions, bring colour and personality to your content and connect with customers like never before.

Because it integrates easily with existing solutions and products, Voxygen Cloud lets you reinvent the customer experience and create new services. You can also create your own unique, personalised voice as a stand-out component of your brand identity.

TTS API description

The Voxygen Cloud API is a "REST-like" API. Any client application can send text to be vocalised through an HTTP request containing all of the necessary information and optional parameters (voice, audio format, speaking rate, pitch tuning, …). The service responds immediately with the corresponding speech audio data.

A main URL specifies the network address of the API service.

A user account is required to access the Voxygen Cloud service. The account is defined by a login and a password. The client application must set the login value in each request to the service. The password must never be sent to Voxygen Cloud: the client application uses it only to compute an HMAC, and sets this HMAC value in each request.
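
As a rough illustration of the login-plus-HMAC scheme described above, the Python sketch below builds a signed request URL. The text does not state the hash function, the exact message that is signed, the parameter names or the main URL, so HMAC-SHA256, the sorted query string, the `user`, `text`, `voice`, `coding` and `hmac` parameter names and the `tts.example.invalid` address are all assumptions made for illustration only.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical endpoint: the real main URL is provided with the user account.
BASE_URL = "https://tts.example.invalid/v1/tts"


def sign_request(login: str, password: str, params: dict) -> str:
    """Return a request URL carrying the login and an HMAC, never the password.

    Assumptions (not confirmed by the service documentation): the signed
    message is the alphabetically sorted, URL-encoded query string, and the
    digest is HMAC-SHA256 rendered as lowercase hexadecimal.
    """
    query = dict(params, user=login)
    message = urlencode(sorted(query.items()))
    digest = hmac.new(
        password.encode("utf-8"),   # the password is only ever used as the key
        message.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return f"{BASE_URL}?{message}&hmac={digest}"


url = sign_request(
    login="demo-login",
    password="demo-password",
    params={"text": "Hello world", "voice": "ExampleVoice", "coding": "mp3"},
)
print(url)
```

Whatever the real signing rules are, the property worth preserving is the one shown here: the password stays on the client and only the login and the derived HMAC travel with the request.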

Input text, SSML and PLS

Voxygen Cloud complies with the W3C recommendations Speech Synthesis Markup Language (SSML 1.0 and 1.1) and Pronunciation Lexicon Specification (PLS 1.0).
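
For readers unfamiliar with PLS, the sketch below shows what a minimal PLS 1.0 lexicon looks like and checks that it is well-formed XML. The element names and namespace come from the W3C specification; the two entries are illustrative examples, and how a lexicon is supplied to the service is not covered here.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: the entries are examples, not Voxygen-supplied data.
LEXICON = f"""<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
"""

# Parsing raises ParseError if the lexicon is not well-formed XML.
root = ET.fromstring(LEXICON.encode("utf-8"))

# Map each written form to its replacement text or phonetic transcription.
entries = {}
for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
    grapheme = lexeme.findtext(f"{{{PLS_NS}}}grapheme")
    spoken = (lexeme.findtext(f"{{{PLS_NS}}}alias")
              or lexeme.findtext(f"{{{PLS_NS}}}phoneme"))
    entries[grapheme] = spoken

print(entries)
```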

The API accepts input as raw text or SSML (UTF-8 encoded). SSML helps you fully control several aspects of the speech, such as pauses, specific pronunciations, acronyms, numbers and dates.

You can also adjust the rate, the pitch or the volume of the speech.
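
The controls mentioned above map onto standard SSML elements. The sketch below assembles a small SSML document in Python and checks that it is well-formed; it uses only elements defined by the W3C SSML recommendation (`break`, `say-as`, `sub`, `prosody`). The sample sentence is invented, and the `interpret-as` and `format` values follow a W3C Note rather than the recommendation itself, so how a given engine treats them is an assumption.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Illustrative message: a date, a pause, an abbreviation and prosody tuning.
SSML = f"""<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-GB">
  Your order will arrive on
  <say-as interpret-as="date" format="dmy">12/03/2025</say-as>.
  <break time="500ms"/>
  <prosody rate="slow" pitch="+2st" volume="loud">
    Thank you for contacting the <sub alias="customer service">CS</sub> team.
  </prosody>
</speak>
"""

# Parsing raises ParseError if the markup is not well-formed XML.
root = ET.fromstring(SSML.encode("utf-8"))

# List the SSML elements used, in document order, without the namespace.
tags = [element.tag.split("}")[1] for element in root.iter()]
print(tags)
```

Checking well-formedness on the client before sending the request is a cheap way to catch unclosed tags early.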

Extension SSML tags can also be added within the text to further customize the audio output, for example with background music mixing, audio fade controls or synchronisation.

Flexibility of audio formats

Voxygen Cloud lets you choose among several audio formats, such as .RAW, .WAV, .AU, .MP3 or .OGG.

Formats            Coding
.RAW, .WAV, .AU    16 bits, PCM, G.711 (A-law, μ-law)
.MP3               Bitrate 16, 32, 64, 96, 128 or 160. Quality from 0 to 9
.OGG               Quality from 0.0 to 1.0

For all formats, the speech signal output can be sampled at any frequency from 6 kHz to 48 kHz. The speech signal can be mixed with external audio files.
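
A client can check these limits before sending a request. The sketch below is one possible way to do so in Python; the function and argument names are hypothetical, the ranges are the ones listed in the format table and in the sampling-frequency sentence above, and treating MP3 quality as an integer is an assumption. Bitrate is deliberately left unchecked here.

```python
# Formats named in the table above; all other names here are hypothetical.
SUPPORTED_FORMATS = {"raw", "wav", "au", "mp3", "ogg"}


def validate_audio_options(fmt, sample_rate_hz, quality=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    fmt = fmt.lower().lstrip(".")  # accept ".MP3", "mp3", "Mp3", ...
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if not 6_000 <= sample_rate_hz <= 48_000:
        problems.append("sampling frequency must be between 6 kHz and 48 kHz")
    if quality is not None:
        if fmt == "mp3" and not (isinstance(quality, int) and 0 <= quality <= 9):
            problems.append("MP3 quality must be an integer from 0 to 9")
        elif fmt == "ogg" and not 0.0 <= quality <= 1.0:
            problems.append("OGG quality must be between 0.0 and 1.0")
    return problems


print(validate_audio_options(".MP3", 22_050, quality=5))
print(validate_audio_options("ogg", 96_000, quality=1.5))
```

The first call returns an empty list; the second reports both the out-of-range sampling frequency and the out-of-range quality.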

Supported languages in speech synthesis

The following languages are supported by Voxygen Cloud:

Language      Variant
French        France, Belgium, Switzerland,
              Senegal, Ivory Coast, Cameroon, Niger
              American accent
English       United Kingdom, United States
Dutch         Netherlands, Belgium (Flemish)
Portuguese    Brazil, Portugal
Arabic        MSA, Morocco

Voxygen Studio (SSML)

Voxygen Studio is the perfect companion to Voxygen Cloud. It lets you prepare the text, adapt silences and pauses, modify the pronunciation of specific words and introduce other SSML tags, so that you can seamlessly create high-quality audio messages in Voxygen Cloud with Voxygen’s expressive voices.

Its easy-to-use graphical user interface, accessible through a web browser, gives you access to many editing and audio optimization features so that you can take full advantage of the Voxygen Cloud service.

A user-friendly Cloud TTS interface


Main features

Once the voice and the language are chosen, you can start editing the text of your message and:

  • listen to the generated audio for the full message or part of the message
  • change the voice used for the full message or a part of the message
  • edit and modify the text
  • add, remove or modify the duration of silences
  • modify the phonetics of words
  • add phonetic exceptions to lexicons
  • tune acoustic selection
  • modify voice rate
  • add a musical background
  • export the result of the tuning to an SSML file
  • export the generated audio in one of the many available formats
  • adjust final audio volume
  • listen to the final audio result before exporting

Your personalized voice

Fast time-to-market

Over the years, Voxygen has developed unique know-how and expertise to deliver personalized Brand Voices of the highest quality, tailored to customers’ needs. Voxygen’s Expressive Voice creation process has been optimised so that you can launch your Brand Voice in a smooth and timely manner.

Voice enrichment

Once developed, Voxygen’s Expressive Voices can be gradually enriched by incorporating domain-specific vocabulary, paralinguistic features and additional expressivity through state-of-the-art features (Smart Lexicons, Domain-Specific Corpus, Multilingual Voices). Your investment is always protected.

Full control of your voice

Not only is your Brand Voice created to reflect your identity and enriched to meet your needs, but Voxygen’s technology and tools also give you full control to fine-tune your messages, through high-level standard interfaces (SSML, PLS).

Read more on SSML compliance.