Voicing

Functions to synthesize spoken audio content from text.

GET /voicing

Requests that one or more voice files be created from text. One voice file is created for each sentence in the request.

Parameters

  • async (optional) - Enables or disables asynchronous mode. When enabled, the call returns a job identifier immediately while the request is being processed. It is then the client's responsibility to check back with the jobs endpoint later to see whether the request has completed, and to collect the results. When disabled, the call blocks until the request has completed and the results are ready to return to the client.
  • request (required) - JSON containing the voicing request, with the following attributes (* = required):
      • metric - If defined, the metric used when automatically proposing a vendor:
          • ACCURACY - the vendor with the highest accuracy metrics is proposed (i.e. best quality)
          • SPEED - the vendor with the highest speed is proposed (i.e. quickest)
          • COST - the vendor with the lowest cost is proposed (i.e. least expensive)
      • provider * - The name of a configured vendor adapter to be used for the voicing, or AUTO to get an automatic proposal of the best vendor for the specified metric.
      • mediaFormat * - Format of the generated audio files, one of FLAC, WAV, MP3, or TXT.
      • texts * - Array of sentences. One audio file is created for each sentence.
      • timings - Array of timing values that control the length of each generated audio sentence. If used, the array must have the same length as the texts array. A value of 0 for a sentence keeps its original timing, as given by the audio produced by the vendor.
      • voiceId * - Code given by the vendor for the voice to be generated. This must correspond exactly to the code given by the vendor.
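
The asynchronous mode described for the async parameter can be handled with a small polling loop. The following is only a sketch: this section does not document the jobs endpoint, so fetch_job is a hypothetical callable that looks up a job by its identifier and returns the decoded JSON. The sketch also assumes that a finished job reports the same status value, DONE, that appears in the synchronous sample response.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Poll until the job reports status DONE, or return None on timeout.

    fetch_job is a hypothetical helper (not part of the documented API)
    that queries the jobs endpoint for job_id and returns the JSON object.
    """
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        # Assumption: DONE marks completion, as in the sample response.
        if job.get("status") == "DONE":
            return job
        if waited >= timeout_seconds:
            return None
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing the sleep function as an argument keeps the loop easy to test without real delays. Error states are not handled, since their names are not given here.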

Below is a sample request:

{
  "language": "en-IN",
  "mediaFormat": "MP3",
  "provider": "GOOGLE",
  "texts": [
    "This is my example text."
  ],
  "timings": [
    0
  ],
  "voiceId": "en-IN-Heera"
}
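
A request like the one above could be checked and turned into a URL as follows. This is a minimal sketch under several assumptions: BASE_URL is a placeholder, build_voicing_url is an illustrative name, both parameters are assumed to travel in the query string, and async is assumed to accept the strings true and false.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own deployment.
BASE_URL = "https://api.example.com"

# Attributes marked as required in the parameter description.
REQUIRED = ("provider", "mediaFormat", "texts", "voiceId")


def build_voicing_url(request: dict, run_async: bool = False) -> str:
    """Validate a voicing request and return the GET URL for it."""
    missing = [name for name in REQUIRED if name not in request]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    timings = request.get("timings")
    # If timings is used, it must have one entry per text.
    if timings is not None and len(timings) != len(request["texts"]):
        raise ValueError("timings must have one entry per text")
    query = urlencode({
        # Assumption: the flag is sent as the string "true" or "false".
        "async": "true" if run_async else "false",
        "request": json.dumps(request),
    })
    return f"{BASE_URL}/voicing?{query}"


sample_request = {
    "language": "en-IN",
    "mediaFormat": "MP3",
    "provider": "GOOGLE",
    "texts": ["This is my example text."],
    "timings": [0],
    "voiceId": "en-IN-Heera",
}
url = build_voicing_url(sample_request)
```

urlencode percent-encodes the JSON, so the request survives as a single query parameter.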

Response

Below is a sample response:

{
  "id": "60ec537e1f580b4e33fe2e1d",
  "ownerId": "c09d9c91-1d2a-4818-9713-b99b1d179f74",
  "provider": "GOOGLE",
  "autoProvider": false,
  "service": "VOICING",
  "status": "DONE",
  "created": "2021-07-12T14:36:46.429+0000",
  "started": "2021-07-12T14:36:46.429+0000",
  "finished": "2021-07-12T14:36:46.836+0000",
  "estimatedProgress": 100,
  "generatedVoices": [
    {
      "item1": {
        "textOrSSML": "This is my example text.",
        "timing": 0,
        "voiceId": "en-IN-Heera",
        "language": "en-IN"
      },
      "item2": [masked]
    }
  ],
  "textsOrSSMLs": [
    "This is my example text."
  ],
  "timings": [
    0
  ],
  "mediaFormat": "MP3",
  "voiceId": "en-IN-Heera",
  "language": "en-IN"
}

For each input text, a JSON object containing item1 and item2 is produced. item1 holds information about the individual request, and item2 holds a link to the generated audio file.
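
The item1/item2 pairs can be collected from a decoded response as sketched below. The function name generated_pairs is illustrative. Because the value of item2 is masked in the sample, the sketch treats it as an opaque value and the usage example substitutes a placeholder string for it.

```python
from typing import Any, Dict, List, Tuple


def generated_pairs(response: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Return (text, item2) for every entry in generatedVoices."""
    pairs = []
    for entry in response.get("generatedVoices", []):
        text = entry["item1"]["textOrSSML"]
        # item2 is described as a link to the audio file; its exact
        # shape is not shown in the sample, so it is passed through as-is.
        pairs.append((text, entry.get("item2")))
    return pairs


# Trimmed copy of the sample response; the item2 value is a placeholder.
sample_response = {
    "status": "DONE",
    "generatedVoices": [
        {
            "item1": {
                "textOrSSML": "This is my example text.",
                "timing": 0,
                "voiceId": "en-IN-Heera",
                "language": "en-IN",
            },
            "item2": "<link to audio file>",
        }
    ],
}
pairs = generated_pairs(sample_response)
```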

GET /voicing/providers

Returns a list of all supported vendor adapters that can respond to a voicing request.

Parameters

None

Response

Below is a sample response:

[
  "AZURE",
  "AWS",
  "GOOGLE"
]

GET /voicing/voices

Returns the supported voices and their codes, grouped by vendor and then by language.

Parameters

  • provider (optional) - Restricts the result to the supported voice codes of a single named provider

Response

Below is a sample response:

{
  "AZURE": {
    "en-IE": [
      "en-IE-ConnorNeural",
      "en-IE-EmilyNeural",
      "en-IE-Sean"
    ],
    "uk-UA": [
      "uk-UA-OstapNeural",
      "uk-UA-PolinaNeural"
    ],
    "en-US": [
      "en-US-JennyNeural",
      "en-US-JennyMultilingualNeural",
      ...
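
Since voiceId must match the vendor's code exactly, a client may want to check a code against this response before sending a voicing request. The sketch below assumes the response has already been decoded into nested dictionaries of the shape shown above. The names voices_for and is_known_voice are illustrative, and the data is only an excerpt of the sample.

```python
from typing import Dict, List

VoiceMap = Dict[str, Dict[str, List[str]]]


def voices_for(voices: VoiceMap, provider: str, language: str) -> List[str]:
    """Return the voice codes for a provider and language, or an empty list."""
    return voices.get(provider, {}).get(language, [])


def is_known_voice(voices: VoiceMap, provider: str, voice_id: str) -> bool:
    """Check whether voice_id is listed under any language of the provider."""
    by_language = voices.get(provider, {})
    return any(voice_id in codes for codes in by_language.values())


# Excerpt of the sample response above.
sample_voices = {
    "AZURE": {
        "en-IE": ["en-IE-ConnorNeural", "en-IE-EmilyNeural", "en-IE-Sean"],
        "uk-UA": ["uk-UA-OstapNeural", "uk-UA-PolinaNeural"],
    }
}
ukrainian = voices_for(sample_voices, "AZURE", "uk-UA")
```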
