Voicing
Functions to synthesize spoken audio content from text
GET /voicing
Requests that one or more voice files be created from text. Note that this will create one voice file per sentence requested.
Parameters
- async (optional) - Enables or disables asynchronous mode. When enabled, the call returns a job identifier immediately while the request is being processed; it is the client's responsibility to check back with the jobs endpoint at a later time to see whether the request has completed, and to collect the results. When disabled, the call blocks until the request has completed and the results are ready to return to the client.
- request (required) - JSON containing the voicing request with the following parameters:
Attribute (* = required) | Description |
---|---|
metric | If defined, the metric used when automatically proposing a vendor: ACCURACY - the vendor with the highest accuracy metrics is proposed (i.e. best quality); SPEED - the vendor with the highest speed is proposed (i.e. quickest); COST - the vendor with the lowest cost is proposed (i.e. least expensive) |
provider * | The name of a configured vendor adapter to be used for the voicing, or AUTO to get an automatic proposal of the best vendor for the specified metric. |
mediaFormat * | Format of the generated audio files, one of: FLAC, WAV, MP3, TXT |
texts * | Array of sentences; one audio file is created for each sentence |
timings | Array of timing statements to control the length of each generated audio sentence. The length of the array must match the length of the texts array, if used. Passing a value of 0 for a sentence will keep its original timing, as given by the audio produced by the vendor. |
voiceId * | Code given by the vendor to be used to generate the voice. This must correspond exactly to that given by the vendor. |
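The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the function name `validate_voicing_request` and the wording of the messages are illustrative assumptions, and only the rules stated in the table are checked.

```python
# Allowed values and required attributes, taken from the attribute table.
ALLOWED_MEDIA_FORMATS = {"FLAC", "WAV", "MP3", "TXT"}
ALLOWED_METRICS = {"ACCURACY", "SPEED", "COST"}
REQUIRED_ATTRIBUTES = ("provider", "mediaFormat", "texts", "voiceId")


def validate_voicing_request(request: dict) -> list:
    """Return a list of problems; an empty list means no problem was found.

    Hypothetical client-side helper: the server may apply further checks.
    """
    problems = []

    # Every attribute marked with * in the table must be present.
    for name in REQUIRED_ATTRIBUTES:
        if name not in request:
            problems.append(f"missing required attribute: {name}")

    if "mediaFormat" in request and request["mediaFormat"] not in ALLOWED_MEDIA_FORMATS:
        problems.append(f"unsupported mediaFormat: {request['mediaFormat']}")

    if "metric" in request and request["metric"] not in ALLOWED_METRICS:
        problems.append(f"unsupported metric: {request['metric']}")

    # If timings is used, it must have one entry per sentence in texts.
    texts = request.get("texts")
    timings = request.get("timings")
    if texts is not None and timings is not None and len(timings) != len(texts):
        problems.append("timings must have the same length as texts")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.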
Below is a sample request:
{
"language": "en-IN",
"mediaFormat": "MP3",
"provider": "GOOGLE",
"texts": [
"This is my example text."
],
"timings": [
0
],
"voiceId": "en-IN-Heera"
}
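As an illustration of how such a request might be sent, the sketch below builds the URL for GET /voicing. Several details are assumptions rather than documented behaviour: the base URL `https://api.example.com` is a placeholder, the JSON is assumed to be passed URL-encoded in the `request` query parameter, and `async` is assumed to accept the strings `true` and `false`.

```python
import json
from urllib.parse import urlencode

# Placeholder host (assumption); replace with the real service address.
BASE_URL = "https://api.example.com"


def build_voicing_url(request: dict, use_async: bool = False,
                      base_url: str = BASE_URL) -> str:
    """Build a GET /voicing URL carrying the voicing request as JSON.

    Assumes 'async' and 'request' are plain query parameters and that the
    boolean is spelled 'true'/'false'; neither is confirmed by this page.
    """
    query = urlencode({
        "async": "true" if use_async else "false",
        "request": json.dumps(request),  # urlencode percent-escapes the JSON
    })
    return f"{base_url}/voicing?{query}"


sample_request = {
    "language": "en-IN",
    "mediaFormat": "MP3",
    "provider": "GOOGLE",
    "texts": ["This is my example text."],
    "timings": [0],
    "voiceId": "en-IN-Heera",
}
url = build_voicing_url(sample_request, use_async=True)
```

The resulting `url` can be passed to any HTTP client; authentication, if the service requires it, is not covered here.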
Response
Below is a sample response:
{
"id": "60ec537e1f580b4e33fe2e1d",
"ownerId": "c09d9c91-1d2a-4818-9713-b99b1d179f74",
"provider": "GOOGLE",
"autoProvider": false,
"service": "VOICING",
"status": "DONE",
"created": "2021-07-12T14:36:46.429+0000",
"started": "2021-07-12T14:36:46.429+0000",
"finished": "2021-07-12T14:36:46.836+0000",
"estimatedProgress": 100,
"generatedVoices": [
{
"item1": {
"textOrSSML": "This is my example text.",
"timing": 0,
"voiceId": "en-IN-Heera",
"language": "en-IN"
},
"item2": [masked]
}
],
"textsOrSSMLs": [
"This is my example text."
],
"timings": [
0
],
"mediaFormat": "MP3",
"voiceId": "en-IN-Heera",
"language": "en-IN"
}
For each input text, a JSON object containing item1 and item2 will be produced. The first contains information about the individual request; the second contains a link to the generated audio file.
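A client might collect these pairs from a parsed response as sketched below. The helper name `list_generated_voices` is hypothetical, and because item2 is masked in the sample above, its value is treated as opaque and returned unchanged.

```python
def list_generated_voices(response: dict) -> list:
    """Return (item1, item2) pairs from a parsed voicing response.

    item1 describes the individual request; item2 refers to the generated
    audio file. The exact shape of item2 is not shown in the sample, so it
    is passed through as-is.
    """
    pairs = []
    # A response without generatedVoices yields an empty list.
    for entry in response.get("generatedVoices", []):
        pairs.append((entry.get("item1"), entry.get("item2")))
    return pairs
```

For example, `for info, audio in list_generated_voices(response): ...` would iterate over one pair per input text.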
GET /voicing/providers
Returns a list of all supported vendor adapters that can respond to a voicing request.
Parameters
None
Response
Below is a sample response:
[
"AZURE",
"AWS",
"GOOGLE"
]
GET /voicing/voices
Returns a list of supported voices and their codes, either for all vendors or for the specified vendor only.
Parameters
- provider (optional) - Return the supported voice codes from a single named provider
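Since the voiceId in a voicing request must match the vendor's code exactly, it can help to look it up in this endpoint's response first. The sketch below assumes the response is a JSON object keyed by provider, then by language, as in the sample response for this endpoint; the helper name `find_voice` is hypothetical.

```python
def find_voice(voices: dict, voice_id: str) -> list:
    """Return (provider, language) pairs under which voice_id is listed.

    Assumes the shape {provider: {language: [voice codes]}}.
    """
    matches = []
    for provider, languages in voices.items():
        for language, voice_ids in languages.items():
            # Exact, case-sensitive comparison, as the vendor code must match exactly.
            if voice_id in voice_ids:
                matches.append((provider, language))
    return matches
```

An empty result indicates that the code is not offered by any of the returned providers.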
Response
Below is a sample response:
{
"AZURE": {
"en-IE": [
"en-IE-ConnorNeural",
"en-IE-EmilyNeural",
"en-IE-Sean"
],
"uk-UA": [
"uk-UA-OstapNeural",
"uk-UA-PolinaNeural"
],
"en-US": [
"en-US-JennyNeural",
"en-US-JennyMultilingualNeural",
...