Transcribe

Operations for transcribing (speech-to-text) audio and audiovisual media in a range of file formats (any format that FFmpeg can transcode).

GET /transcribe

Create a transcript from audio or audiovisual media. For audiovisual content, the audio track is generally extracted (as WAVE or MP3) and used for the transcription.

Parameters

  • async (optional) - Enables or disables asynchronous mode. When enabled, the call returns a job identifier immediately while the request is being processed; it is the client's responsibility to check back with the jobs endpoint later to see whether the request has completed and to collect the results. When disabled, the call blocks until processing has completed and then returns the results to the client.
  • request (required) - JSON containing the details of the request, with the following attributes:
  • customVocabulary (required) - Custom vocabulary to be used by the vendor during transcription. Whether this is supported depends on the vendor adapter being used. This should be a comma-separated list of words.
  • description (optional) - Description of the job, for auditing purposes.
  • diarization (optional) - Enables or disables diarization (speaker recognition). When true, the response attaches speaker identifiers to the spoken words (depending on the vendor adapter being used), which the client can use to assign a speaker to each phrase. When false, no speaker information is attached to any phrase.
  • nSpeakers (optional) - Estimated number of speakers for diarization, if diarization has been enabled. If negative or unspecified, the vendor will be instructed to guess the number of speakers.
  • file (required) - The content to be transcribed.
  • language (required) - The language of the content to be transcribed. Note that mixed-language content is not yet supported, nor is automatic language detection.
  • metric (optional) - If defined, the metric used when automatically proposing a vendor:
    ACCURACY - the vendor with the highest accuracy metrics is proposed (i.e. best quality)
    SPEED - the vendor with the highest speed is proposed (i.e. quickest transcription)
    COST - the vendor with the lowest cost is proposed (i.e. least expensive)
  • name (required) - The name of the transcription request, for auditing purposes.
  • provider (required) - The name of a configured vendor adapter to be used for the transcription, or AUTO to get an automatic proposal of the best vendor for the specified metric.
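
As an illustration of the attributes listed above, the following Python sketch assembles the JSON for the request parameter. It is only a sketch under stated assumptions: the helper name build_request_json is hypothetical, the attribute names are taken from the list above, and the file content is left out because this page does not say how it is transmitted.

```python
import json

# Metric values listed in the parameter description above.
ALLOWED_METRICS = {"ACCURACY", "SPEED", "COST"}


def build_request_json(name, language, provider, custom_vocabulary, *,
                       description=None, diarization=None,
                       n_speakers=None, metric=None):
    """Serialize the documented `request` attributes to a JSON string.

    Hypothetical helper: only the attribute names come from the
    documentation, the function itself is an illustration.
    """
    if not (name and language and provider):
        raise ValueError("name, language and provider are required")
    if metric is not None and metric not in ALLOWED_METRICS:
        raise ValueError(f"metric must be one of {sorted(ALLOWED_METRICS)}")

    request = {
        "name": name,
        "language": language,
        "provider": provider,
        # Documented as a list of words separated by commas.
        "customVocabulary": ",".join(custom_vocabulary),
    }
    # Optional attributes are only included when the caller sets them.
    if description is not None:
        request["description"] = description
    if diarization is not None:
        request["diarization"] = diarization
    if n_speakers is not None:
        request["nSpeakers"] = n_speakers
    if metric is not None:
        request["metric"] = metric
    return json.dumps(request)
```

Passing provider="AUTO" together with a metric corresponds to the automatic vendor proposal described above.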

Response

Below is a sample response:

{
  "id": "60ec2fe408423a4878aa807f",
  "ownerId": "c09d9c91-1d2a-4818-9713-b99b1d179f74",
  "provider": "SPEECHMATICS",
  "autoProvider": false,
  "service": "TRANSCRIBE",
  "status": "DONE",
  "created": "2021-07-12T12:04:52.547+0000",
  "started": "2021-07-12T12:04:52.547+0000",
  "finished": "2021-07-12T12:05:40.204+0000",
  "estimatedProgress": 100,
  "url": "[masked]",
  "language": "en",
  "diarization": false,
  "transcript": {
    "diarizationInfo": null,
    "text": "Mexico's new president is Andres Manuel Lopez Obrador, known simply as I believe he inherits a country with extreme levels of violent crime, the murder rate has more than tripled since 2006. It's driven in large part by powerful drug cartels that control the production or distribution of cocaine, heroin and marijuana through extreme violence and fear. This is despite billions being spent on public security and antinarcotics efforts by successive Mexican governments, the destination for most of Mexico's illegal drug exports as the United States. Which is why the U.S. has supported Mexico's war on drugs with financial and military support, but also by helping to take on the drug lords themselves. An example of this is 2500 miles away from Mexico in a New York court where one of the most notorious bosses is on trial. His name, Joaquin Guzman Loera, known simply as El Chapo. And he was extradited to the U.S. after a long legal battle. So with the arrest and trial of Mexico's premiere kingpin, are there signs that the power of the cartels is finally being contained? Possibly over the past decade, a string of the leading drug lords in Mexico have been either killed or put in jail. This policy of targeting drug busts has proved effective in Colombia with the capture of the notorious drug lord Pablo Escobar, whose cartel at its most powerful in the 1980s and early 90s, supplied most of the cocaine reaching the United States. But in Mexico, taking out the drug barons has made little difference. Last year was the most violent on record, with over 32000 murders, and this hasn't gone unnoticed. U.S. funding of the war on drugs in Mexico has fallen sharply from its peak in 2009, but it still provides tens of millions of dollars in training to Mexican forces as well as narcotics control and law enforcement programs. But despite all these efforts and the capture or killing of the drug lords, the war is not being won. 
In fact, for the years we have data figures show that heroin, opium and cannabis production is up as President Obrador assumes office. He's promised bold new initiatives, but can he achieve what none of his predecessors have been able to do to reduce drug production and the rate of killing in Mexico?",
    "timedTexts": [
      {
        "type": "text",
        "text": "Mexico's",
        "start": 0.06,
        "end": 0.63,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "new",
        "start": 0.63,
        "end": 0.78,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "president",
        "start": 0.78,
        "end": 1.41,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "is",
        "start": 1.41,
        "end": 1.59,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "Andres",
        "start": 1.59,
        "end": 2.16,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "Manuel",
        "start": 2.16,
        "end": 2.61,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "Lopez",
        "start": 2.61,
        "end": 3.03,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "Obrador",
        "start": 3.03,
        "end": 3.63,
        "speakerID": null
      },
      {
        "type": "punctuation",
        "text": ",",
        "start": 3.63,
        "end": 3.63,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "known",
        "start": 3.93,
        "end": 4.23,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "simply",
        "start": 4.23,
        "end": 4.74,
        "speakerID": null
      },
      {
        "type": "text",
        "text": "as",
        "start": 4.74,
        "end": 4.98,
        "speakerID": null
      },
      ...
    ]
  },
  "sourceMetaData": {
    "fileSize": 2706213,
    "mimeType": "audio/mpeg",
    "mediaDuration": 169
  }
}
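
A client will often want to post-process a response like the one above. The Python sketch below shows two small, hypothetical helpers that rely only on fields visible in the sample: join_timed_texts rebuilds readable text from the timedTexts entries (attaching "punctuation" entries to the preceding word), and processing_seconds computes the time between started and finished. It assumes the timestamp layout shown in the sample; other layouts would need a different format string.

```python
from datetime import datetime

# Timestamp layout as it appears in the sample, e.g. "2021-07-12T12:04:52.547+0000".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def join_timed_texts(timed_texts):
    """Join timedTexts entries into a string, gluing punctuation to the previous word."""
    parts = []
    for item in timed_texts:
        if item["type"] == "punctuation" and parts:
            parts[-1] += item["text"]
        else:
            parts.append(item["text"])
    return " ".join(parts)


def processing_seconds(job):
    """Return the seconds elapsed between the job's `started` and `finished` fields."""
    started = datetime.strptime(job["started"], TIMESTAMP_FORMAT)
    finished = datetime.strptime(job["finished"], TIMESTAMP_FORMAT)
    return (finished - started).total_seconds()


# Three entries copied from the sample response above.
sample = [
    {"type": "text", "text": "Obrador", "start": 3.03, "end": 3.63, "speakerID": None},
    {"type": "punctuation", "text": ",", "start": 3.63, "end": 3.63, "speakerID": None},
    {"type": "text", "text": "known", "start": 3.93, "end": 4.23, "speakerID": None},
]
print(join_timed_texts(sample))  # prints "Obrador, known"
```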

GET /transcribe-from-url

A variant of /transcribe that can be used when the media content already exists at a public URL and can be retrieved automatically.

Parameters

  • async (optional) - Enables or disables asynchronous mode. When enabled, the call returns a job identifier immediately while the request is being processed; it is the client's responsibility to check back with the jobs endpoint later to see whether the request has completed and to collect the results. When disabled, the call blocks until processing has completed and then returns the results to the client.
  • request (required) - JSON containing the details of the request, with the following attributes:
    • customVocabulary (optional) - Custom vocabulary to be used by the vendor during transcription. Whether this is supported depends on the vendor adapter being used. This should be a comma-separated list of words.
    • description (optional) - Description of the job, for auditing purposes.
    • diarization (optional) - Enables or disables diarization (speaker recognition). When true, the response attaches speaker identifiers to the spoken words (depending on the vendor adapter being used), which the client can use to assign a speaker to each phrase. When false, no speaker information is attached to any phrase.
    • nSpeakers (optional) - Estimated number of speakers for diarization, if diarization has been enabled. If negative or unspecified, the vendor will be instructed to guess the number of speakers.
    • url (required) - An accessible URL to the content to be transcribed.
    • language (required) - The language of the content to be transcribed. Note that mixed-language content is not yet supported, nor is automatic language detection.
    • metric (optional) - If defined, the metric used when automatically proposing a vendor:
      ACCURACY - the vendor with the highest accuracy metrics is proposed (i.e. best quality)
      SPEED - the vendor with the highest speed is proposed (i.e. quickest transcription)
      COST - the vendor with the lowest cost is proposed (i.e. least expensive)
    • name (required) - The name of the transcription request, for auditing purposes.
    • provider (required) - The name of a configured vendor adapter to be used for the transcription, or AUTO to get an automatic proposal of the best vendor for the specified metric.
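
When async is enabled, the client has to check back on the job itself. The Python sketch below shows one way such a polling loop might look. It makes several assumptions: wait_for_job is a hypothetical helper, fetch_job is a caller-supplied function that queries the jobs endpoint (which is not described on this page) and returns the job as a dictionary, and "DONE" is the only status value taken from the sample response. Any failure statuses the API may report are not handled here.

```python
import time


def wait_for_job(fetch_job, job_id, *, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_job(job_id)` until the job reports status "DONE".

    `fetch_job` is supplied by the caller (hypothetical); it should return
    the job as a dict with a "status" key, as in the sample response.
    Raises TimeoutError if the job is not done within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") == "DONE":
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
        # Wait before asking again, to avoid hammering the jobs endpoint.
        sleep(interval)
```

Injecting sleep and clock keeps the loop easy to exercise without real delays; in normal use the defaults are sufficient.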

Response

The response format is the same as for /transcribe.

GET /transcribe/providers

Returns a list of all supported vendor adapters that can respond to a transcription request.

Parameters

None

Response

Below is a sample response:

[
  "AZURE",
  "SPEECHMATICS",
  "AWS",
  "GOOGLE"
]

GET /transcribe/languages

Returns a list of all supported languages, either for a single vendor adapter or for all of them.

Parameters

  • provider (optional) - Return the supported languages from a single named provider

Response

A dictionary mapping each vendor adapter name to a list of its supported languages. Below is a sample response:

{
  "AZURE": [
    "ar-AE",
    "ar-BH",
    "ar-DZ",
    "ar-EG",
    "ar-IL",
    "ar-IQ",
    "ar-JO",
    "ar-KW",
    "ar-LB",
    ...
  ],
  "SPEECHMATICS": [
    "en",
    "de",
    "es",
    "fr",
    ...

Note that the language codes are those reported by the vendor adapter. When making requests, the Core API will seek a "best match" for language codes that do not exactly match those proposed by the vendor.
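
The matching rules used by the Core API are not described here. Purely as an illustration of what a "best match" could mean, the Python sketch below picks a supported code for a requested one on the client side: an exact match first, then a case-insensitive match, then the primary language subtag. The function best_match and its fallback order are assumptions, not the Core API's actual algorithm.

```python
def best_match(requested, supported):
    """Return the code in `supported` closest to `requested`, or None.

    Illustrative fallback order (an assumption, not the Core API's rules):
    exact match, case-insensitive match, then primary subtag match.
    """
    if requested in supported:
        return requested

    wanted = requested.lower().replace("_", "-")
    by_lower = {code.lower(): code for code in supported}
    if wanted in by_lower:
        return by_lower[wanted]

    primary = wanted.split("-")[0]
    # Requested "en-GB" while the vendor only lists the bare language "en".
    if primary in by_lower:
        return by_lower[primary]
    # Requested "ar" while the vendor lists regional variants such as "ar-AE":
    # take the first variant in the order the vendor reported them.
    for code in supported:
        if code.lower().split("-")[0] == primary:
            return code
    return None
```

With the sample lists above, a request for "en-GB" would resolve to "en" for SPEECHMATICS, and a request for "ar" would resolve to "ar-AE" for AZURE.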

GET /transcribe/stream/languages

Returns a list of all supported languages usable by the streaming API, either for a single vendor adapter or for all of them.

Parameters

  • provider (optional) - Return the supported languages from a single named provider

Response

Below is a sample response:

{
  "AZURE": [
    {
      "language": "ar-AE",
      "supportedAudioFormats": [
        {
          "encoding": "PCM",
          "sampleRates": [
            16000
          ],
          "sampleSizesInBits": [
            16
          ],
          "channels": [
            1
          ],
          "bigEndian": false,
          "signed": true
        }
      ]
    },
    ...

Note that for the streaming API, it is currently the responsibility of the client to match the proposed format signalled by the vendor adapter.
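
Since matching the format is left to the client, a check along the following lines could run before streaming starts. This Python sketch compares a description of the client's audio against the supportedAudioFormats entries from the response above. The helper name and the keys of the client-side audio dictionary (sampleRate, sampleSizeInBits and so on) are hypothetical; only the keys inside supportedAudioFormats come from the sample.

```python
def matches_supported_format(audio, supported_formats):
    """Return True if `audio` satisfies at least one supported format entry.

    `audio` is a client-side description (hypothetical keys); each entry in
    `supported_formats` has the shape shown in the sample response.
    """
    for fmt in supported_formats:
        if (audio["encoding"] == fmt["encoding"]
                and audio["sampleRate"] in fmt["sampleRates"]
                and audio["sampleSizeInBits"] in fmt["sampleSizesInBits"]
                and audio["channels"] in fmt["channels"]
                and audio["bigEndian"] == fmt["bigEndian"]
                and audio["signed"] == fmt["signed"]):
            return True
    return False


# The single format listed for "ar-AE" in the sample response above.
azure_ar_ae = [{
    "encoding": "PCM",
    "sampleRates": [16000],
    "sampleSizesInBits": [16],
    "channels": [1],
    "bigEndian": False,
    "signed": True,
}]

# 16 kHz, 16-bit, mono, little-endian signed PCM.
mono_16k = {"encoding": "PCM", "sampleRate": 16000, "sampleSizeInBits": 16,
            "channels": 1, "bigEndian": False, "signed": True}
print(matches_supported_format(mono_16k, azure_ar_ae))  # prints True
```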