WebSocket (realtime)

Authorization

Clients must provide an access-token as a query parameter to authenticate with the WebSocket server. The access-token is obtained through the REST API; see the Obtain an access-token API.

Open a WebSocket connection

To open a WebSocket connection, obtain an access-token and provide it as a query parameter in the URL. By default, the API only accepts secure connections via wss.

The access-token from the REST API is for one-time use. Request a new one for every new connection.

wss://zeroth.goodatlas.com:2087/client/ws/speech?access-token=<access-token>&language=<language>

Mandatory Parameters

access-token - Your access token.

language - The language model to use. Two models are currently provided: eng for English and kor for Korean.

Optional Parameters

final-only - Set to true if you only want the final result (default: false).

content-type - Only needed if your audio source is recorded with a microphone. The most common options are:

  • 16 kHz, Mono: audio/x-raw,+layout=(string)interleaved,+rate=(int)16000,+format=(string)S16LE,+channels=(int)1

  • 44.1 kHz, Mono: audio/x-raw,+layout=(string)interleaved,+rate=(int)44100,+format=(string)S16LE,+channels=(int)1
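The parameters above can be assembled into a connection URL along these lines. This is a minimal sketch, not an official client: the function name build_speech_url and the constant names are hypothetical, and it assumes that the "+" characters in the documented content-type values are URL-encoded spaces.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL template in this document.
BASE_URL = "wss://zeroth.goodatlas.com:2087/client/ws/speech"

# Assumption: the "+" in the documented content-type values is an encoded space.
RAW_16K_MONO = (
    "audio/x-raw, layout=(string)interleaved, rate=(int)16000, "
    "format=(string)S16LE, channels=(int)1"
)

SUPPORTED_LANGUAGES = {"eng", "kor"}


def build_speech_url(access_token, language, final_only=False, content_type=None):
    """Build the WebSocket URL from mandatory and optional parameters."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    params = {"access-token": access_token, "language": language}
    if final_only:
        params["final-only"] = "true"
    if content_type is not None:
        params["content-type"] = content_type
    # Keep "/", ",", "(", ")" and "=" unescaped so the result matches the
    # documented form; spaces become "+".
    return BASE_URL + "?" + urlencode(params, safe="/,()=")
```

For example, build_speech_url(token, "eng", content_type=RAW_16K_MONO) yields a URL whose content-type value is written the same way as the 16 kHz entry in the list above.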

Handshake status codes

It is recommended to read the HTTP status code of the handshake response and handle possible errors. The handshake response can return the following status codes:

101 - OK

400 - Missing mandatory parameter

401 - Invalid access-token

403 - Free usage exceeded and no credit card available
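One way to act on these codes is to map them to their documented meaning and fail early. The sketch below is illustrative only: check_handshake_status is a hypothetical helper, and how you obtain the handshake status code depends on the WebSocket client library you use.

```python
# Status codes and meanings as listed in this document.
HANDSHAKE_STATUS = {
    101: "OK",
    400: "Missing mandatory parameter",
    401: "Invalid access-token",
    403: "Free usage exceeded and no credit card available",
}


def check_handshake_status(status_code):
    """Return normally for 101; raise ConnectionError for anything else."""
    if status_code == 101:
        return
    reason = HANDSHAKE_STATUS.get(status_code, "Undocumented status code")
    raise ConnectionError(f"Handshake failed with HTTP {status_code}: {reason}")
```

Because the access-token is for one-time use, a 401 usually means a fresh token should be requested before retrying.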

Sending audio data

Once a connection is established, the client can start sending audio data (e.g. a file or a microphone recording) as binary messages. Most common audio file formats are supported, such as .mp3, .flac, .wav, .ogg, .oga and .mp4.

After all audio data is sent to the server, the client should send a text message with the content EOS through the same connection. This message tells the server that the audio transmission is complete.
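The send sequence described above could look like the following sketch. It assumes a client whose send function transmits bytes as a binary message and str as a text message; the helper name stream_audio and the chunk size are illustrative choices, not requirements stated by the API.

```python
def stream_audio(send, audio_bytes, chunk_size=8000):
    """Send audio in binary chunks, then signal the end with a text "EOS"."""
    for offset in range(0, len(audio_bytes), chunk_size):
        # bytes payload: expected to go out as a binary message
        send(audio_bytes[offset:offset + chunk_size])
    # str payload: the end-of-stream marker, sent as a text message
    send("EOS")
```

Passing the send function in as an argument keeps the sketch independent of any particular WebSocket library.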

Receiving transcription

Transcribed text of the audio will be sent back to the client in real time. The format of the transcription object depends on the parameters the client used to establish the connection.

Partial Result

{
  "transcript": "hello",
  "final": false
}

transcript: The transcribed text.

final: Flag to indicate if the result is final or partial.

Final Result

{
  "transcript": "hello",
  "likelihood": 61.4353,
  "word-alignment": [
    {
      "start": 0.27,
      "length": 0.99,
      "word": "hello",
      "confidence": 0.866439
    }
  ],
  "final": true,
  "segment-start": 0,
  "segment-length": 2.1,
  "total-length": 2.1
}

transcript: The transcribed text.

likelihood: Likelihood of the transcribed text.

word-alignment: List of per-word entries, each with the word, its start and length, and a confidence value.

final: Flag to indicate if the result is final or partial.

segment-start: Start time of this segment in seconds.

segment-length: Length of this segment in seconds.

total-length: Length of all segments in seconds.
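A client can tell the two result shapes apart by the final flag. The following is a minimal parsing sketch based only on the fields shown in the examples above; parse_result and the shape of the dictionary it returns are hypothetical.

```python
import json


def parse_result(message):
    """Parse one transcription message into a small summary dictionary."""
    result = json.loads(message)
    if not result.get("final", False):
        # Partial results only carry the transcript and the final flag.
        return {"final": False, "transcript": result["transcript"]}
    words = [
        (w["word"], w["start"], w["length"], w["confidence"])
        for w in result.get("word-alignment", [])
    ]
    return {
        "final": True,
        "transcript": result["transcript"],
        "likelihood": result.get("likelihood"),
        "words": words,
        # (segment-start, segment-length), both in seconds
        "segment": (result.get("segment-start"), result.get("segment-length")),
        "total_length": result.get("total-length"),
    }
```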

Closing connection

The client should not close the connection manually; the server treats a client-initiated close as an error.

The server automatically closes the connection after the last result has been transmitted to the client.
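In practice this means the client simply keeps reading until the server ends the stream. The sketch below assumes that messages is any iterable that yields the incoming text messages and stops once the server has closed the connection; the helper name and the choice to join final transcripts with a space are illustrative assumptions.

```python
import json


def collect_final_transcripts(messages):
    """Read results until the stream ends and join the final transcripts."""
    finals = []
    for message in messages:
        result = json.loads(message)
        if result.get("final"):
            finals.append(result["transcript"])
    # No explicit close here: the server ends the connection itself.
    return " ".join(finals)
```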

Close Status

The server closes the WebSocket connection to the client with one of the following status codes. Consider checking the status code for client-side error handling or when reporting issues.
