Translation - Audio Streaming

Stenomatic gRPC API

Description

Stenomatic exposes a gRPC streaming API (https://grpc.io/) defined in the attached proto file (stenomatic.proto). Messages are exchanged as Protocol Buffers (protobuf), a language-neutral, platform-neutral, extensible mechanism for serializing structured data. Because protobuf is a binary format, audio and other binary data are sent as raw bytes rather than Base64-encoded strings.

Endpoint

The production endpoint is api.stenomatic.com on port 443. A sample Go client implementation is attached at the end of the document; it connects to the server and sends an audio file from disk for recognition and translation.

Automatically generated documentation for the .proto file is available here.

Languages

For a full, up-to-date list, please visit Supported languages.

API calls with unsupported language codes will be rejected by the server.

Audio format

We support two formats: uncompressed 16-bit signed little-endian (Linear PCM) mono audio at a 16 kHz sampling rate, and OGG Opus encoded audio at a 16 kHz sampling rate. Audio must be sent at a near real-time rate in chunks of roughly 100 ms. For PCM audio, the RIFF header must NOT be sent; send only the raw audio samples.
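The PCM requirements above can be sketched in Go using only the standard library. This is a minimal illustration, not the attached client: the helper names stripRIFF and chunks are hypothetical, the gRPC stream write is left out, and the ticker-based pacing is just one way to approximate a real-time rate. At 16 kHz, 16-bit mono, a 100 ms chunk works out to 16000 * 2 * 0.1 = 3200 bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

const (
	sampleRate     = 16000 // Hz
	bytesPerSample = 2     // 16-bit signed samples, mono
	chunkMillis    = 100   // target chunk duration in milliseconds
	// 16000 samples/s * 2 bytes * 100 ms / 1000 = 3200 bytes per chunk.
	chunkBytes = sampleRate * bytesPerSample * chunkMillis / 1000
)

// stripRIFF (hypothetical helper) returns the raw PCM payload of a WAV
// file by walking its RIFF chunks until the "data" chunk. Input without
// a RIFF/WAVE header is returned unchanged, assuming it is already raw PCM.
func stripRIFF(b []byte) ([]byte, error) {
	if len(b) < 12 || !bytes.Equal(b[:4], []byte("RIFF")) || !bytes.Equal(b[8:12], []byte("WAVE")) {
		return b, nil
	}
	pos := 12
	for pos+8 <= len(b) {
		id := string(b[pos : pos+4])
		size := int(binary.LittleEndian.Uint32(b[pos+4 : pos+8]))
		body := pos + 8
		if id == "data" {
			end := body + size
			if size < 0 || end > len(b) {
				end = len(b) // tolerate an inaccurate size field
			}
			return b[body:end], nil
		}
		if size < 0 {
			break
		}
		pos = body + size + size%2 // RIFF chunks are padded to even length
	}
	return nil, errors.New("RIFF file has no data chunk")
}

// chunks (hypothetical helper) splits raw PCM into slices of at most
// chunkBytes bytes; the last slice may be shorter.
func chunks(pcm []byte) [][]byte {
	var out [][]byte
	for len(pcm) > 0 {
		n := chunkBytes
		if n > len(pcm) {
			n = len(pcm)
		}
		out = append(out, pcm[:n])
		pcm = pcm[n:]
	}
	return out
}

func main() {
	// 300 ms of silence stands in for real audio read from disk.
	audio := make([]byte, 3*chunkBytes)
	pcm, err := stripRIFF(audio)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	ticker := time.NewTicker(chunkMillis * time.Millisecond)
	defer ticker.Stop()
	for i, c := range chunks(pcm) {
		// A real client would write c to the gRPC request stream here.
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
		<-ticker.C // wait one chunk duration to approximate real-time pacing
	}
}
```

Walking the RIFF chunks instead of skipping a fixed 44 bytes is a deliberate choice here, since WAV files may carry extra chunks before the audio data.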

Authentication

All API calls must be authenticated with an API key, sent in the gRPC request metadata under the key "x-mint-api-key" with the string value "[API key]". Calls that do not provide it are rejected.

The connection is secured with an SSL certificate that the operating system's default certificate store should accept. We do not use a self-signed certificate. Both SSL and API key authentication are shown in the attached samples.

Voices support

We support 8 kHz, 16 kHz and 24 kHz audio sampling rates for TTS voices. Some voices are not available at 8 kHz; for those we fall back to a 16 kHz sampling rate instead of returning no TTS audio. If no voice preference parameter is given, a male voice is used by default. If a male voice is not available, a female voice is used instead.

Prerequisites:

1) Build Protobuf from source (https://github.com/google/protobuf/tree/master/src) or use a prebuilt Protobuf binary from https://github.com/protocolbuffers/protobuf/releases/tag/v21.12

2) Then build gRPC from source: https://github.com/grpc/grpc/

We recommend checking out the latest tag of each project rather than master.

Be careful when installing via package managers (e.g. apt-get, brew): they can install incompatible versions, which leads to crashes and/or silent connection failures.

The last combination tested and confirmed working was v3.21.12 for Protobuf and v1.52.3 for gRPC.

Request headers:

In addition to the mandatory authentication header, we support several optional "control" headers that affect the API calls.

x-mint-api-key is the authentication header that carries your API key.

x-mint-client-request-id sets your own ID for the request, so that you can find the request in the logs later.

x-mint-client-request-group-id groups several different API calls into one, e.g. the two sides of a phone call: each side has its own x-mint-client-request-id but both share the same x-mint-client-request-group-id.

x-mint-allow-partial-translate applies only to the VoiceTranslate API call. If set to true, every partial response containing a transcription also includes a translation of that transcription. With any other value, or when the header is missing, partial translations are not returned.

x-mint-send-push-notifications enables sending of push notifications for API calls via our WebSocket server. They are sent via the secure WebSocket endpoint wss://api.stenomatic.com/notifications

x-mint-allow-branch-notifications sends push notifications (if enabled with x-mint-send-push-notifications) to the client branch’s specific channel instead of only to the client’s channel. The default is false, in which case notifications arrive only in the client’s channel.

Every customer+client+branch+API call combination has its own channel.

The platform supports sending push notifications for the API calls' responses. We send them via our secure WebSocket endpoint (wss://api.stenomatic.com/notifications)

x-mint-debug-record-audio is for debugging purposes only. If set to true, the incoming audio of SpeechRecognition and VoiceTranslate API calls is recorded into a file. The file is saved in a Google Cloud Storage bucket and is automatically deleted after 14 days. Only selected Google Cloud project users can access or download these files.
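The headers listed above can be assembled as flat key/value pairs before being attached to a call as gRPC metadata. The Go sketch below is illustrative only: the CallOptions type, its Pairs method, and the example IDs are hypothetical, and sending the literal value "true" for the notification headers is an assumption (the text above specifies "true" explicitly only for x-mint-allow-partial-translate and x-mint-debug-record-audio). Boolean headers are simply omitted when disabled.

```go
package main

import "fmt"

// CallOptions (hypothetical type) mirrors the optional control headers.
type CallOptions struct {
	RequestID                string
	GroupID                  string
	AllowPartialTranslate    bool
	SendPushNotifications    bool
	AllowBranchNotifications bool
	DebugRecordAudio         bool
}

// appendFlag adds name=true only when the flag is switched on.
func appendFlag(kv []string, name string, on bool) []string {
	if on {
		kv = append(kv, name, "true")
	}
	return kv
}

// Pairs returns alternating header names and values, starting with the
// mandatory API key header.
func (o CallOptions) Pairs(apiKey string) []string {
	kv := []string{"x-mint-api-key", apiKey}
	if o.RequestID != "" {
		kv = append(kv, "x-mint-client-request-id", o.RequestID)
	}
	if o.GroupID != "" {
		kv = append(kv, "x-mint-client-request-group-id", o.GroupID)
	}
	kv = appendFlag(kv, "x-mint-allow-partial-translate", o.AllowPartialTranslate)
	kv = appendFlag(kv, "x-mint-send-push-notifications", o.SendPushNotifications)
	kv = appendFlag(kv, "x-mint-allow-branch-notifications", o.AllowBranchNotifications)
	kv = appendFlag(kv, "x-mint-debug-record-audio", o.DebugRecordAudio)
	return kv
}

func main() {
	group := "call-42" // hypothetical group ID shared by both sides of a call
	for _, side := range []string{"caller", "callee"} {
		opts := CallOptions{
			RequestID:             group + "-" + side, // unique per side
			GroupID:               group,
			AllowPartialTranslate: true,
		}
		fmt.Println(opts.Pairs("YOUR_API_KEY")) // placeholder key
	}
}
```

With grpc-go, a slice in this shape can usually be passed to metadata.AppendToOutgoingContext(ctx, pairs...) before opening the stream.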

.proto file

Golang example
