API Reference

Real-time Translation (WebSocket)

Beta

Stream microphone audio in one language and receive translated audio in another, in real time. The beta currently supports English ↔ Kiswahili, with a target end-to-end latency of under one second.

Endpoint

WS /v1/translate/stream

Bidirectional WebSocket. The client sends raw 16-bit little-endian PCM audio, mono, at 16 kHz, as binary frames. The server returns binary frames of translated WAV audio, plus JSON text events for transcripts, translations, and pipeline timings.
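
As a rough illustration of the audio format described above, the sketch below packs floating-point samples in [-1.0, 1.0] into raw 16-bit little-endian mono PCM using only the standard library. The helper name `floats_to_pcm16` and the 440 Hz test tone are illustrative assumptions, not part of the API.

```python
import math
import struct

SAMPLE_RATE = 16_000  # Hz, as required by the endpoint


def floats_to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit little-endian signed PCM (mono)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


# Hypothetical input: 100 ms of a 440 Hz sine tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]
pcm = floats_to_pcm16(tone)
print(len(pcm))  # 1600 samples * 2 bytes = 3200
```

Audio captured from a microphone library is often already delivered as int16 samples, in which case no conversion is needed beyond confirming the sample rate and channel count.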

Authentication

Connect to the WebSocket without query parameters, then send a config JSON message as the first frame within 10 seconds. The api_key field on that message is your credential. The legacy ?api_key= query parameter is still accepted but deprecated.

json

{
  "type": "config",
  "api_key": "YOUR_KEY",
  "source_lang": "en",
  "target_lang": "sw"
}
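
A small helper can keep the first frame consistent across clients. The sketch below is one possible shape, assuming the only supported pair is English and Kiswahili as stated in the overview; `build_config` and `SUPPORTED_LANGS` are hypothetical names, and the language check is a client-side convenience rather than documented server behaviour.

```python
import json

SUPPORTED_LANGS = {"en", "sw"}  # English and Kiswahili, per the overview


def build_config(api_key, source_lang, target_lang, session_id=None):
    """Return the JSON text to send as the first frame of a session."""
    if {source_lang, target_lang} != SUPPORTED_LANGS:
        raise ValueError("expected en -> sw or sw -> en")
    message = {
        "type": "config",
        "api_key": api_key,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }
    if session_id is not None:
        message["session_id"] = session_id  # optional field
    return json.dumps(message)


print(build_config("YOUR_KEY", "en", "sw"))
```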

Client → server

Config (text, first frame): includes api_key, source_lang, target_lang, and an optional session_id.
Audio (binary): raw PCM16, 16 kHz, mono. Maximum 1 MB per frame. Excess frames may be dropped to preserve freshness.
Ping (text): keepalive. The server replies with pong.
End stream (text): cleanly terminate the session.
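
For real-time use, audio is usually sent in frames far smaller than the 1 MB cap. The sketch below slices a PCM buffer into fixed-duration frames. The 100 ms default and the name `iter_frames` are illustrative choices, and "1 MB" is read conservatively as 10**6 bytes, which is an assumption.

```python
BYTES_PER_SAMPLE = 2         # 16-bit PCM
SAMPLE_RATE = 16_000         # Hz, mono
MAX_FRAME_BYTES = 1_000_000  # assumption: "1 MB" taken as 10**6 bytes


def iter_frames(pcm, frame_ms=100):
    """Yield consecutive slices of `pcm`, each holding `frame_ms` of audio."""
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    if not 0 < frame_bytes <= MAX_FRAME_BYTES:
        raise ValueError("frame_ms gives an empty or oversized frame")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]  # last slice may be shorter


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
frames = list(iter_frames(one_second, frame_ms=100))
print(len(frames), len(frames[0]))  # 10 frames of 3200 bytes
```

Each yielded slice would be passed to the WebSocket as one binary message. Because the frame size is a whole number of samples, no sample is split across two frames.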

Server → client

json

// Server → client JSON frames

{ "type": "ready", "session_id": "..." }
{ "type": "transcript", "text": "...", "is_partial": true, "source_lang": "en" }
{ "type": "translation_text", "text": "...", "target_lang": "sw" }
{ "type": "timing", "asr_ms": 180, "mt_ms": 90, "tts_ms": 220, "total_ms": 510 }
{ "type": "error", "code": "...", "detail": "..." }

// The server also sends binary frames containing translated WAV audio.
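
A client typically needs to tell binary audio apart from JSON events and branch on the event type. The sketch below shows one way to do that, and also computes how much of total_ms is not covered by the three stage timings. `handle_frame` and `unaccounted_ms` are hypothetical names, and treating total_ms as including the asr_ms, mt_ms, and tts_ms stages is an assumption based on the sample values.

```python
import json


def handle_frame(message):
    """Classify one incoming frame and return a (kind, payload) tuple."""
    if isinstance(message, bytes):
        return ("audio", message)  # translated WAV audio
    event = json.loads(message)
    return (event.get("type", "unknown"), event)


def unaccounted_ms(timing):
    """Time in total_ms not covered by the three reported stages."""
    stages = timing["asr_ms"] + timing["mt_ms"] + timing["tts_ms"]
    return timing["total_ms"] - stages


kind, event = handle_frame(
    '{"type": "timing", "asr_ms": 180, "mt_ms": 90, "tts_ms": 220, "total_ms": 510}'
)
print(kind, unaccounted_ms(event))  # timing 20
```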

Limits

Maximum concurrent WebSocket sessions per API key: 5 (configurable).
Maximum audio per session: 30 minutes.
Idle timeout: the session closes after 5 minutes of inactivity.
Maximum audio frame size: 1 MB.
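
To make the session limit concrete: PCM16 mono at 16 kHz is 32,000 bytes per second, so 30 minutes of audio is 57,600,000 bytes. The sketch below keeps a client-side tally against that budget. `AudioBudget` is a hypothetical helper, and it assumes the 30-minute limit is measured in audio sent rather than wall-clock time, which the limits above do not specify.

```python
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE  # 32,000 for mono PCM16
MAX_SESSION_SECONDS = 30 * 60


class AudioBudget:
    """Client-side tally of how much audio a session has sent."""

    def __init__(self):
        self.bytes_sent = 0

    def record(self, frame):
        self.bytes_sent += len(frame)

    @property
    def seconds_sent(self):
        return self.bytes_sent / BYTES_PER_SECOND

    @property
    def seconds_left(self):
        return max(0.0, MAX_SESSION_SECONDS - self.seconds_sent)


budget = AudioBudget()
budget.record(bytes(BYTES_PER_SECOND * 60))      # one minute of silence
print(budget.seconds_sent, budget.seconds_left)  # 60.0 1740.0
print(MAX_SESSION_SECONDS * BYTES_PER_SECOND)    # 57600000
```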

Example

python

import asyncio, json, websockets

async def main():
    uri = "wss://sauti.finiflowlabs.com/v1/translate/stream"
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({
            "type": "config",
            "api_key": "YOUR_KEY",
            "source_lang": "en",
            "target_lang": "sw",
        }))

        # This loop only receives. A full client would start sending PCM16
        # 16 kHz mono audio as binary frames once the "ready" event arrives.
        async for message in ws:
            if isinstance(message, bytes):
                # translated WAV audio chunk
                ...
            else:
                event = json.loads(message)
                print(event)

asyncio.run(main())
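
The binary branch in the example is left empty. One way to fill it in, assuming each binary frame is a self-contained WAV file with its own header (the reference above does not say so explicitly), is to read it with the standard `wave` module. `wav_to_pcm` is a hypothetical helper, and the silent clip stands in for a real server frame.

```python
import io
import wave


def wav_to_pcm(chunk):
    """Return (sample_rate, channels, pcm_bytes) from one WAV-formatted chunk."""
    with wave.open(io.BytesIO(chunk), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return wav.getframerate(), wav.getnchannels(), frames


# Stand-in for a server frame: 10 ms of 16 kHz mono silence wrapped as WAV.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16_000)
    wav.writeframes(bytes(320))

rate, channels, pcm = wav_to_pcm(buffer.getvalue())
print(rate, channels, len(pcm))  # 16000 1 320
```

Reading the rate and channel count from each header avoids hard-coding the output format, which may differ from the 16 kHz input format.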

Try the live experience in the Translate playground.