API Reference
Real-time Translation (WebSocket)
Beta
Stream microphone audio in one language and receive translated audio in another, in real time. English ↔ Kiswahili is supported today, with a target end-to-end latency under one second.
Endpoint
WS /v1/translate/stream
Bidirectional WebSocket. The client sends raw 16-bit PCM at 16 kHz, mono, little-endian. The server returns translated WAV audio frames plus JSON events for transcripts, translations, and pipeline timings.
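As a rough illustration of the input format, the sketch below converts floating-point samples in [-1.0, 1.0] to 16-bit little-endian PCM and slices the result into fixed-duration frames. The function names and the 100 ms frame length are illustrative assumptions; this reference specifies only the sample format and the maximum frame size.

```python
import math
import struct

SAMPLE_RATE_HZ = 16_000   # required by the endpoint
BYTES_PER_SAMPLE = 2      # 16-bit mono


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def split_frames(pcm, frame_ms=100):
    """Split PCM bytes into frames of frame_ms each (assumed frame length)."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


# One second of a 440 Hz tone, framed for sending as binary messages.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
        for n in range(SAMPLE_RATE_HZ)]
pcm = floats_to_pcm16le(tone)
frames = split_frames(pcm)
```

Each element of `frames` could then be sent as one binary WebSocket message after the config frame has been accepted.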
Authentication
Connect to the WebSocket without query parameters, then send a config JSON message as the first frame within 10 seconds. The api_key field on that message is your credential. The legacy ?api_key= query parameter is still accepted but deprecated.
json
{
  "type": "config",
  "api_key": "YOUR_KEY",
  "source_lang": "en",
  "target_lang": "sw"
}
Client → server
- Config (text, first frame): includes api_key, source_lang, target_lang, and an optional session_id.
- Audio (binary): raw PCM16, 16 kHz, mono. Maximum 1 MB per frame. Excess frames may be dropped to preserve freshness.
- Ping (text): keepalive; the server replies with pong.
- End stream (text): cleanly terminates the session.
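The opening handshake might be wrapped as follows. This is a sketch, not an official client: `build_config` and `wait_for_ready` are hypothetical helper names, `ws` is assumed to be any object with an async `recv` method (such as a connection from the `websockets` package), and the 10-second default for the ready wait is an arbitrary client-side choice rather than a documented server limit.

```python
import asyncio
import json


def build_config(api_key, source_lang, target_lang, session_id=None):
    """Build the first text frame; session_id is included only when given."""
    config = {
        "type": "config",
        "api_key": api_key,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }
    if session_id is not None:
        config["session_id"] = session_id
    return json.dumps(config)


async def wait_for_ready(ws, timeout_s=10.0):
    """Read frames until a ready event arrives and return its session_id.

    Raises RuntimeError if the server reports an error first, and
    asyncio.TimeoutError if a single read takes longer than timeout_s.
    """
    while True:
        message = await asyncio.wait_for(ws.recv(), timeout=timeout_s)
        if isinstance(message, bytes):
            continue  # ignore any binary frame that arrives before ready
        event = json.loads(message)
        if event.get("type") == "ready":
            return event.get("session_id")
        if event.get("type") == "error":
            raise RuntimeError(f"{event.get('code')}: {event.get('detail')}")
```

A caller would send `build_config(...)` immediately after connecting, then await `wait_for_ready(ws)` before sending any audio frames.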
Server → client
json
// Server → client JSON frames
{ "type": "ready", "session_id": "..." }
{ "type": "transcript", "text": "...", "is_partial": true, "source_lang": "en" }
{ "type": "translation_text", "text": "...", "target_lang": "sw" }
{ "type": "timing", "asr_ms": 180, "mt_ms": 90, "tts_ms": 220, "total_ms": 510 }
{ "type": "error", "code": "...", "detail": "..." }
// Server → client also sends binary frames containing translated WAV audio.
Limits
- Maximum concurrent WebSocket sessions per API key: 5 (configurable).
- Maximum audio per session: 30 minutes.
- Idle timeout: 5 minutes of no activity closes the session.
- Maximum audio frame size: 1 MB.
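For a sense of scale, these limits can be translated into bytes using the input format (16 kHz, 16-bit, mono). The sketch below is plain arithmetic; it assumes 1 MB means 1,000,000 bytes, which this reference does not specify, and `frame_is_valid` is a hypothetical client-side check rather than part of the API.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 32,000 bytes/s

MAX_FRAME_BYTES = 1_000_000      # assumption: 1 MB = 10**6 bytes
MAX_SESSION_SECONDS = 30 * 60    # 30 minutes of audio per session

# Longest stretch of audio that fits in a single frame.
max_frame_seconds = MAX_FRAME_BYTES / BYTES_PER_SECOND

# Total PCM bytes a session can carry before reaching the audio limit.
max_session_bytes = MAX_SESSION_SECONDS * BYTES_PER_SECOND


def frame_is_valid(frame):
    """Client-side check: non-empty, whole 16-bit samples, within the size limit."""
    return 0 < len(frame) <= MAX_FRAME_BYTES and len(frame) % BYTES_PER_SAMPLE == 0


print(max_frame_seconds)   # 31.25
print(max_session_bytes)   # 57600000
```

Under that assumption a maximum-size frame would hold about 31 seconds of audio, so a real-time client would presumably send much smaller frames.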
Example
python
import asyncio
import json

import websockets


async def main():
    uri = "wss://sauti.finiflowlabs.com/v1/translate/stream"
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({
            "type": "config",
            "api_key": "YOUR_KEY",
            "source_lang": "en",
            "target_lang": "sw",
        }))
        # Wait for the ready frame, then start streaming PCM16 16 kHz mono.
        async for message in ws:
            if isinstance(message, bytes):
                # translated WAV audio chunk
                ...
            else:
                event = json.loads(message)
                print(event)


asyncio.run(main())
Try the live experience in the Translate playground.
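The receive loop in the example could be extended to act on each frame type listed under Server → client. The following sketch is one way to do that; `handle_frame` and the `state` dictionary are hypothetical names, the field access follows the sample frames in this reference rather than a formal schema, and skipping partial transcripts is an assumption about how `is_partial` is meant to be used.

```python
import json


def handle_frame(message, state):
    """Route one WebSocket frame and record the result in state."""
    if isinstance(message, bytes):
        # Binary frames carry translated WAV audio.
        state["audio"].append(message)
        return "audio"

    event = json.loads(message)
    kind = event.get("type")
    if kind == "ready":
        state["session_id"] = event.get("session_id")
    elif kind == "transcript":
        # Assumption: partial transcripts are superseded by a final one.
        if not event.get("is_partial"):
            state["transcripts"].append(event["text"])
    elif kind == "translation_text":
        state["translations"].append(event["text"])
    elif kind == "timing":
        state["last_total_ms"] = event.get("total_ms")
    elif kind == "error":
        raise RuntimeError(f"{event.get('code')}: {event.get('detail')}")
    return kind  # unrecognized types are returned without side effects


state = {"audio": [], "transcripts": [], "translations": []}
handle_frame('{"type": "transcript", "text": "hello", "is_partial": false, "source_lang": "en"}', state)
handle_frame('{"type": "translation_text", "text": "habari", "target_lang": "sw"}', state)
handle_frame(b"RIFF", state)
```

Inside the example's `async for` loop, a single call to `handle_frame(message, state)` would take the place of the `isinstance` branch.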