API Reference
ASR — Speech Recognition
Transcribe Swahili audio into text. Fine-tuned on natural Swahili speech, the model reaches a 13.5% word error rate (WER) on the FLEURS Swahili test split, roughly half the error of multilingual baselines.
Endpoint
POST /v1/speech-to-text/
Upload an audio file as multipart/form-data and receive a transcript (Swahili by default). Accepts WAV, MP3, WebM, FLAC, and OGG files up to 25 MB.
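A client can reject files the endpoint would refuse before spending an upload on them. The sketch below checks the file extension against the listed formats and the size against the 25 MB cap. `check_audio_file` is a hypothetical helper, and reading "25 MB" as 25,000,000 bytes (the stricter interpretation) is an assumption, since the exact byte limit isn't stated.

```python
from pathlib import Path

# Formats accepted by the endpoint, keyed by file extension.
# Assumption: extension-based checking; the server may inspect content instead.
ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".webm", ".flac", ".ogg"}

# Assumption: "25 MB" read as 25,000,000 bytes, the stricter of the two
# common interpretations (the other being 25 * 1024 * 1024).
MAX_UPLOAD_BYTES = 25_000_000


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file would likely be rejected by the API."""
    p = Path(path)
    if p.suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size} bytes; limit is {MAX_UPLOAD_BYTES}")
```

Calling `check_audio_file("sample.wav")` before the request turns a would-be 413 into an immediate local error.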
Authentication
Pass your API key in the xi-api-key header. See Authentication.
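Keeping the key out of source code is usually safer than hard-coding it. The sketch below reads the key from an environment variable and attaches the `xi-api-key` header to a `requests.Session`, so every call made through that session is authenticated. The variable name `SAUTI_API_KEY` and the helper `make_session` are hypothetical.

```python
import os
from typing import Optional

import requests

# Hypothetical environment variable name; store the key wherever suits you.
API_KEY_ENV = "SAUTI_API_KEY"


def make_session(api_key: Optional[str] = None) -> requests.Session:
    """Return a Session that sends the xi-api-key header on every request."""
    key = api_key or os.environ.get(API_KEY_ENV)
    if not key:
        raise RuntimeError(f"set {API_KEY_ENV} or pass api_key explicitly")
    session = requests.Session()
    session.headers.update({"xi-api-key": key})
    return session
```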
Form fields
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | Yes | Audio file. Must have an audio/* MIME type. Maximum 25 MB. |
| language | string | No | ISO 639-1 language code. Defaults to sw. Other supported values: en, fr, de, es, pt, ar, zh, ja, ko, ru. |
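Because the `audio` part must carry an `audio/*` MIME type, it can help to set the type explicitly rather than rely on `mimetypes.guess_type`, which may report `video/webm` for `.webm` files. The sketch below maps each listed format to an `audio/*` type and validates the `language` code before the request is built. `describe_upload` and the extension-to-MIME table are assumptions for illustration, not part of the API.

```python
from pathlib import Path

# Explicit audio/* MIME types per extension (assumed mapping).
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".webm": "audio/webm",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
}

# Language codes listed for the `language` form field.
SUPPORTED_LANGUAGES = {"sw", "en", "fr", "de", "es", "pt", "ar", "zh", "ja", "ko", "ru"}


def describe_upload(path: str, language: str = "sw") -> tuple[str, str, dict]:
    """Return (filename, MIME type, form data) for a transcription request."""
    p = Path(path)
    mime = AUDIO_MIME_TYPES.get(p.suffix.lower())
    if mime is None:
        raise ValueError(f"no audio/* MIME type known for {p.suffix!r}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return p.name, mime, {"language": language}
```

The returned filename and MIME type fit the `files={"audio": (name, f, mime)}` tuple that `requests` expects for a multipart upload, and the dict goes to `data=`.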
Response
```json
{
  "text": "Habari za asubuhi. Karibu sana.",
  "language": "sw",
  "duration_seconds": 2.41
}
```
Examples
```bash
curl -X POST https://sauti.finiflowlabs.com/v1/speech-to-text/ \
  -H "xi-api-key: YOUR_KEY" \
  -F "audio=@sample.wav;type=audio/wav" \
  -F "language=sw"
```

```python
import requests

with open("sample.wav", "rb") as f:
    response = requests.post(
        "https://sauti.finiflowlabs.com/v1/speech-to-text/",
        headers={"xi-api-key": "YOUR_KEY"},
        files={"audio": ("sample.wav", f, "audio/wav")},
        data={"language": "sw"},
    )
response.raise_for_status()
print(response.json()["text"])
```
Model details
- Model: `Finiflowlabs/sauti-asr-v1`, published on Hugging Face.
- Test WER: 13.52% · Test CER: 3.85%.
- Trained on ~89 hours of Swahili speech; evaluated on the FLEURS `sw_ke` test split.
- Sample rate: any (audio is resampled internally to 16 kHz mono).
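WER and CER are conventionally defined as the edit distance between reference and hypothesis (substitutions S, deletions D, insertions I) divided by the number of reference units N, that is (S + D + I) / N, counted over words for WER and characters for CER. The sketch below implements that textbook definition with no text normalization. The casing and punctuation handling behind the reported 13.52% and 3.85% isn't stated here, so scores computed this way may not match exactly.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, or a free match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("habari za asubuhi", "habari ya asubuhi")` counts one substitution over three reference words. Test-set figures are usually corpus-level, summing edits and reference lengths across all utterances rather than averaging per-utterance rates.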
Limits & errors
- 413: audio file exceeds 25 MB.
- 422: invalid audio format or unsupported language.
- 429: rate limit exceeded. See Rate Limits.
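The documented codes call for different client reactions: 413 and 422 mean the request itself must change, while 429 is usually worth retrying after a pause. The sketch below retries 429 responses with exponential backoff (1 s, 2 s, 4 s) and raises on the other codes, then maps a successful body onto the response fields shown above. `transcribe`, `Transcript`, `TranscriptionError`, and the retry policy are hypothetical; this page doesn't say whether a `Retry-After` header accompanies 429 responses, so the delays are fixed rather than server-driven.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

import requests

URL = "https://sauti.finiflowlabs.com/v1/speech-to-text/"

# Meanings of the documented error codes.
ERRORS = {
    413: "audio file exceeds 25 MB",
    422: "invalid audio format or unsupported language",
    429: "rate limit exceeded",
}


@dataclass
class Transcript:
    text: str
    language: str
    duration_seconds: float


class TranscriptionError(Exception):
    pass


def transcribe(
    path: str,
    mime: str,
    session: requests.Session,
    language: str = "sw",
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Transcript:
    """POST the file, retrying 429s with delays of 1, 2, 4, ... seconds."""
    for attempt in range(max_retries + 1):
        # Reopen on each attempt: the upload consumes the file stream.
        with open(path, "rb") as f:
            resp = session.post(
                URL,
                files={"audio": (Path(path).name, f, mime)},
                data={"language": language},
            )
        if resp.status_code == 429 and attempt < max_retries:
            sleep(2 ** attempt)  # assumed policy; ignores any Retry-After
            continue
        if resp.status_code in ERRORS:
            raise TranscriptionError(f"{resp.status_code}: {ERRORS[resp.status_code]}")
        resp.raise_for_status()  # any other 4xx/5xx
        body = resp.json()
        return Transcript(body["text"], body["language"], body["duration_seconds"])
    raise RuntimeError("unreachable: the final attempt always returns or raises")
```

`session` can be any `requests.Session` that already carries the `xi-api-key` header. The `sleep` parameter is injectable so tests can record the delays instead of waiting.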