API Reference

ASR — Speech Recognition

Live

Transcribe Swahili audio into text. The model is fine-tuned for natural Swahili speech and reaches a 13.5% word error rate (WER) on the FLEURS sw_ke test split, roughly half the error of multilingual baselines.

Endpoint

POST /v1/speech-to-text/

Upload an audio file as multipart/form-data and receive a transcript, in Swahili by default. Accepts WAV, MP3, WebM, FLAC, and OGG files up to 25 MB.

Authentication

Pass your API key in the xi-api-key header. See Authentication.

Form fields

Field	Type	Required	Description
`audio`	file	Yes	Audio file. Must have an `audio/*` MIME type. Maximum 25 MB.
`language`	string	No	ISO 639-1 language code. Defaults to `sw`. Other supported values: `en`, `fr`, `de`, `es`, `pt`, `ar`, `zh`, `ja`, `ko`, `ru`.
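
The field rules above can be checked on the client before uploading, which avoids a round trip that would end in a 413 or 422. The following is a minimal sketch, not part of the API: `check_upload`, `AUDIO_MIME_BY_SUFFIX`, and the suffix-to-MIME mapping are hypothetical helpers, and the 25 MB limit is assumed to mean 25 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024

# Language codes listed in the form-field table.
SUPPORTED_LANGUAGES = {"sw", "en", "fr", "de", "es", "pt", "ar", "zh", "ja", "ko", "ru"}

# Hypothetical mapping from file suffix to an audio/* MIME type for the accepted formats.
AUDIO_MIME_BY_SUFFIX = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".webm": "audio/webm",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
}


def check_upload(filename: str, size_bytes: int, language: str = "sw") -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in AUDIO_MIME_BY_SUFFIX:
        problems.append(f"{filename}: unrecognised audio format {suffix!r}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{filename}: {size_bytes} bytes exceeds the 25 MB limit")
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language code {language!r}")
    return problems


print(check_upload("sample.wav", 1024))
print(check_upload("notes.txt", 1024, "xx"))
```

The server remains the source of truth: a file that passes these checks can still be rejected if its contents are not valid audio.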

Response

json

{
  "text": "Habari za asubuhi. Karibu sana.",
  "language": "sw",
  "duration_seconds": 2.41
}
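
To work with the response in typed code, the three fields shown above can be loaded into a small data class. This is an illustrative sketch only: `Transcript` and `parse_transcript` are hypothetical names, and it assumes the body contains exactly the fields in the sample response.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    text: str
    language: str
    duration_seconds: float


def parse_transcript(body: str) -> Transcript:
    """Parse a JSON response body into a Transcript; raises KeyError if a field is missing."""
    payload = json.loads(body)
    return Transcript(
        text=payload["text"],
        language=payload["language"],
        duration_seconds=float(payload["duration_seconds"]),
    )


sample = (
    '{"text": "Habari za asubuhi. Karibu sana.", '
    '"language": "sw", "duration_seconds": 2.41}'
)
result = parse_transcript(sample)
print(result.text)
```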

Examples

bash

curl -X POST https://sauti.finiflowlabs.com/v1/speech-to-text/ \
  -H "xi-api-key: YOUR_KEY" \
  -F "audio=@sample.wav;type=audio/wav" \
  -F "language=sw"

python

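import requests

# Send the audio as multipart/form-data with an explicit audio/* MIME type.
with open("sample.wav", "rb") as f:
    response = requests.post(
        "https://sauti.finiflowlabs.com/v1/speech-to-text/",
        headers={"xi-api-key": "YOUR_KEY"},
        files={"audio": ("sample.wav", f, "audio/wav")},
        data={"language": "sw"},
        timeout=60,  # avoid waiting indefinitely on a stalled connection
    )

# Raise on 4xx/5xx responses (for example 413, 422, or 429) before reading the body.
response.raise_for_status()
print(response.json()["text"])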

Model details

Model: Finiflowlabs/sauti-asr-v1, published on Hugging Face.
Test WER: 13.52% · Test CER: 3.85%.
Trained on ~89 hours of Swahili speech, evaluated on the FLEURS sw_ke test split.
Sample rate: any (resampled internally to 16 kHz mono).
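
For readers comparing their own transcripts against these figures, WER and CER are both edit distances divided by the reference length, counted over words and characters respectively. The sketch below uses the standard definitions; the text normalisation (casing, punctuation) applied for the published numbers is not stated here, so results on your data may not be directly comparable. The function names are illustrative.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substitution (za -> ya) and one deletion (sana) against five reference words.
print(wer("habari za asubuhi karibu sana", "habari ya asubuhi karibu"))
```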

Limits & errors

413 — audio file exceeds 25 MB.
422 — invalid audio format or unsupported language.
429 — rate limit exceeded. See Rate Limits.
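
One way a client might handle these status codes is to retry only on 429 and fail immediately on 413 and 422, since resending the same file will not change the outcome. The sketch below is an assumption-laden illustration: `call_with_retry` and `SpeechToTextError` are hypothetical names, success is assumed to be HTTP 200, and exponential backoff is used because this page does not say whether the API sends a retry hint. The `send` callable stands in for the actual HTTP request and returns a `(status_code, payload)` pair.

```python
import time
from typing import Callable

# Messages taken from the error list above.
ERROR_MESSAGES = {
    413: "audio file exceeds 25 MB",
    422: "invalid audio format or unsupported language",
    429: "rate limit exceeded",
}


class SpeechToTextError(Exception):
    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(); retry with exponential backoff on 429, raise on other errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = send()
        if status == 200:  # assumption: 200 indicates success
            return payload
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise SpeechToTextError(status, ERROR_MESSAGES.get(status, "unexpected error"))
    raise AssertionError("unreachable")


# Simulated exchange: one rate-limited response, then success.
responses = iter([(429, {}), (200, {"text": "Habari"})])
delays: list[float] = []
print(call_with_retry(lambda: next(responses), sleep=delays.append))
```

In real use, `send` would wrap the `requests.post` call and return `(response.status_code, response.json())` on success.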