API Reference

ASR — Speech Recognition

Live

Transcribe Swahili audio into text. The model is fine-tuned for natural Swahili speech and reaches a 13.5% word error rate (WER) on the FLEURS sw_ke test split, roughly half the error of multilingual baselines.

Endpoint

POST /v1/speech-to-text/

Upload an audio file as multipart/form-data and receive a transcript, in Swahili by default. Accepts WAV, MP3, WebM, FLAC, and OGG files up to 25 MB.

Authentication

Pass your API key in the xi-api-key header. See Authentication.
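
To keep the key out of source code, the header can be built from an environment variable. This is a minimal sketch: the variable name `SAUTI_API_KEY` and the helper `auth_headers` are illustrative assumptions, not part of the API. Only the `xi-api-key` header name comes from this page.

```python
import os


def auth_headers(env_var: str = "SAUTI_API_KEY") -> dict:
    """Build the authentication header from an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"set the {env_var} environment variable to your API key")
    return {"xi-api-key": api_key}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call.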

Form fields

Field     Type    Required  Description
audio     file    Yes       Audio file. Must have an audio/* MIME type. Maximum 25 MB.
language  string  No        ISO 639-1 language code. Defaults to sw. Other supported values: en, fr, de, es, pt, ar, zh, ja, ko, ru.
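
The field constraints can be checked on the client before uploading, which avoids a round trip that would end in a 413 or 422. The sketch below makes three assumptions: the MIME type is guessed from the file extension with the standard `mimetypes` module, "25 MB" means 25 × 1024 × 1024 bytes, and the server applies equivalent checks. The helper name `validate_upload` is illustrative and not part of the API.

```python
import mimetypes
import os

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB
SUPPORTED_LANGUAGES = {"sw", "en", "fr", "de", "es", "pt", "ar", "zh", "ja", "ko", "ru"}


def validate_upload(path: str, language: str = "sw") -> str:
    """Return the guessed MIME type, or raise ValueError if the upload looks invalid."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    mime, _ = mimetypes.guess_type(path)  # guess from the file extension only
    if mime is None or not mime.startswith("audio/"):
        raise ValueError(f"not an audio/* MIME type: {mime!r}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("audio file exceeds 25 MB")
    return mime
```

The returned MIME type can be reused as the content type of the `audio` form part.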

Response

json
{
  "text": "Habari za asubuhi. Karibu sana.",
  "language": "sw",
  "duration_seconds": 2.41
}
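
For typed access to these fields, the decoded JSON body can be mapped onto a small dataclass. This is a sketch that assumes the response always carries the three fields shown above. The names `Transcript` and `parse_transcript` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    text: str
    language: str
    duration_seconds: float


def parse_transcript(payload: dict) -> Transcript:
    """Build a Transcript from the decoded JSON body; raises KeyError if a field is missing."""
    return Transcript(
        text=payload["text"],
        language=payload["language"],
        duration_seconds=float(payload["duration_seconds"]),
    )


# The sample response from this page, already decoded from JSON.
sample = {
    "text": "Habari za asubuhi. Karibu sana.",
    "language": "sw",
    "duration_seconds": 2.41,
}
transcript = parse_transcript(sample)
print(transcript.text)
```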

Examples

bash
curl -X POST https://sauti.finiflowlabs.com/v1/speech-to-text/ \
  -H "xi-api-key: YOUR_KEY" \
  -F "audio=@sample.wav;type=audio/wav" \
  -F "language=sw"
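
python
import requests

# Send the file as the "audio" form part with an explicit audio/* content type.
with open("sample.wav", "rb") as f:
    response = requests.post(
        "https://sauti.finiflowlabs.com/v1/speech-to-text/",
        headers={"xi-api-key": "YOUR_KEY"},
        files={"audio": ("sample.wav", f, "audio/wav")},
        data={"language": "sw"},
        timeout=60,  # avoid hanging indefinitely; adjust for long recordings
    )

# Raise an exception on 4xx/5xx responses (for example 413, 422, 429).
response.raise_for_status()
print(response.json()["text"])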

Model details

  • Model: Finiflowlabs/sauti-asr-v1, published on Hugging Face.
  • Test WER: 13.52%  ·  Test CER: 3.85%.
  • Trained on ~89 hours of Swahili speech, evaluated on the FLEURS sw_ke test split.
  • Sample rate: any (resampled internally to 16 kHz mono).
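
For readers unfamiliar with the metrics above, WER and CER are conventionally defined as the edit distance between the reference and the model output, divided by the reference length (in words or characters respectively). The sketch below implements that standard definition. It does not reproduce the evaluation pipeline behind the published figures, and the text normalization used there (casing, punctuation) is not stated on this page.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substitution (za -> ya) and one deletion (sana) over 5 reference words.
print(wer("habari za asubuhi karibu sana", "habari ya asubuhi karibu"))  # 0.4
```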

Limits & errors

  • 413 — audio file exceeds 25 MB.
  • 422 — invalid audio format or unsupported language.
  • 429 — rate limit exceeded. See Rate Limits.
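
The status codes above can be turned into a small decision function that a client calls after each request. This is a sketch under stated assumptions: retrying on 429 with exponential backoff is a common client-side convention rather than documented behavior of this API, and the retry count, the delays, and the name `handle_status` are all illustrative. The Rate Limits page may prescribe a different strategy.

```python
from typing import Optional, Tuple

# Messages taken from the list above.
ERROR_MESSAGES = {
    413: "audio file exceeds 25 MB",
    422: "invalid audio format or unsupported language",
    429: "rate limit exceeded",
}


def handle_status(status_code: int, attempt: int,
                  max_retries: int = 3) -> Tuple[str, Optional[float]]:
    """Return ("ok", None) or ("retry", delay_seconds); raise RuntimeError otherwise.

    `attempt` is zero-based. The backoff schedule (1 s, 2 s, 4 s, ...) is an
    assumption of this sketch.
    """
    if 200 <= status_code < 300:
        return ("ok", None)
    if status_code == 429 and attempt < max_retries:
        return ("retry", float(2 ** attempt))
    message = ERROR_MESSAGES.get(status_code, "unexpected HTTP status")
    raise RuntimeError(f"{status_code}: {message}")
```

A caller would sleep for the returned delay and resend the request on "retry". The 413 and 422 errors are not retried because resending the same file would fail the same way.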