Live Demo

Speech to Text

Upload or record Swahili audio and get a transcript. Powered by our fine-tuned Whisper model, which reaches a 13.5% word error rate (WER).

WAV, MP3, WebM, or any other audio format. Max 50 MB.

Model: Fine-tuned Whisper-medium for Swahili, trained on 89 hours of speech data. 13.5% WER on the test set. View model card
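For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that definition. The function name and the Swahili sentences are made up for the example, and it is not the evaluation script used for the 13.5% figure (text normalization, which real evaluations usually apply first, is left out).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substituted word out of three.
example = word_error_rate("habari ya asubuhi", "habari za asubuhi")
```

Here `example` is 1/3, since one of the three reference words was substituted. A 13.5% WER means roughly one word-level error for every seven to eight reference words.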

How it works: Audio is resampled to 16 kHz and converted to a log-mel spectrogram, which Whisper's encoder turns into audio features. The decoder then generates Swahili text with beam search.
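To make the preprocessing step concrete, the sketch below works out the size of the input the encoder receives. The constants (10 ms hop, 30-second windows, 80 mel bins) come from Whisper's published configuration for the medium model, not from this page, and the function names are hypothetical. How this demo splits audio longer than 30 seconds is not stated, so the window count is an assumption.

```python
import math
from typing import Tuple

SAMPLE_RATE = 16_000   # Hz, the rate Whisper expects
HOP_LENGTH = 160       # samples between spectrogram frames (10 ms)
CHUNK_SECONDS = 30     # Whisper pads or trims audio to 30-second windows
N_MELS = 80            # mel bins used by Whisper-medium


def resampled_length(num_samples: int, source_rate: int) -> int:
    """Number of samples after resampling to 16 kHz."""
    return round(num_samples * SAMPLE_RATE / source_rate)


def encoder_input_shape(duration_seconds: float) -> Tuple[int, int, int]:
    """Return (windows, mel bins, frames per window) for a clip.

    Assumes long audio is cut into consecutive 30-second windows,
    which may differ from what this demo actually does.
    """
    windows = max(1, math.ceil(duration_seconds / CHUNK_SECONDS))
    frames_per_window = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH
    return windows, N_MELS, frames_per_window


# Ten seconds recorded at 44.1 kHz becomes 160,000 samples at 16 kHz.
samples_16k = resampled_length(441_000, 44_100)
shape = encoder_input_shape(10.0)
```

Under these assumptions a 10-second clip is padded into a single window of 80 mel bins by 3000 frames, which is the fixed size the encoder consumes.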

Hosting: The first request may take 30-60 s for a cold start (model loading). After that, inference runs at ~0.3x real-time.
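As a rough guide to wait times, the sketch below estimates end-to-end latency from the figures above. It assumes "~0.3x real-time" means processing takes about 0.3 seconds per second of audio. The page does not define the term, so treat that reading, and the hypothetical function name, as assumptions.

```python
def estimated_latency(audio_seconds: float,
                      cold_start_seconds: float = 0.0,
                      real_time_factor: float = 0.3) -> float:
    """Estimated wait in seconds: cold start plus processing time.

    real_time_factor is processing time per second of audio; 0.3 is
    one reading of the "~0.3x real-time" figure, not a measured value.
    """
    if audio_seconds < 0 or cold_start_seconds < 0:
        raise ValueError("durations must be non-negative")
    return cold_start_seconds + audio_seconds * real_time_factor


# A one-minute clip on a warm instance vs. after a 45 s cold start.
warm = estimated_latency(60)
cold = estimated_latency(60, cold_start_seconds=45)
```

Under this reading, a one-minute clip takes about 18 seconds on a warm instance and about a minute when the request also triggers a cold start.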