Local ASR model
Kyutai STT 2.6B
Production-grade streaming ASR from Kyutai, the makers of Moshi. A delay-streaming transformer with 500 ms latency, word-level timestamps and speaker diarization. Ranked at the top of the Open ASR Leaderboard for real-time French and English transcription.
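In delay streaming, the text stream trails the audio by a fixed offset, so a word emitted at decoding step t describes audio from roughly t minus the delay. The sketch below shows only that bookkeeping. It assumes a 12.5 Hz decoder step rate (80 ms per step) and the 500 ms delay quoted above. The `(step, word)` input and the `align_words` helper are hypothetical: this is not the moshi API, and it is not necessarily how Kyutai computes its timestamps.

```python
from dataclasses import dataclass

FRAME_RATE_HZ = 12.5  # assumed decoder steps per second (80 ms per step)
DELAY_S = 0.5         # fixed text delay quoted for this model


@dataclass
class Word:
    text: str
    start_s: float
    end_s: float


def align_words(emitted, frame_rate_hz=FRAME_RATE_HZ, delay_s=DELAY_S):
    """Convert hypothetical (step, word) pairs into words with audio-time timestamps.

    A word emitted at step t is assumed to start at t / frame_rate - delay
    seconds into the audio and to end where the next word starts.
    """
    emitted = list(emitted)
    # Subtract the fixed delay; clamp to 0 for words emitted before the delay elapsed.
    starts = [max(0.0, step / frame_rate_hz - delay_s) for step, _ in emitted]
    words = []
    for i, (_, text) in enumerate(emitted):
        # The last word has no successor, so give it one step as a placeholder duration.
        end = starts[i + 1] if i + 1 < len(starts) else starts[i] + 1 / frame_rate_hz
        words.append(Word(text, round(starts[i], 3), round(end, 3)))
    return words


if __name__ == "__main__":
    for w in align_words([(10, "bonjour"), (16, "tout"), (20, "le"), (23, "monde")]):
        print(f"{w.start_s:5.2f}-{w.end_s:5.2f}  {w.text}")
```

Only the final word's end time is a guess here. Every other boundary comes from the next word's start.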
Apple Silicon ready
speech-to-text transcription
2 languages
CC-BY-4.0
Quality: 9.4/10
Speed: 9.5/10
Model size: 2.7 GB
Voices: N/A (ASR: outputs text + timestamps)
Can Kyutai STT 2.6B run locally?
Kyutai STT 2.6B can run locally for offline speech-to-text. Start with pip install moshi.
It is released under the CC-BY-4.0 license. Review the upstream terms before commercial use.
pip install moshi
streaming, realtime, low-latency, multilingual
Audio profile
Best fit
Kyutai STT 2.6B is best for offline transcription, speech indexing and local voice pipelines.
Hardware: GPU, CPU, Apple Silicon
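Speech indexing boils down to mapping each word to the times it was spoken, so a search can jump straight to the right point in a recording. Here is a minimal sketch, assuming you already have `(word, start_s)` pairs from a transcription run. `build_index` and `search` are hypothetical helpers and are not part of moshi.

```python
import re
from collections import defaultdict


def _normalize(word):
    # Lowercase and drop punctuation; \w is Unicode-aware, so accented French letters survive.
    return re.sub(r"[^\w']", "", word.lower())


def build_index(segments):
    """Map each normalized word to the sorted start times (seconds) where it occurs.

    `segments` is an iterable of (word, start_s) pairs, e.g. word-level ASR timestamps.
    """
    index = defaultdict(list)
    for word, start_s in segments:
        key = _normalize(word)
        if key:  # skip tokens that were pure punctuation
            index[key].append(start_s)
    return {key: sorted(times) for key, times in index.items()}


def search(index, query):
    """Return start times for a single-word query, or an empty list if it never occurs."""
    return index.get(_normalize(query), [])


if __name__ == "__main__":
    idx = build_index([("Hello,", 0.3), ("world", 0.7), ("hello", 2.1)])
    print(search(idx, "hello"))
```

An in-memory dictionary is enough for a handful of recordings. A larger archive would store the same word-to-timestamp mapping in a proper search engine.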
Model details
Type: Local ASR model
Family: kyutai
Latency: ultra-low
Formats: PyTorch, safetensors, MLX
Languages: en, fr
Context: Delay-streaming transformer, 500 ms latency
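A streaming recognizer reads audio in fixed-size frames rather than as a whole file. Because of the 500 ms delay, text for a given stretch of audio becomes available about half a second after that audio has been heard, even before counting compute time. The sketch below assumes 24 kHz mono input and 80 ms frames (1,920 samples). These values match the Mimi codec that Kyutai's models build on, but check them against the upstream repository. `stream_frames` and `transcript_ready_at` are illustrative helpers and are not moshi functions.

```python
SAMPLE_RATE = 24_000  # assumed input sample rate (Hz)
FRAME_MS = 80         # assumed streaming frame length (ms)
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 1920 samples per frame


def stream_frames(samples, frame_samples=FRAME_SAMPLES):
    """Yield fixed-size frames from a mono sample sequence, zero-padding the last one."""
    for start in range(0, len(samples), frame_samples):
        frame = list(samples[start:start + frame_samples])
        if len(frame) < frame_samples:
            frame.extend([0.0] * (frame_samples - len(frame)))
        yield frame


def transcript_ready_at(audio_end_s, delay_s=0.5):
    """Earliest time (s) at which text covering audio up to audio_end_s can appear.

    Ignores compute time: the fixed model delay is simply added to the audio time.
    """
    return audio_end_s + delay_s


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE
    print(sum(1 for _ in stream_frames(one_second)), "frames per second of audio")
    print("text for the first 2 s is ready at", transcript_ready_at(2.0), "s")
```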
Install locally
01 Check runtime: Confirm that your backend can load PyTorch, safetensors or MLX weights on your machine.
02 Install model: Use the upstream command or follow the repository instructions.
03 Test locally: Transcribe a short private audio clip before moving into production workflows.
pip install moshi
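The "Check runtime" step can be scripted before you download any weights. This sketch only checks whether packages are importable and which accelerators PyTorch reports. It assumes that `moshi_mlx` is the import name of the MLX variant, so confirm that against the upstream instructions. `check_runtime` is a hypothetical helper.

```python
import importlib.util
import platform


def check_runtime():
    """Report which relevant packages are importable and which accelerators are visible.

    Uses importlib.util.find_spec, so nothing heavy is imported unless torch is present.
    The package list is an assumption based on the formats listed for this model.
    """
    packages = ["torch", "safetensors", "mlx", "moshi", "moshi_mlx"]
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    # Native Python on Apple Silicon reports an arm64 machine on macOS (Darwin).
    report["apple_silicon"] = platform.system() == "Darwin" and platform.machine() == "arm64"
    if report["torch"]:
        import torch  # imported lazily so the check still runs without torch installed

        report["cuda"] = bool(torch.cuda.is_available())
        report["mps"] = bool(torch.backends.mps.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_runtime().items():
        print(f"{name:14s} {'yes' if ok else 'no'}")
```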
Good for
- speech-to-text transcription
- Apple Silicon ready local workflows
- streaming, realtime, low-latency
Watch before shipping
- Validate accuracy, latency and timestamp quality with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
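When validating output on your own samples, the standard ASR accuracy metric is word error rate (WER). WER counts the substitutions, deletions and insertions needed to turn the model's transcript into a hand-checked reference, divided by the number of reference words. The following is a minimal, dependency-free sketch. Lowercasing and whitespace tokenization are simplifying assumptions, and serious evaluations also normalize punctuation and numbers.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit) distance.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("bonjour tout le monde", "bonjour le monde"))
```

Run it over a set of clips per language (English and French) rather than a single sentence, because WER on a few words is very noisy.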
Related TTS and speech models
Alibaba Cloud (Qwen Team)
Qwen3-ASR
Local ASR model · Q 9.5 · Speed 9
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
NVIDIA
Parakeet TDT 0.6B v2
Local ASR model · Q 9.4 · Speed 10
NVIDIA
Canary 1B v2
Local ASR model · Q 9.3 · Speed 9
IBM Granite Team
Granite Speech 4.1 2B
Local ASR model · Q 9.2 · Speed 8
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8
hexgrad
Kokoro TTS
Local TTS model · Q 9.2 · Speed 9.8