Local ASR model

Kyutai STT 2.6B

Production-grade streaming ASR from Kyutai (makers of Moshi). Delay-streaming transformer with 500 ms latency, word-level timestamps and speaker diarization. Listed at the top of the Open ASR Leaderboard for real-time French and English.

Apple Silicon ready, speech-to-text transcription, 2 languages, CC-BY-4.0

Quality: 9.4/10
Speed: 9.5/10
Model size: 2.7 GB
Voices: N/A (ASR: outputs text + timestamps)

Can Kyutai STT 2.6B run locally?

Kyutai STT 2.6B can run locally for offline speech-to-text. Start with pip install moshi.

CC-BY-4.0 license. Review upstream restrictions before commercial use.

streaming, realtime, low-latency, multilingual

Audio profile

Quality: 9.4
Speed: 9.5
Local: 9.4

Best fit

Kyutai STT 2.6B is best for offline transcription, speech indexing and local voice pipelines.
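
As an illustration of the speech-indexing use case, word-level timestamps can be turned into a simple lookup table. This is a minimal sketch only: the `(word, start_seconds)` tuple format and the `build_index` helper are assumptions made for the example, not the model's actual output format.

```python
from collections import defaultdict


def build_index(words):
    """Map each lowercased word to the list of start times where it occurs."""
    index = defaultdict(list)
    for text, start in words:
        # Strip surrounding punctuation so "world," and "world" share a key.
        key = text.strip(".,!?;:\"'").lower()
        if key:
            index[key].append(start)
    return dict(index)


# Hypothetical transcript: (word, start time in seconds).
words = [("Bonjour", 0.0), ("hello", 0.6), ("world,", 1.1), ("hello", 2.4)]
index = build_index(words)
print(index["hello"])  # → [0.6, 2.4]
```

Looking up a word then gives the positions to seek to in the original recording.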

Hardware: GPU, CPU, Apple Silicon

Model details

Type: Local ASR model
Family: kyutai
Latency: ultra-low
Formats: pytorch, safetensors, mlx
Languages: en, fr
Context: Delay-streaming transformer, 500 ms latency

Install locally

01 Check runtime: Confirm that your machine supports at least one of the listed backends (pytorch, safetensors, mlx).
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio sample before moving into production workflows.
pip install moshi
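
The runtime check in step 01 can be sketched with the standard library alone. The snippet below only tests whether the Python modules for the listed formats are importable; the module names (`torch`, `safetensors`, `mlx`) are the usual import names for those packages, and the helper names are hypothetical. It does not load or verify the model itself.

```python
import importlib.util
import platform

# Import names for the backends listed under "Formats".
# mlx is only expected to be present on Apple Silicon Macs.
BACKENDS = {"pytorch": "torch", "safetensors": "safetensors", "mlx": "mlx"}


def available_backends(backends=BACKENDS):
    """Return {label: bool} telling whether each module can be imported."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in backends.items()
    }


def is_apple_silicon():
    """True on macOS running on an arm64 (M-series) CPU."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if __name__ == "__main__":
    for label, found in available_backends().items():
        print(f"{label:12s} {'found' if found else 'missing'}")
    print("Apple Silicon:", is_apple_silicon())
```

A "missing" entry is not necessarily a problem: one working backend for your hardware is enough to proceed to step 02.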

Good for

  • speech-to-text transcription
  • Apple Silicon ready local workflows
  • streaming, realtime, low-latency

Watch before shipping

  • Validate transcription accuracy, latency and timestamps with your own audio samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
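
One way to approach the benchmarking item is to measure the real-time factor (processing time divided by audio duration) on each target machine. The sketch below is backend-agnostic: `transcribe` is a hypothetical callable that you would wrap around whichever runtime you installed, and the stand-in workload exists only so the example runs without a model.

```python
import time


def real_time_factor(transcribe, audio, audio_seconds, runs=3):
    """Return the best processing_time / audio_duration over several runs.

    A value below 1.0 means the backend keeps up with live audio.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio)
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


if __name__ == "__main__":
    # Stand-in workload (not a real model) so the sketch is runnable as-is.
    def fake_transcribe(samples):
        return sum(samples)

    rtf = real_time_factor(fake_transcribe, [0.0] * 10000, audio_seconds=1.0)
    print(f"real-time factor: {rtf:.4f}")
```

Taking the best of several runs reduces the influence of warm-up and background load; for streaming use you would also want to record the delay until the first words appear, which this sketch does not measure.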
