Local ASR model
Canary 1B v2
NVIDIA's multilingual ASR and speech-translation model in a single checkpoint: transcription in 25 European languages plus bidirectional translation between English and each of the other languages (EN↔XX). Tops the multilingual category of the Open ASR Leaderboard. Outputs word-level timestamps, punctuation and capitalization.
Apple Silicon ready
speech-to-text transcription
25 languages
CC-BY-4.0
Quality: 9.3/10
Speed: 9/10
Model size: 2 GB
Voices: N/A (ASR and translation model; outputs text)
Can Canary 1B v2 run locally?
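Canary 1B v2 can run locally for offline speech-to-text and speech translation. Start by installing NVIDIA NeMo with its ASR extras, quoting the package spec so shells such as zsh do not try to expand the brackets.
Licensed under CC-BY-4.0, which permits commercial use with attribution. Review the upstream model card for any additional terms before shipping.
pip install "nemo_toolkit[asr]"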
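Once NeMo is installed, inference goes through its Python API. The sketch below follows the usage pattern shown on the upstream model card (`ASRModel.from_pretrained` with the `nvidia/canary-1b-v2` checkpoint, then `transcribe` with `source_lang` and `target_lang`); treat those argument names as assumptions to check against your installed NeMo release. `transcribe_files` and `sample.wav` are hypothetical, and the model is passed in as a parameter so the helper can be exercised without downloading weights.

```python
from typing import Any, List, Sequence


def transcribe_files(model: Any, paths: Sequence[str],
                     source_lang: str = "en", target_lang: str = "en") -> List[str]:
    """Hypothetical helper: run Canary on a batch of audio files and return plain text.

    Equal source/target languages request transcription; different ones request
    EN<->XX speech translation. Keyword names follow the upstream model card and
    may differ between NeMo releases.
    """
    outputs = model.transcribe(list(paths), source_lang=source_lang, target_lang=target_lang)
    # Recent NeMo releases return hypothesis objects with a .text attribute;
    # fall back to str() in case a release returns plain strings.
    return [o.text if hasattr(o, "text") else str(o) for o in outputs]


# Typical use (downloads the checkpoint on first run; requires nemo_toolkit[asr]):
# from nemo.collections.asr.models import ASRModel
# model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
# print(transcribe_files(model, ["sample.wav"]))                                      # English transcript
# print(transcribe_files(model, ["sample.wav"], source_lang="en", target_lang="de"))  # EN -> DE translation
```

Keeping the model outside the helper also makes it easy to load the checkpoint once and reuse it across batches.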
streaming, multilingual, realtime
Audio profile
Best fit
Canary 1B v2 is best for offline transcription, speech indexing and local voice pipelines.
Hardware: GPU, Apple Silicon
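For speech indexing, word-level timestamps can be turned into a searchable index. A minimal sketch, assuming each word arrives as a dict with `word`, `start` and `end` keys in seconds (a hypothetical shape; map whatever your NeMo release returns onto it). `build_word_index` and `find` are illustrative names: the index normalizes each word and records every start time, so a query returns the positions to seek to.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize(token: str) -> str:
    """Lowercase and strip punctuation, since Canary emits punctuated, capitalized text."""
    # \w is Unicode-aware for str patterns, so accented letters survive.
    return re.sub(r"[^\w']+", "", token.lower())


def build_word_index(words: List[dict]) -> Dict[str, List[float]]:
    """Map each normalized word to the sorted list of its start times (seconds)."""
    index: Dict[str, List[float]] = defaultdict(list)
    for w in words:
        key = normalize(w["word"])
        if key:  # skip tokens that were pure punctuation
            index[key].append(float(w["start"]))
    return {k: sorted(v) for k, v in index.items()}


def find(index: Dict[str, List[float]], query: str) -> List[float]:
    """Return start times for a query word, normalized the same way as the index."""
    return index.get(normalize(query), [])


words = [
    {"word": "Hello,", "start": 0.0, "end": 0.4},
    {"word": "world.", "start": 0.5, "end": 0.9},
    {"word": "Hello", "start": 2.1, "end": 2.4},
]
idx = build_word_index(words)
print(find(idx, "hello"))  # [0.0, 2.1]
```

Normalizing on both the indexing and query side is what lets "Hello," and "hello" match.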
Model details
Type: Local ASR model
Family: canary
Latency: low
Formats: nemo
Languages: en, de, es, fr, it, pt, pl, nl, sv, fi, da, cs, hu, ro, bg, hr, sk, sl, et, lv, lt, el, mt, ga
Context: ASR + speech translation, 25 languages, 1B parameters
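The card describes two modes: transcription (source and target language equal) and translation between English and another supported language in either direction. A hypothetical pre-flight check, built from the language codes listed on this card, can reject unsupported pairs such as German to French before any audio reaches the model. The `task_for` name and its return values are illustrative, and the code set should be synced with the upstream model card.

```python
# Language codes as listed on this card (hypothetical copy); verify against the upstream model card.
SUPPORTED = {
    "en", "de", "es", "fr", "it", "pt", "pl", "nl", "sv", "fi", "da", "cs",
    "hu", "ro", "bg", "hr", "sk", "sl", "et", "lv", "lt", "el", "mt", "ga",
}


def task_for(source_lang: str, target_lang: str) -> str:
    """Return "asr" or "translation" for a supported pair, else raise ValueError."""
    for code in (source_lang, target_lang):
        if code not in SUPPORTED:
            raise ValueError(f"unsupported language code: {code!r}")
    if source_lang == target_lang:
        return "asr"
    # Translation is EN<->XX only: one side of the pair must be English.
    if "en" in (source_lang, target_lang):
        return "translation"
    raise ValueError(f"no direct {source_lang}->{target_lang} translation; pivot through 'en'")


print(task_for("de", "de"))  # asr
print(task_for("fr", "en"))  # translation
```

Failing fast here is cheaper than discovering an unsupported pair after the model has processed a long file.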
Install locally
01 Check runtime: confirm that the NeMo backend supports your machine (GPU or Apple Silicon).
02 Install model: use the upstream command or repository instructions.
03 Test locally: transcribe a short private audio clip before moving into production workflows.
pip install "nemo_toolkit[asr]"
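Steps 01 and 03 can be partly automated with the standard library. A minimal sketch with hypothetical function names: `runtime_report` checks whether the `nemo` package is importable and whether the host is Apple Silicon, and `check_wav` confirms that a test clip is mono at 16 kHz, the input format NeMo ASR models typically expect (an assumption to verify on the upstream model card).

```python
import importlib.util
import platform
import wave
from typing import List


def runtime_report() -> dict:
    """Summarize the local runtime before installing or loading the model."""
    return {
        "python": platform.python_version(),
        # find_spec returns None when the package is not installed.
        "nemo_installed": importlib.util.find_spec("nemo") is not None,
        # Under Rosetta emulation machine() reports x86_64, so this is False there.
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
    }


def check_wav(path: str, expected_rate: int = 16_000) -> List[str]:
    """Return a list of problems with a WAV test clip; an empty list means it looks usable."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
    return problems


print(runtime_report())
```

Converting a clip that fails `check_wav` (for example with ffmpeg) before testing keeps format issues from being mistaken for model errors.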
Good for
- speech-to-text transcription
- Apple Silicon-ready local workflows
- streaming, multilingual, realtime
Watch before shipping
- Validate transcription accuracy, translation quality, timestamps and latency with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
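For the validation and benchmarking checks above, word error rate (WER) and real-time factor (RTF) are the usual ASR yardsticks. A minimal sketch: WER as word-level Levenshtein distance divided by the number of reference words, and RTF as processing time divided by audio duration (below 1.0 means faster than real time). The only normalization is lowercasing and whitespace splitting, so strip punctuation consistently on both sides first, since Canary emits punctuated output.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when words match)
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means the model transcribes faster than real time."""
    return processing_seconds / audio_seconds


print(wer("Hello world", "hello word"))  # 0.5
print(real_time_factor(30.0, 60.0))      # 0.5
```

Running both on the same private clips across CPU, Apple Silicon and GPU setups gives comparable numbers for the hardware decision.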
Related TTS and speech models
Kyutai
Kyutai STT 2.6B
Local ASR model · Q 9.4 · Speed 9.5
Alibaba Cloud (Qwen Team)
Qwen3-ASR
Local ASR model · Q 9.5 · Speed 9
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
IBM Granite Team
Granite Speech 4.1 2B
Local ASR model · Q 9.2 · Speed 8
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8
NVIDIA
Parakeet TDT 0.6B v2
Local ASR model · Q 9.4 · Speed 10
hexgrad
Kokoro TTS
Local TTS model · Q 9.2 · Speed 9.8