Local ASR model
Parakeet TDT 0.6B v2
NVIDIA's lightweight ASR model: 0.6B parameters, listed as #1 for English on the Open ASR Leaderboard. TDT (Token-and-Duration Transducer) decoding is reported to make it 50× faster than Whisper Large v3 on GPU. Supports real-time streaming with word-level timestamps.
Apple Silicon ready
speech-to-text transcription
1 language
CC-BY-4.0
Quality
9.4/10
Speed
10/10
Model size
1.1 GB
Voices
N/A (ASR: outputs text + word timestamps)
Can Parakeet TDT 0.6B v2 run locally?
Parakeet TDT 0.6B v2 can run locally for offline speech-to-text. Start with pip install nemo_toolkit[asr].
CC-BY-4.0 license. Review upstream restrictions before commercial use.
pip install nemo_toolkit[asr]
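Once the package is installed, a first transcription might look like the sketch below. It is a minimal example under stated assumptions, not the upstream reference code: the model id `nvidia/parakeet-tdt-0.6b-v2` is assumed from the upstream Hugging Face listing, and `load_model` and `transcribe_files` are hypothetical helper names introduced here. NeMo is imported lazily so the helpers can be defined and tested on a machine where it is not installed yet.

```python
from typing import Any, List, Optional

# Assumption: Hugging Face id of the upstream checkpoint (not stated on this page).
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"


def load_model() -> Any:
    """Download and load the pretrained checkpoint (needs nemo_toolkit[asr])."""
    # Lazy import: NeMo is only required when a real model is loaded.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)


def transcribe_files(paths: List[str], model: Optional[Any] = None) -> List[str]:
    """Return one transcript string per audio file path."""
    if model is None:
        model = load_model()
    results = model.transcribe(paths)
    # Recent NeMo releases return hypothesis objects with a `.text` attribute;
    # older ones return plain strings, so fall back to the item itself.
    return [getattr(item, "text", item) for item in results]
```

Calling `transcribe_files(["sample.wav"])` downloads the checkpoint on first use; passing an already loaded `model` avoids reloading it for every batch.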
Upstream source
streaming, realtime, low-latency
Audio profile
Best fit
Parakeet TDT 0.6B v2 is best for offline transcription, speech indexing and local voice pipelines.
Hardware: GPU, CPU, Apple Silicon
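The speech-indexing use case mentioned above can be illustrated with a small inverted index built from word-level timestamps. This is a sketch only: it assumes each word entry is a mapping with `word` and `start` (seconds) keys, which is how NeMo's word timestamps are commonly shaped, and `build_word_index` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_word_index(words: Iterable[Mapping[str, object]]) -> Dict[str, List[float]]:
    """Map each lowercased word to the start times (seconds) where it occurs."""
    index: Dict[str, List[float]] = defaultdict(list)
    for entry in words:
        # Normalize case and strip surrounding punctuation so "World." matches "world".
        key = str(entry["word"]).lower().strip(".,!?;:\"'")
        if key:
            index[key].append(float(entry["start"]))
    return dict(index)


# Example input shaped like word-level timestamp output (values are made up).
sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world.", "start": 0.5, "end": 0.9},
    {"word": "hello", "start": 1.2, "end": 1.5},
]
word_index = build_word_index(sample_words)
```

Looking up `word_index["hello"]` then gives the offsets to seek to in the original recording.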
Model details
Type
Local ASR model
Family
parakeet
Latency
ultra-low
Formats
nemo, onnx
Languages
en
Context
TDT decoder, RNN-T architecture, 0.6B params
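To show why a token-and-duration decoder can need fewer decoding steps than a frame-by-frame transducer, here is a toy greedy loop. It is a simplified illustration of the idea, not NeMo's implementation: `predict` is a hypothetical stand-in for the joint network, returning a token (or `None` for blank) and a duration in frames, and the decoder jumps ahead by that duration instead of visiting every frame.

```python
from typing import Callable, List, Optional, Tuple

# One prediction: (token, or None for blank; number of frames to skip).
Step = Tuple[Optional[str], int]


def tdt_greedy_decode(
    predict: Callable[[int, List[str]], Step],
    num_frames: int,
    max_symbols_per_frame: int = 10,
) -> Tuple[List[str], int]:
    """Greedy decode; returns the emitted tokens and the number of predictor calls."""
    tokens: List[str] = []
    frame = 0
    calls = 0
    emitted_on_frame = 0
    while frame < num_frames:
        token, duration = predict(frame, tokens)
        calls += 1
        if token is not None:
            tokens.append(token)
        if duration == 0 and token is not None and emitted_on_frame < max_symbols_per_frame:
            # Duration 0: stay on this frame and emit another token.
            emitted_on_frame += 1
        else:
            # Jump ahead by the predicted duration (at least one frame,
            # so a blank or a capped frame can never stall the loop).
            frame += max(duration, 1)
            emitted_on_frame = 0
    return tokens, calls


# Toy predictor over 10 frames: two tokens and a trailing blank (made-up values).
script = {0: ("he", 4), 4: ("llo", 3), 7: (None, 3)}
decoded, predictor_calls = tdt_greedy_decode(lambda t, _: script[t], num_frames=10)
```

In this toy run the predictor is called 3 times for 10 frames, whereas a decoder that advances one frame at a time would need at least 10 calls.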
Install locally
01 Check runtime
Confirm the backend supports nemo, onnx on your machine.
02 Install model
Use the upstream command or repository instructions.
03 Test locally
Run a short private audio prompt before moving into production workflows.
pip install nemo_toolkit[asr]
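The runtime check in step 01 could be scripted roughly as follows. This sketch assumes the `nemo` format maps to the `nemo` Python package and the `onnx` format to `onnxruntime`; that mapping and the helper name `available_backends` are assumptions, not something the upstream project specifies.

```python
import importlib.util
from typing import Dict

# Assumption: Python module that provides each listed model format.
BACKEND_MODULES = {"nemo": "nemo", "onnx": "onnxruntime"}


def available_backends() -> Dict[str, bool]:
    """Report which backends are importable, without importing them."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

A `missing` entry for `nemo` usually means the install command has not been run in the active environment.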
Good for
- speech-to-text transcription
- Apple Silicon ready local workflows
- streaming, realtime, low-latency
Watch before shipping
- Validate transcription accuracy, latency and timestamp quality with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
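For the benchmarking item, one common measure is the real-time factor: processing time divided by audio duration. The sketch below uses only the standard library and accepts any transcription callable, so it makes no assumption about the backend; `real_time_factor` and `wav_duration_seconds` are hypothetical helper names, and the input is assumed to be an uncompressed WAV file.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(transcribe: Callable[[str], object], path: str) -> float:
    """Processing time / audio duration; below 1.0 is faster than real time."""
    start = time.perf_counter()
    transcribe(path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(path)
```

Run it a few times and discard the first result, since the first call often includes model loading or warm-up.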
Related TTS and speech models
Kyutai
Kyutai STT 2.6B
Local ASR model · Q 9.4 · Speed 9.5
Alibaba Cloud (Qwen Team)
Qwen3-ASR
Local ASR model · Q 9.5 · Speed 9
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
NVIDIA
Canary 1B v2
Local ASR model · Q 9.3 · Speed 9
IBM Granite Team
Granite Speech 4.1 2B
Local ASR model · Q 9.2 · Speed 8
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8
hexgrad
Kokoro TTS
Local TTS model · Q 9.2 · Speed 9.8