Local ASR model
VibeVoice ASR
Open-source multilingual ASR model (speech-to-text), supporting long-form transcription, timestamps, diarization and hotwords.
GPU recommended
speech-to-text transcription
50 languages
MIT
Quality
9.3/10
Speed
7.5/10
Model size
14 GB
Voices
N/A (ASR: outputs text)
Can VibeVoice ASR run locally?
VibeVoice ASR can run locally for offline speech-to-text. Start with pip install transformers accelerate.
MIT license. Still verify upstream usage notes before shipping.
pip install transformers accelerate
Upstream source
streaming, realtime, multilingual, dialogue
Audio profile
Best fit
VibeVoice ASR is best for offline transcription, speech indexing and local voice pipelines.
Hardware: gpu, apple
Model details
Type
Local ASR model
Family
vibevoice
Latency
medium
Formats
pytorch, safetensors
Languages
en, zh, es, pt, de, ja, ko, fr, ru, id, sv, it, he, nl, pl, tr, th, ar, hi, fi, el, ro, vi, uk
Context
60-minute single-pass ASR with diarization + timestamps
Install locally
01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio sample before moving into production workflows.
pip install transformers accelerate
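The runtime check in step 01 can be sketched in a few lines of Python. This is a minimal sketch, not an upstream tool: the helper names `check_packages` and `detect_device` are hypothetical, and the package list is assumed from the install command and the formats listed on this page. It only reports whether the packages are importable and which PyTorch device is visible; it does not load the model.

```python
import importlib.util

# Assumption: these are the packages implied by the install command
# and the pytorch/safetensors formats listed on this page.
REQUIRED = ("torch", "safetensors", "transformers", "accelerate")


def check_packages(names=REQUIRED):
    """Return {package_name: bool} saying whether each package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def detect_device():
    """Return 'cuda', 'mps' or 'cpu'. Falls back to 'cpu' if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
    print("device:", detect_device())
```

If any package is reported as missing, rerun the install command above before downloading the 14 GB weights.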
Good for
- speech-to-text transcription
- local workflows where a GPU is available (recommended)
- streaming, realtime, multilingual
Watch before shipping
- Validate transcription accuracy, latency and diarization with your own audio samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
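One common way to benchmark an ASR model on target hardware is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. The sketch below is a generic timing harness, not part of VibeVoice ASR. The function name `real_time_factor` is hypothetical, and `transcribe` stands for whatever callable you build from the upstream instructions.

```python
import time


def real_time_factor(transcribe, audio_path, audio_seconds, runs=3):
    """Time transcribe(audio_path) and return (best_seconds, rtf).

    `transcribe` is a placeholder for your own callable; `audio_seconds`
    is the known duration of the test clip. The best of `runs` timings is
    used to reduce noise from warm-up and caching.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        timings.append(time.perf_counter() - start)
    best = min(timings)
    return best, best / audio_seconds


if __name__ == "__main__":
    # Stand-in workload so the harness runs without a model installed.
    def fake_transcribe(path):
        time.sleep(0.05)
        return "placeholder transcript"

    seconds, rtf = real_time_factor(fake_transcribe, "sample.wav", audio_seconds=10.0)
    print(f"best run: {seconds:.3f}s, RTF: {rtf:.4f}")
```

Run it with the same clip on each CPU, Apple Silicon or GPU machine you plan to support, so the numbers are comparable.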
Related TTS and speech models
Microsoft Research
VibeVoice 1.5B
Local TTS model · Q 9.4 · Speed 6.5
Microsoft Research
VibeVoice Realtime 0.5B
Local TTS model · Q 9.1 · Speed 9.2
Kyutai
Kyutai STT 2.6B
Local ASR model · Q 9.4 · Speed 9.5
Alibaba Cloud (Qwen Team)
Qwen3-ASR
Local ASR model · Q 9.5 · Speed 9
OpenAI
Whisper v3 Turbo
Local ASR model · Q 9.1 · Speed 9.5
NVIDIA
Canary 1B v2
Local ASR model · Q 9.3 · Speed 9
IBM Granite Team
Granite Speech 4.1 2B
Local ASR model · Q 9.2 · Speed 8
Cohere
Cohere Transcribe 03-2026
Local ASR model · Q 9 · Speed 8