Local ASR model

VibeVoice ASR

Open-source multilingual ASR model (speech-to-text), supporting long-form transcription, timestamps, diarization and hotwords.

GPU recommended · speech-to-text transcription · 50 languages · MIT license
Quality: 9.3/10
Speed: 7.5/10
Model size: 14 GB
Voices: N/A (ASR model; outputs text)

Can VibeVoice ASR run locally?

Yes. VibeVoice ASR can run locally for offline speech-to-text. To get started, install the Python dependencies with pip install transformers accelerate.

Released under the MIT license. Still review the upstream usage notes before shipping.

streaming · realtime · multilingual · dialogue

Audio profile

Quality: 9.3
Speed: 7.5
Local: 8.0

Best fit

VibeVoice ASR is best for offline transcription, speech indexing and local voice pipelines.
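For speech indexing, timestamped and diarized output usually ends up as subtitles or searchable cues. The sketch below renders segments as SRT. The Segment shape (start, end, speaker, text) is a hypothetical structure assumed for illustration; VibeVoice ASR's actual output format may differ, so map its fields onto this shape first.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape; the model's real output may use other field names.
    start: float  # seconds from the beginning of the audio
    end: float  # seconds from the beginning of the audio
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render diarized, timestamped segments as numbered SRT cues with speaker prefixes."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.4, "Speaker 1", "Welcome to the meeting."),
        Segment(2.6, 5.1, "Speaker 2", "Thanks, glad to be here."),
    ]
    print(to_srt(demo))
```

Keeping the speaker label inside the cue text keeps the file valid SRT while preserving the diarization for search.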

Hardware: GPU, Apple Silicon
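On this hardware, a Python pipeline typically chooses between an NVIDIA GPU (CUDA), Apple Silicon (PyTorch's MPS backend) or a CPU fallback. A minimal sketch, assuming PyTorch is the runtime; pick_device is a hypothetical helper name, not part of any VibeVoice API.

```python
def pick_device() -> str:
    """Return "cuda", "mps" or "cpu" based on what the local PyTorch build supports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no accelerator is reachable through it.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


if __name__ == "__main__":
    print("selected device:", pick_device())
```

Checking CUDA before MPS mirrors the GPU-recommended label above.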

Model details

Type: Local ASR model
Family: VibeVoice
Latency: medium
Formats: PyTorch, safetensors
Languages: en, zh, es, pt, de, ja, ko, fr, ru, id, sv, it, he, nl, pl, tr, th, ar, hi, fi, el, ro, vi, uk
Context: 60-minute single-pass ASR with diarization and timestamps
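Recordings longer than the 60-minute single-pass context have to be split before transcription. A minimal sketch using only the standard library, assuming WAV input; the 5-second overlap and the helper names are illustrative choices, not upstream defaults.

```python
import wave

MAX_PASS_SECONDS = 60 * 60  # 60-minute single-pass context listed on this page


def wav_duration_seconds(path: str) -> float:
    """Read the duration of a WAV file from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def plan_chunks(
    duration: float, max_len: float = MAX_PASS_SECONDS, overlap: float = 5.0
) -> list[tuple[float, float]]:
    """Split [0, duration] into windows of at most max_len seconds.

    Consecutive windows share `overlap` seconds so words at a boundary
    are not cut in half.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be shorter than max_len")
    if duration <= max_len:
        return [(0.0, duration)]
    chunks = []
    start = 0.0
    while True:
        end = min(start + max_len, duration)
        chunks.append((start, end))
        if end >= duration:
            return chunks
        start = end - overlap  # step back to create the overlap


if __name__ == "__main__":
    print(plan_chunks(2.5 * 60 * 60))  # a 2.5-hour recording
```

Speaker labels are assigned per pass, so "Speaker 1" in one chunk is not guaranteed to be the same person in the next; reconcile labels (for example, using the overlapping audio) before merging transcripts.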

Install locally

01. Check runtime: confirm that your backend supports PyTorch and safetensors on your machine.
02. Install model: use the upstream command or the repository instructions.
03. Test locally: run a short, private audio clip through the model before moving it into production workflows.
pip install transformers accelerate
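Step 01 can be scripted. A minimal sketch that checks whether the Python packages and enough free disk space are in place before downloading the weights; the package list and the 14 GB figure come from this page, while the helper names are hypothetical.

```python
import importlib.util
import shutil

# Import names for the runtime and formats listed on this page.
REQUIRED_PACKAGES = ("torch", "safetensors", "transformers", "accelerate")
MODEL_SIZE_GB = 14  # approximate model size listed on this page


def missing_packages(names=REQUIRED_PACKAGES) -> list[str]:
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_disk_space(path: str = ".", needed_gb: float = MODEL_SIZE_GB) -> bool:
    """Check free space at `path`, counting 1 GB as 1024**3 bytes to stay conservative."""
    return shutil.disk_usage(path).free >= needed_gb * 1024**3


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("enough disk space for the weights:", has_disk_space())
```

Point path at the drive that holds your model cache, since that is where the weights will be stored.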

Good for

  • speech-to-text transcription
  • local workflows on GPU-equipped machines
  • streaming, realtime, multilingual

Watch before shipping

  • Validate transcription accuracy, latency and diarization with your own audio samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
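One way to run the benchmark above is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. A minimal sketch; transcribe is a hypothetical callable standing in for whatever inference entry point you use.

```python
import statistics
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
    runs: int = 3,
) -> float:
    """Median wall-clock time per run divided by the audio duration."""
    if runs < 1 or audio_seconds <= 0:
        raise ValueError("runs must be >= 1 and audio_seconds must be positive")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)  # result ignored; only the elapsed time matters here
        timings.append(time.perf_counter() - start)
    return statistics.median(timings) / audio_seconds


if __name__ == "__main__":
    # Stand-in transcriber that just sleeps; replace it with your real pipeline.
    def fake_transcribe(path: str) -> str:
        time.sleep(0.05)
        return ""

    print(f"RTF: {real_time_factor(fake_transcribe, 'sample.wav', 10.0):.3f}")
```

Taking the median over several runs damps the effect of a slow first run (for example, warm-up or caching). Use the same audio on each target CPU, Apple Silicon or GPU setup so the numbers are comparable.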
