Local ASR model

VibeVoice ASR

Q: Can VibeVoice ASR run locally?

VibeVoice ASR is listed by LocalClaw as a local ASR option. Hardware fit depends on runtime, model size and backend support.

Open-source multilingual ASR model (speech-to-text), supporting long-form transcription, timestamps, diarization and hotwords.

GPU recommended · speech-to-text transcription · 50 languages · MIT

Quality

9.3/10

Speed

7.5/10

Model size

14 GB

Voices

N/A (ASR: outputs text)
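As a rough way to reason about hardware fit, the listed 14 GB model size can be compared against the memory available on the target machine. The sketch below is only illustrative: the function name and the 1.2 headroom factor are assumptions, not figures from LocalClaw or the upstream project, and real memory use depends on runtime, precision and audio length.

```python
def fits_in_memory(available_gb: float, model_gb: float = 14.0, headroom: float = 1.2) -> bool:
    """Return True if the model plus a safety margin fits in available memory.

    model_gb defaults to the 14 GB size listed on this page.
    headroom is an assumed multiplier for activations and runtime overhead.
    """
    return available_gb >= model_gb * headroom


if __name__ == "__main__":
    for gb in (16, 24, 48):
        print(f"{gb} GB available -> fits: {fits_in_memory(gb)}")
```

Treat a positive result as a reason to try the model, not as a guarantee; measure actual usage on your own hardware.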

Can VibeVoice ASR run locally?

VibeVoice ASR can run locally for offline speech-to-text. Start with pip install transformers accelerate.

MIT license. Still verify upstream usage notes before shipping.

streaming, realtime, multilingual, dialogue

Audio profile

Quality

9.3

Speed

7.5

Local

8.0

Best fit

VibeVoice ASR is best for offline transcription, speech indexing and local voice pipelines.

Hardware: GPU, Apple Silicon

Model details

Type

Local ASR model

Family

vibevoice

Latency

medium

Formats

pytorch, safetensors

Languages

en, zh, es, pt, de, ja, ko, fr, ru, id, sv, it, he, nl, pl, tr, th, ar, hi, fi, el, ro, vi, uk

Context

60-minute single-pass ASR with diarization + timestamps
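If a recording runs longer than the 60-minute single-pass window, one possible approach is to split it into consecutive passes. The sketch below only does the arithmetic; the function name `plan_passes` is hypothetical, and whether the model or its tooling handles longer audio on its own is not stated here.

```python
def plan_passes(duration_s: float, max_pass_s: float = 60 * 60) -> list[tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows in seconds.

    max_pass_s defaults to 60 minutes, the single-pass context listed above.
    """
    if duration_s <= 0:
        return []
    passes = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_pass_s, duration_s)
        passes.append((start, end))
        start = end
    return passes


if __name__ == "__main__":
    # A 2.5-hour recording needs three passes under this assumption.
    for start, end in plan_passes(2.5 * 3600):
        print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

Fixed boundaries can cut through a word or a speaker turn, so in practice you may prefer to cut at silences and to reconcile speaker labels across passes.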

Install locally

Check runtime: Confirm that your backend supports PyTorch and safetensors on your machine.

Install model: Use the upstream command or repository instructions.

Test locally: Run a short private audio sample before moving into production workflows.

pip install transformers accelerate
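The runtime check in the steps above can be sketched in Python. This is a minimal example under stated assumptions: the package list (`torch`, `transformers`, `accelerate`, `safetensors`) is inferred from the install command and the listed formats, and the helper names are hypothetical. It does not load VibeVoice ASR itself; follow the upstream repository for that.

```python
import importlib.util

# Assumed package names, inferred from the install command and listed formats.
REQUIRED = ("torch", "transformers", "accelerate", "safetensors")


def missing_packages(names=REQUIRED):
    """Return the packages from `names` that are not importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_device():
    """Return 'cuda', 'mps' or 'cpu' based on what PyTorch reports."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed, nothing to query
    import torch

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
    print("Preferred device:", pick_device())
```

A result of `cpu` does not mean the model cannot run, only that no GPU or Apple Silicon backend was detected.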

Good for

speech-to-text transcription
local workflows (GPU recommended)
streaming, realtime, multilingual

Watch before shipping

Validate transcription accuracy, latency and errors with your own audio samples.
Review the upstream license and acceptable-use notes.
Benchmark on your target CPU, Apple Silicon or GPU setup.
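One simple way to benchmark on your own hardware is the real-time factor: processing time divided by audio duration, where values below 1.0 mean faster than real time. The sketch below assumes you supply your own `transcribe` callable, a hypothetical stand-in for whatever upstream inference call you end up using.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[str], str], audio_path: str, audio_seconds: float) -> float:
    """Time one transcription call and return elapsed time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe(audio_path)  # hypothetical callable wrapping your ASR backend
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


if __name__ == "__main__":
    # Placeholder transcriber so the sketch runs without a model installed.
    def fake_transcribe(path: str) -> str:
        time.sleep(0.05)
        return "placeholder transcript"

    rtf = real_time_factor(fake_transcribe, "sample.wav", audio_seconds=10.0)
    print(f"real-time factor: {rtf:.3f}")
```

Run it several times and discard the first call, since model loading and warm-up usually inflate the first measurement.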

Related TTS and speech models

Microsoft Research VibeVoice 1.5B · Local TTS model · Q 9.4 · Speed 6.5
Microsoft Research VibeVoice Realtime 0.5B · Local TTS model · Q 9.1 · Speed 9.2
Kyutai Kyutai STT 2.6B · Local ASR model · Q 9.4 · Speed 9.5
Alibaba Cloud (Qwen Team) Qwen3-ASR · Local ASR model · Q 9.5 · Speed 9
OpenAI Whisper v3 Turbo · Local ASR model · Q 9.1 · Speed 9.5
NVIDIA Canary 1B v2 · Local ASR model · Q 9.3 · Speed 9
IBM Granite Team Granite Speech 4.1 2B · Local ASR model · Q 9.2 · Speed 8
Cohere Cohere Transcribe 03-2026 · Local ASR model · Q 9 · Speed 8
