Local TTS model

VibeVoice Realtime 0.5B

An open-source, real-time streaming TTS model focused on low first-token latency (~300 ms) and robust long-form generation.

Apple Silicon ready, text-to-speech generation, 10 languages, MIT
Quality: 9.1/10
Speed: 9.2/10
Model size: 1.1 GB
Voices: Single-speaker realtime voice

Can VibeVoice Realtime 0.5B run locally?

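Yes. VibeVoice Realtime 0.5B can generate speech locally for private voice workflows. To start, run the command below. The demo/ path is relative, so the second half only works from a directory that contains the upstream demo folder.

pip install vibevoice && python demo/realtime_inference.py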

Released under the MIT license. Still check the upstream usage notes before shipping, since the release carries responsible-use constraints.

streaming, realtime, low-latency

Audio profile

Quality: 9.1
Speed: 9.2
Local: 9.1

Best fit

VibeVoice Realtime 0.5B is best for fast on-device voice responses and local assistants.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: vibevoice
Latency: ultra-low
Formats: pytorch, safetensors
Languages: en, de, fr, it, ja, ko, nl, pl, pt, es
Context: Research-first release, responsible-use constraints

Install locally

01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install vibevoice && python demo/realtime_inference.py
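
Step 01 can be sketched in Python as follows. This is a minimal check, assuming the backend packages are importable under the names torch and safetensors. The helper names check_runtime and pick_device are hypothetical and are not part of VibeVoice.

```python
import importlib.util


def check_runtime(modules=("torch", "safetensors")):
    """Report which of the named packages can be imported on this machine."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def pick_device():
    """Return "cuda", "mps" (Apple Silicon) or "cpu", based on what PyTorch reports."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is missing, so no accelerator can be used
    import torch

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(check_runtime())
    print("device:", pick_device())
```

If either package is reported as missing, install it before running the command above. A result of "cpu" means PyTorch found neither a CUDA GPU nor Apple's Metal backend.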

Good for

  • text-to-speech generation
  • Apple Silicon ready local workflows
  • streaming, realtime, low-latency

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
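
The latency and benchmarking checks above can be approximated with a small timing harness. This is a sketch under stated assumptions, not the upstream benchmark: measure_stream and fake_stream are hypothetical names, and fake_stream only stands in for whatever streaming call your VibeVoice setup exposes.

```python
import time


def measure_stream(stream_fn, text):
    """Time a streaming TTS call.

    stream_fn(text) must return an iterable of audio chunks. Returns
    (seconds until the first chunk, total seconds, number of chunks).
    The first value is None if the stream produced no chunks.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in stream_fn(text):
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_stream(text):
    """Placeholder generator: yields three silent chunks with a small delay."""
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 320


if __name__ == "__main__":
    first, total, count = measure_stream(fake_stream, "Hello from a local test.")
    print(f"first chunk: {first * 1000:.0f} ms, total: {total * 1000:.0f} ms, chunks: {count}")
```

Swap fake_stream for your real streaming call and compare the first-chunk time against the ~300 ms figure on each target machine. Run it several times, since the first call usually includes model loading.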
