Local TTS model
VibeVoice Realtime 0.5B
An open-source, real-time streaming TTS model focused on low first-token latency (~300 ms) and robust long-form generation.
Apple Silicon ready
text-to-speech generation
10 languages
MIT
Quality: 9.1/10
Speed: 9.2/10
Model size: 1.1 GB
Voices: Single-speaker realtime voice
Can VibeVoice Realtime 0.5B run locally?
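VibeVoice Realtime 0.5B can generate speech locally, which suits private voice workflows. Start with pip install vibevoice && python demo/realtime_inference.py. The script path is relative, so run the command from a directory that contains demo/realtime_inference.py (presumably a checkout of the upstream repository).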
MIT license. Still verify upstream usage notes before shipping.
streaming, realtime, low-latency
Audio profile
Best fit
VibeVoice Realtime 0.5B is best for fast on-device voice responses and local assistants.
Hardware: GPU, Apple Silicon
Model details
Type: Local TTS model
Family: vibevoice
Latency: ultra-low
Formats: pytorch, safetensors
Languages: en, de, fr, it, ja, ko, nl, pl, pt, es
Context: Research-first release, responsible-use constraints
Install locally
01 Check runtime: Confirm the backend supports pytorch, safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install vibevoice && python demo/realtime_inference.py
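The runtime check in step 01 can be sketched roughly as follows. This is a minimal sketch, not tooling from the upstream project: `check_runtime` is a hypothetical helper name, and it assumes only the usual import names `torch` and `safetensors`. It reports whether each package is importable and which PyTorch device (CUDA GPU, Apple Silicon MPS, or CPU) is available.

```python
import importlib.util


def check_runtime():
    """Report whether PyTorch and safetensors are importable and which device PyTorch can use.

    Hypothetical helper for illustration; not part of the vibevoice package.
    """
    report = {
        "pytorch": importlib.util.find_spec("torch") is not None,
        "safetensors": importlib.util.find_spec("safetensors") is not None,
        "device": None,  # stays None when PyTorch is not installed
    }
    if report["pytorch"]:
        import torch  # imported lazily so the check still runs without PyTorch

        # Older PyTorch builds may not expose torch.backends.mps at all.
        mps = getattr(torch.backends, "mps", None)
        if torch.cuda.is_available():
            report["device"] = "cuda"  # NVIDIA GPU
        elif mps is not None and mps.is_available():
            report["device"] = "mps"  # Apple Silicon GPU via Metal
        else:
            report["device"] = "cpu"
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

If `device` comes back as `cpu` on a machine that has a GPU, the PyTorch build is likely the place to look before installing the model.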
Good for
- text-to-speech generation
- Apple Silicon ready local workflows
- streaming, realtime, low-latency
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
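The latency and benchmarking checks above can be approximated with a small timing harness. This is a minimal sketch under stated assumptions: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in generator rather than the vibevoice API. Swap in whatever streaming call the upstream repository documents. The harness records time to first audio chunk, total generation time, and chunk count.

```python
import time


def measure_stream(chunks):
    """Time an iterable of audio chunks: first-chunk latency, total time, chunk count."""
    start = time.perf_counter()
    first_chunk_s = None
    count = 0
    for _ in chunks:
        if first_chunk_s is None:
            # Latency until the first chunk arrives, in seconds.
            first_chunk_s = time.perf_counter() - start
        count += 1
    total_s = time.perf_counter() - start
    return {"first_chunk_s": first_chunk_s, "total_s": total_s, "chunks": count}


def fake_stream(n_chunks=3, delay_s=0.01):
    """Stand-in for a streaming TTS call (hypothetical, not the vibevoice API)."""
    for _ in range(n_chunks):
        time.sleep(delay_s)  # simulate per-chunk model compute
        yield b"\x00" * 320  # placeholder bytes, not real audio


if __name__ == "__main__":
    stats = measure_stream(fake_stream())
    print(
        f"first chunk: {stats['first_chunk_s'] * 1000:.0f} ms, "
        f"total: {stats['total_s'] * 1000:.0f} ms, chunks: {stats['chunks']}"
    )
```

Running the same harness on each target machine gives a first-chunk figure to compare against the roughly 300 ms latency the model card advertises.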
Related TTS and speech models
- VibeVoice 1.5B (Microsoft Research): Local TTS model · Q 9.4 · Speed 6.5
- VibeVoice ASR (Microsoft Research): Local ASR model · Q 9.3 · Speed 7.5
- Kokoro TTS (hexgrad): Local TTS model · Q 9.2 · Speed 9.8
- Moshi (Kyutai): Local TTS model · Q 9 · Speed 9.5
- NeuTTS Air (Neuphonic): Local TTS model · Q 9 · Speed 9.5
- MOSS-TTS-Nano (OpenMOSS / MOSI.AI): Local TTS model · Q 8.5 · Speed 9.7
- F5-TTS v1.1 (Speech Research (SWivid)): Local TTS model · Q 9.5 · Speed 9.2
- F5-TTS (Speech Research): Local TTS model · Q 9.4 · Speed 9