Local TTS model
VibeVoice 1.5B
Open-source long-form multi-speaker TTS model (up to 90 min, up to 4 speakers). Listed as research-first with responsible-use constraints.
GPU recommended
text-to-speech generation
2 languages
MIT
Quality: 9.4/10
Speed: 6.5/10
Model size: 5.8 GB
Voices: Up to 4 speakers in long-form dialogue
Can VibeVoice 1.5B run locally?
Yes. VibeVoice 1.5B can generate speech locally, which keeps voice workflows private. A GPU is recommended. Start with the command below.
Licensed under MIT. The release is research-first, so check the upstream responsible-use notes before shipping.
pip install vibevoice && python demo/tts_inference.py
Upstream source
streaming, dialogue, multilingual, emotion
Audio profile
Best fit
VibeVoice 1.5B is best for local long-form, multi-speaker speech generation in English and Chinese.
Hardware: gpu, apple
Model details
Type: Local TTS model
Family: vibevoice
Latency: medium
Formats: pytorch, safetensors
Languages: en, zh
Context: Research-first release; watermarking and disclosure recommended
Install locally
01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install vibevoice && python demo/tts_inference.py
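The runtime check in step 01 can be sketched in a few lines of Python. This is only an illustration, not part of the upstream tooling: the helper name check_runtime is hypothetical, the package names torch and safetensors are assumed to be the import names behind the listed formats, and the 5.8 GB figure is the model size shown on this page.

```python
import importlib.util
import shutil

MODEL_SIZE_GB = 5.8  # model size listed on this page


def check_runtime(packages=("torch", "safetensors"), path="."):
    """Report which packages can be located and whether free disk space
    at `path` is at least the listed model size.

    Hypothetical helper for illustration; it does not import the packages,
    it only checks that Python can find them.
    """
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    free_gb = shutil.disk_usage(path).free / 1e9
    report["disk_ok"] = free_gb >= MODEL_SIZE_GB
    return report


if __name__ == "__main__":
    for key, ok in check_runtime().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```

A passing report only means the packages are installed and the disk has room; it says nothing about GPU availability or memory, which still need a real test run.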
Good for
- text-to-speech generation
- Local workflows with a GPU (recommended)
- streaming, dialogue, multilingual
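Because the listing caps dialogue at 4 speakers and roughly 90 minutes, it can help to sanity-check a script before generating audio. The sketch below assumes a "Name: text" line format and a speaking rate of 150 words per minute; both are illustrative assumptions, not documented VibeVoice behavior, and check_script is a hypothetical helper.

```python
import re

MAX_SPEAKERS = 4        # limit listed on this page
MAX_MINUTES = 90        # limit listed on this page
WORDS_PER_MINUTE = 150  # assumed speaking rate, for a rough estimate only

# Assumed script format: "Speaker name: spoken text"
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+)$")


def check_script(script):
    """Count distinct speakers and estimate duration from word count."""
    speakers = set()
    words = 0
    for line in script.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unlabeled lines
        speakers.add(match.group(1))
        words += len(match.group(2).split())
    minutes = words / WORDS_PER_MINUTE
    return {
        "speakers": len(speakers),
        "estimated_minutes": minutes,
        "within_limits": len(speakers) <= MAX_SPEAKERS and minutes <= MAX_MINUTES,
    }
```

The duration figure is a word-count estimate, so treat it as a warning threshold and not as a prediction of the generated audio length.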
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
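One common way to benchmark a TTS setup is the real-time factor: synthesis time divided by the duration of the audio produced, where values below 1 mean faster than real time. The sketch below shows the measurement only. The synthesize callable is a hypothetical stand-in for whatever inference entry point you use, and fake_synthesize with its 24 kHz rate exists purely so the example runs without a model.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, audio_seconds) for one call of synthesize(text).

    `synthesize` is a hypothetical callable that returns a sequence of
    audio samples; swap in your own inference function.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds, audio_seconds


def fake_synthesize(text):
    # Stand-in for a real model: 0.1 s of silence per word at an assumed 24 kHz
    return [0.0] * (2400 * len(text.split()))


if __name__ == "__main__":
    rtf, seconds = real_time_factor(fake_synthesize, "one two three", 24000)
    print(f"audio: {seconds:.2f} s, real-time factor: {rtf:.4f}")
```

Run the measurement several times on each target machine and discard the first run, since model loading and warm-up usually inflate it.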
Related TTS and speech models
Microsoft Research
VibeVoice Realtime 0.5B
Local TTS model · Q 9.1 · Speed 9.2
Microsoft Research
VibeVoice ASR
Local ASR model · Q 9.3 · Speed 7.5
Boson AI
Higgs Audio v2
Local TTS model · Q 9.7 · Speed 7
StepFun
Step-Audio 2 Mini
Local TTS model · Q 9.3 · Speed 7.5
Alibaba Cloud (Qwen Team)
Qwen3 TTS
Local TTS model · Q 9.5 · Speed 8.5
Kyutai
Moshi
Local TTS model · Q 9 · Speed 9.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3