Local TTS model

VibeVoice 1.5B

Open-source long-form multi-speaker TTS model (up to 90 min, up to 4 speakers). Listed as research-first with responsible-use constraints.

GPU recommended, text-to-speech generation, 2 languages, MIT

Quality: 9.4/10
Speed: 6.5/10
Model size: 5.8 GB
Voices: Up to 4 speakers in long-form dialogue

Can VibeVoice 1.5B run locally?

VibeVoice 1.5B can generate speech locally for private voice workflows. Start with pip install vibevoice && python demo/tts_inference.py.

MIT license. Still verify upstream usage notes before shipping.

streaming, dialogue, multilingual, emotion

Audio profile

Quality: 9.4
Speed: 6.5
Local: 8.0

Best fit

VibeVoice 1.5B is best for multilingual local speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: vibevoice
Latency: medium
Formats: pytorch, safetensors
Languages: en, zh
Context: Research-first release, watermark + disclosure recommended

Install locally

01 Check runtime: Confirm the backend supports pytorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install vibevoice && python demo/tts_inference.py
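
The "Check runtime" step above can be sketched in Python. This is a minimal sketch, not part of the upstream project: the helper names check_runtime and pick_device are hypothetical, and it assumes the relevant packages are importable as torch and safetensors. It only reports what is installed and which device PyTorch can see; it does not load the model.

```python
import importlib.util


def check_runtime(packages=("torch", "safetensors")):
    """Return a dict mapping each package name to whether it is importable."""
    # find_spec returns None for a top-level package that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is missing, so there is no accelerator to query
    import torch

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


if __name__ == "__main__":
    print(check_runtime())
    print(pick_device())
```

If either package shows as False, install it before running the upstream command. A result of "cpu" from pick_device suggests generation will be slow, given that the listing marks a GPU as recommended.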

Good for

  • text-to-speech generation
  • GPU recommended local workflows
  • streaming, dialogue, multilingual

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
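
One way to approach the benchmarking item above is to measure the real-time factor (generation time divided by audio duration). The sketch below is an illustration under assumptions: synthesize is a hypothetical callable that takes text and returns a sequence of audio samples, and sample_rate must match whatever your actual pipeline produces. Neither is a VibeVoice API.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and return (elapsed_seconds, audio_seconds, rtf).

    `synthesize` is a hypothetical callable: text -> sequence of audio samples.
    An rtf below 1.0 means audio is generated faster than it plays back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    # Guard against empty output so the division cannot fail.
    rtf = elapsed / audio_seconds if audio_seconds > 0 else float("inf")
    return elapsed, audio_seconds, rtf


if __name__ == "__main__":
    # Stand-in generator that returns one second of silence at an assumed 24 kHz.
    fake_tts = lambda text: [0.0] * 24000
    print(real_time_factor(fake_tts, "Hello there.", sample_rate=24000))
```

Run it several times on each target machine (CPU, Apple Silicon, GPU) with prompts of realistic length, since the first call often includes one-time warm-up cost.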
