Local TTS model
IndexTTS 2
Bilibili's open TTS model, known for zero-shot voice cloning and emotion transfer. It controls voice timbre and emotional style separately, taking each from its own reference clip. Supports Chinese and English.
Apple Silicon ready
text-to-speech generation
2 languages
Apache 2.0
Quality: 9.4/10
Speed: 8/10
Model size: 2.4 GB
Voices: Zero-shot + separate emotion reference
Can IndexTTS 2 run locally?
IndexTTS 2 can generate speech locally for private voice workflows. Start with pip install indextts.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install indextts
Upstream source
Tags: cloning, emotion, streaming, controllable
Audio profile
Best fit
IndexTTS 2 is best for local voice cloning and expressive speech generation.
Hardware: GPU, Apple Silicon
Model details
Type: Local TTS model
Family: indextts
Latency: low
Formats: PyTorch, safetensors
Languages: en, zh
Context: Dual reference: voice timbre + emotion
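The dual-reference idea (one clip for voice timbre, another for emotion) can be sketched as a small request object that is validated before any model call. This is only an illustration: `DualReferenceRequest`, its field names, and the choice to make the emotion clip optional are assumptions made here, not part of the indextts package or its API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class DualReferenceRequest:
    """Hypothetical request shape; NOT an indextts class."""

    text: str
    timbre_clip: Path                    # clip that supplies the speaker's voice
    emotion_clip: Optional[Path] = None  # assumed optional: clip that supplies the emotional style

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the request looks usable."""
        issues = []
        if not self.text.strip():
            issues.append("text is empty")
        clips = (("timbre", self.timbre_clip), ("emotion", self.emotion_clip))
        for label, clip in clips:
            # Skip the emotion clip when it was not provided.
            if clip is not None and not clip.is_file():
                issues.append(f"{label} clip not found: {clip}")
        return issues
```

Checking both clips up front keeps a missing or mistyped reference path from surfacing only after the model has loaded.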
Install locally
01 Check runtime: Confirm the backend supports PyTorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install indextts
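The runtime check in step 01 can be sketched as a short script that reports whether the two listed formats' packages are importable and which accelerator PyTorch can see. The function name `check_runtime` is made up for this example; the import names `torch` and `safetensors` are the usual ones for those packages, but whether IndexTTS 2 needs anything beyond them is not stated here, so treat this as a first sanity check only.

```python
import importlib.util
import platform


def check_runtime(packages=("torch", "safetensors")):
    """Report which packages are importable and which accelerator is visible."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}

    device = "cpu"
    if report.get("torch"):
        try:
            import torch

            mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
            if torch.cuda.is_available():
                device = "cuda"
            elif mps is not None and mps.is_available():
                device = "mps"  # Apple Silicon GPU backend
        except ImportError:
            # The package is present but failed to load; fall back to CPU.
            report["torch"] = False

    report["device"] = device
    report["machine"] = platform.machine()
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

A result of `device: cpu` on a machine with a GPU usually points at a CPU-only PyTorch build rather than at the model itself.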
Good for
- text-to-speech generation
- Local workflows on Apple Silicon
- Voice cloning, emotion control, and streaming
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
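One way to approach the benchmarking item above is a small harness that measures latency and real-time factor (synthesis time divided by audio duration; below 1.0 means faster than playback). The `synthesize` callable is a placeholder that you would wrap around your own model call, and `fake_synthesize` is a stand-in with invented timings, not IndexTTS 2.

```python
import statistics
import time


def benchmark(synthesize, text, runs=5, clock=time.perf_counter):
    """Time `synthesize(text)` over several runs.

    `synthesize` is a hypothetical callable that returns the duration,
    in seconds, of the audio it generated.
    """
    latencies, rtfs = [], []
    for _ in range(runs):
        start = clock()
        audio_seconds = synthesize(text)
        elapsed = clock() - start
        if audio_seconds <= 0:
            raise ValueError("synthesize must return a positive audio duration")
        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)  # real-time factor for this run
    return {
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs),
    }


def fake_synthesize(text):
    # Stand-in only: pretends each character yields 0.05 s of audio.
    time.sleep(0.01)
    return 0.05 * len(text)


if __name__ == "__main__":
    print(benchmark(fake_synthesize, "A short private test sentence.", runs=3))
```

Running the same harness on each target (CPU, Apple Silicon, GPU) with your own text samples gives comparable numbers; the first run often includes warm-up cost, which is why the median is reported alongside the maximum.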
Related TTS and speech models
Zyphra · Zonos v0.1 · Local TTS model · Q 9.5 · Speed 8.5
OpenBMB · VoxCPM2 · Local TTS model · Q 9.4 · Speed 8.3
Alibaba FunAudioLLM · CosyVoice 2 · Local TTS model · Q 9.3 · Speed 8.8
Resemble AI · Chatterbox TTS · Local TTS model · Q 9.4 · Speed 8
Canopy Labs · Orpheus TTS · Local TTS model · Q 9.6 · Speed 7.5
Boson AI · Higgs Audio v2 · Local TTS model · Q 9.7 · Speed 7
Hume AI · OCTAVE 2 · Local TTS model · Q 9.4 · Speed 7.5
StepFun · Step-Audio 2 Mini · Local TTS model · Q 9.3 · Speed 7.5