Local TTS model

IndexTTS 2

Bilibili's open TTS model with strong zero-shot voice cloning and emotion transfer. It controls voice timbre and emotional style separately, each from its own reference clip. Quality is strongest on Chinese and English.

Apple Silicon ready, text-to-speech generation, 2 languages, Apache 2.0
Quality
9.4/10
Speed
8/10
Model size
2.4 GB
Voices
Zero-shot + separate emotion reference

Can IndexTTS 2 run locally?

IndexTTS 2 can generate speech locally for private voice workflows. Start with pip install indextts.

Apache 2.0 license. Still verify upstream usage notes before shipping.

cloning, emotion, streaming, controllable

Audio profile

Quality
9.4
Speed
8
Local
8.8

Best fit

IndexTTS 2 is best for local voice cloning and expressive speech generation.

Hardware: GPU, Apple Silicon

Model details

Type
Local TTS model
Family
indextts
Latency
low
Formats
pytorch, safetensors
Languages
en, zh
Context
Dual reference: voice timbre + emotion

Install locally

01
Check runtime: confirm the backend supports PyTorch and safetensors on your machine.
02
Install model: use the upstream command or repository instructions.
03
Test locally: run a short private audio prompt before moving into production workflows.
pip install indextts
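
The "Check runtime" step can be sketched in Python as below. This is a minimal sketch, not part of IndexTTS itself: `check_runtime` is a hypothetical helper name, and it only reports whether the `torch` and `safetensors` packages are importable and which device PyTorch would pick (CUDA GPU, Apple Silicon via MPS, or CPU). It does not load or validate the model.

```python
import importlib.util


def check_runtime(packages=("torch", "safetensors")):
    """Report which packages are importable and which device PyTorch would use.

    Hypothetical helper for illustration; not an IndexTTS API.
    """
    # find_spec returns None when a top-level package is not installed.
    report = {name: importlib.util.find_spec(name) is not None for name in packages}

    device = "cpu"
    if report.get("torch"):
        import torch

        if torch.cuda.is_available():
            device = "cuda"  # NVIDIA GPU
        else:
            # The MPS backend (Apple Silicon) only exists in newer PyTorch builds.
            mps = getattr(torch.backends, "mps", None)
            if mps is not None and mps.is_available():
                device = "mps"
    report["device"] = device
    return report
```

Calling `check_runtime()` returns a small dictionary such as package availability flags plus a `device` entry, which is enough to decide whether the machine is worth testing further.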

Good for

  • text-to-speech generation
  • Apple Silicon ready local workflows
  • cloning, emotion, streaming

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
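
One way to run the benchmark above is to measure latency and real-time factor (synthesis time divided by audio duration). The sketch below is illustrative only: `benchmark` and `fake_synthesize` are hypothetical names, and `fake_synthesize` is a stand-in that returns silence. Replace it with a call into whatever synthesis function your installed backend actually exposes.

```python
import statistics
import time


def benchmark(synthesize, text, sample_rate, runs=3):
    """Time synthesize(text) and return median latency and real-time factor.

    `synthesize` is any callable returning a sequence of audio samples
    (an assumption for this sketch, not an IndexTTS signature).
    """
    latencies = []
    n_samples = 0
    for _ in range(runs):
        start = time.perf_counter()
        audio = synthesize(text)
        latencies.append(time.perf_counter() - start)
        n_samples = len(audio)

    latency = statistics.median(latencies)
    audio_seconds = n_samples / sample_rate
    # RTF below 1.0 means synthesis runs faster than playback.
    rtf = latency / audio_seconds if audio_seconds > 0 else float("inf")
    return {"latency_s": latency, "audio_s": audio_seconds, "rtf": rtf}


def fake_synthesize(text):
    """Stand-in synthesizer: 0.1 s of silence per character at 16 kHz."""
    time.sleep(0.01)  # simulate some compute time
    return [0.0] * (1600 * len(text))
```

Run it on each target machine with the same text and reference clips so the numbers are comparable across CPU, Apple Silicon, and GPU setups.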
