Local TTS model

LLaSA 3B

A LLaMA-based TTS model that generates speech purely by next-token prediction, with no separate decoder. Performance scales with compute: the 3B variant matches state-of-the-art specialised TTS systems on zero-shot cloning. Trained on 250K hours of Chinese and English speech.

GPU recommended | text-to-speech generation | 2 languages | CC-BY-NC 4.0

Quality: 9.2/10
Speed: 7/10
Model size: 6.2 GB
Voices: Zero-shot cloning from any reference

Can LLaSA 3B run locally?

LLaSA 3B can generate speech locally for private voice workflows. Start with pip install llasa.

CC-BY-NC 4.0 license. Review upstream restrictions before commercial use.

cloning, streaming, realtime

Audio profile

Quality: 9.2
Speed: 7
Local: 8.0

Best fit

LLaSA 3B is best for local voice cloning and expressive speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: llasa
Latency: low
Formats: pytorch, safetensors
Languages: en, zh
Context: LLaMA backbone + XCodec2 audio tokens

Install locally

01 Check runtime: Confirm the backend supports pytorch, safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.

pip install llasa
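
The "Check runtime" step can be sketched as a short Python script. This is a minimal illustration, not part of any LLaSA tooling: the helper name check_runtime is hypothetical, and it only checks that the torch and safetensors packages are importable and which accelerator PyTorch reports, not that the model itself will load.

```python
import importlib.util


def check_runtime(packages=("torch", "safetensors")):
    """Report which packages are importable and which device PyTorch would use.

    Hypothetical helper for illustration; not part of any upstream package.
    """
    # find_spec returns None when a top-level package is not installed.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}

    device = "cpu"
    if status.get("torch"):
        import torch

        if torch.cuda.is_available():
            device = "cuda"  # NVIDIA GPU (or ROCm build)
        else:
            # The MPS backend (Apple Silicon) only exists in newer PyTorch builds.
            mps = getattr(torch.backends, "mps", None)
            if mps is not None and mps.is_available():
                device = "mps"
    return status, device


if __name__ == "__main__":
    status, device = check_runtime()
    for name, ok in status.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
    print(f"device: {device}")
```

If either package is reported missing, install it before following the upstream model instructions; a device of "cpu" means generation will likely be much slower than the speed rating above suggests.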

Good for

  • text-to-speech generation
  • local workflows (GPU recommended)
  • cloning, streaming, realtime

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
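
One way to approach the latency and benchmarking checks above is to measure the real-time factor (synthesis time divided by audio duration) on your own prompts. The sketch below assumes a hypothetical synthesize(text) callable that returns a one-dimensional sequence of audio samples; wire it to whatever inference entry point the upstream repository provides. The 16000 Hz sample rate in the usage example is a placeholder, so use the rate your setup actually outputs.

```python
import statistics
import time


def benchmark(synthesize, texts, sample_rate):
    """Time synthesize(text) per prompt and summarise latency and real-time factor.

    synthesize is a hypothetical callable returning a 1-D sequence of samples.
    A real-time factor below 1.0 means audio is produced faster than it plays.
    """
    latencies, rtfs = [], []
    for text in texts:
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start

        audio_seconds = len(samples) / sample_rate
        if audio_seconds <= 0:
            raise ValueError(f"no audio returned for prompt: {text!r}")

        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)

    return {
        "runs": len(latencies),
        "median_latency_s": statistics.median(latencies),
        "median_rtf": statistics.median(rtfs),
    }


if __name__ == "__main__":
    # Stand-in synthesizer: returns one second of silence at 16 kHz.
    def fake_synthesize(text):
        return [0.0] * 16000

    print(benchmark(fake_synthesize, ["Hello.", "Ni hao."], sample_rate=16000))
```

Run it with prompts in both supported languages and with your own reference voices, on each hardware target you plan to ship on, since results on a GPU will not predict CPU or Apple Silicon behaviour.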
