Local TTS model
LLaSA 3B
LLaMA-based TTS that generates speech purely by next-token prediction, with no separate decoder. Quality scales with compute: the 3B variant matches state-of-the-art specialised TTS systems on zero-shot cloning. Trained on 250K hours of Chinese and English speech.
GPU recommended
text-to-speech generation
2 languages
CC-BY-NC 4.0
Quality
9.2/10
Speed
7/10
Model size
6.2 GB
Voices
Zero-shot cloning from any reference
Can LLaSA 3B run locally?
LLaSA 3B can generate speech locally for private voice workflows. Start with pip install llasa.
CC-BY-NC 4.0 license. Review upstream restrictions before commercial use.
pip install llasa
Upstream source
cloning, streaming, realtime
Audio profile
Best fit
LLaSA 3B is best for local voice cloning and expressive speech generation.
Hardware: gpu, apple
Model details
Type
Local TTS model
Family
llasa
Latency
low
Formats
pytorch, safetensors
Languages
en, zh
Context
LLaMA backbone + XCodec2 audio tokens
Install locally
01 Check runtime
Confirm the backend supports pytorch, safetensors on your machine.
02 Install model
Use the upstream command or repository instructions.
03 Test locally
Run a short private audio prompt before moving into production workflows.
pip install llasa
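The "Check runtime" step can be sketched in Python. This is a minimal illustration, not part of any LLaSA tooling: the helper names check_runtime and pick_device are made up for this example, and the package names torch and safetensors are assumed to be the import names behind the formats listed above.

```python
import importlib.util


def check_runtime(packages=("torch", "safetensors")):
    """Return {package: bool} telling whether each package can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def pick_device():
    """Return "cuda", "mps" (Apple Silicon) or "cpu"; None if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not ship the MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(check_runtime())
print(pick_device())
```

If either package is reported as missing, or the device comes back as cpu on a machine that has a GPU, fix the environment before installing the model.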
Good for
- text-to-speech generation
- Local workflows where a GPU is recommended
- cloning, streaming, realtime
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
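The latency and benchmarking checks above can be sketched as a small timing harness. Everything here is an assumption for illustration: synthesize stands for whatever function wraps your local model and returns a sequence of audio samples, and fake_synthesize and the 16000 sample rate are placeholders, not values taken from LLaSA.

```python
import statistics
import time


def benchmark(synthesize, text, sample_rate, runs=3):
    """Time synthesize(text) and report median latency and real-time factor.

    synthesize is a hypothetical callable returning a sequence of audio samples.
    Real-time factor (RTF) = generation time / audio duration; below 1.0 means
    the model generates audio faster than it plays back.
    """
    latencies, rtfs = [], []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        audio_seconds = len(samples) / sample_rate
        if audio_seconds == 0:
            raise ValueError("synthesize returned no audio")
        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)
    return {
        "median_latency_s": statistics.median(latencies),
        "median_rtf": statistics.median(rtfs),
    }


def fake_synthesize(text):
    """Placeholder backend: pretends to work, then returns one second of silence."""
    time.sleep(0.01)
    return [0.0] * 16000


print(benchmark(fake_synthesize, "A short private test sentence.", sample_rate=16000))
```

Swap fake_synthesize for your real model call and run it on each target (CPU, Apple Silicon, GPU) with the same text so the numbers are comparable.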
Related TTS and speech models
Speech Research (SWivid)
F5-TTS v1.1
Local TTS model · Q 9.5 · Speed 9.2
Speech Research
F5-TTS
Local TTS model · Q 9.4 · Speed 9
Amphion Team
MaskGCT
Local TTS model · Q 9.4 · Speed 9
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5
Neuphonic
NeuTTS Air
Local TTS model · Q 9 · Speed 9.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
OpenMOSS / MOSI.AI
MOSS-TTS-Nano
Local TTS model · Q 8.5 · Speed 9.7