Local TTS model
CosyVoice 2
Industrial-grade multilingual TTS with streaming, voice cloning and emotion control. Exceptional quality in Chinese and English. Used in production at Alibaba scale.
Apple Silicon ready
text-to-speech generation
8 languages
Apache 2.0
Quality
9.3/10
Speed
8.8/10
Model size
2.4 GB
Voices
Zero-shot + cross-lingual cloning
Can CosyVoice 2 run locally?
CosyVoice 2 can generate speech locally for private voice workflows. Start with pip install cosyvoice.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install cosyvoice
streaming, realtime, cloning, emotion, multilingual
Audio profile
Best fit
CosyVoice 2 is best for local voice cloning and expressive speech generation.
Hardware: GPU, Apple Silicon
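Because voice cloning is the main use case, it helps to check a reference clip before you feed it to the model. The Python sketch below uses only the standard-library `wave` module to report a clip's channel count, sample rate and duration. The targets (mono, 16 kHz, 3 to 15 seconds) and the helper names `check_reference_clip` and `make_silent_clip` are assumptions for illustration, not documented CosyVoice 2 requirements. Confirm the expected prompt format in the upstream repository.

```python
import io
import wave
from dataclasses import dataclass, field

# ASSUMED targets for illustration only, not confirmed upstream requirements.
# Check the CosyVoice repository for the prompt format it actually expects.
TARGET_RATE_HZ = 16_000
MIN_SECONDS, MAX_SECONDS = 3.0, 15.0


@dataclass
class ClipReport:
    channels: int
    sample_rate: int
    seconds: float
    problems: list = field(default_factory=list)


def check_reference_clip(source) -> ClipReport:
    """Inspect a PCM WAV reference clip given as a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    report = ClipReport(channels, rate, seconds)
    if channels != 1:
        report.problems.append(f"expected mono, got {channels} channels")
    if rate != TARGET_RATE_HZ:
        report.problems.append(f"expected {TARGET_RATE_HZ} Hz, got {rate} Hz; resample first")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        report.problems.append(
            f"duration {seconds:.1f} s is outside {MIN_SECONDS}-{MAX_SECONDS} s"
        )
    return report


def make_silent_clip(seconds: float, rate: int = TARGET_RATE_HZ, channels: int = 1) -> io.BytesIO:
    """Hypothetical demo helper: build an in-memory 16-bit silent WAV to try the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buf.seek(0)  # rewind so the clip can be read back
    return buf


if __name__ == "__main__":
    print(check_reference_clip(make_silent_clip(5.0)))  # problems=[]
    print(check_reference_clip(make_silent_clip(1.0, rate=44_100, channels=2)))
```

In practice you would pass the path to your own recording, for example `check_reference_clip("my_voice.wav")`, where the file name is a placeholder.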
Model details
Type
Local TTS model
Family
cosyvoice
Latency
ultra-low
Formats
PyTorch, ONNX
Languages
en, zh, ja, ko, yue, fr, de, es
Context
Instruct mode with natural language
Install locally
01 Check runtime: Confirm that your machine can run a supported backend (PyTorch or ONNX).
02 Install model: Run pip install cosyvoice, or follow the upstream repository instructions.
03 Test locally: Run a short, private audio prompt before moving into production workflows.
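For the runtime check, one quick approach is to probe for the backends listed under Formats. This sketch assumes PyTorch and ONNX Runtime are installed under their usual module names (`torch`, `onnxruntime`). The helper names `check_runtime` and `check_accelerators` are hypothetical. The sketch only confirms that the libraries are importable and which accelerators they report. It does not confirm that CosyVoice 2 itself will load.

```python
import importlib.util
import platform

# Backends named on this page, mapped to their usual import names (an assumption
# about how you installed them).
BACKEND_MODULES = {"pytorch": "torch", "onnx": "onnxruntime"}


def check_runtime() -> dict:
    """Report which backends are importable and whether this is an Apple Silicon Mac."""
    report = {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }
    report["apple_silicon"] = platform.system() == "Darwin" and platform.machine() == "arm64"
    return report


def check_accelerators() -> dict:
    """Ask each installed backend which accelerators it can see (empty if none installed)."""
    found = {}
    if importlib.util.find_spec("torch") is not None:
        import torch

        found["torch_cuda"] = torch.cuda.is_available()
        # torch.backends.mps only exists in newer PyTorch releases, so guard the lookup.
        mps = getattr(torch.backends, "mps", None)
        found["torch_mps"] = bool(mps is not None and mps.is_available())
    if importlib.util.find_spec("onnxruntime") is not None:
        import onnxruntime

        found["onnx_providers"] = onnxruntime.get_available_providers()
    return found


if __name__ == "__main__":
    print(check_runtime())
    print(check_accelerators())
```

On an Apple Silicon Mac, look for `torch_mps` set to true. On an NVIDIA machine, look for `torch_cuda` set to true.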
Good for
- Text-to-speech generation
- Local workflows on Apple Silicon
- Streaming, real-time synthesis and voice cloning
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
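When you benchmark on your own hardware, two numbers matter most for streaming: the time to the first audio chunk, and the real-time factor (RTF). RTF is synthesis time divided by the duration of the audio produced, so a value below 1.0 means audio is generated faster than it plays back. The sketch below times any iterable of audio chunks. `fake_stream` is a hypothetical stand-in for a real streaming call, and the 24 kHz rate is only an example value. Adapt the loop to whatever your CosyVoice 2 streaming call actually yields, for instance by extracting the sample array from each result.

```python
import time
from typing import Iterable, Sequence


def benchmark_stream(chunks: Iterable[Sequence[float]], sample_rate: int) -> dict:
    """Time a streaming synthesis run.

    `chunks` yields audio chunks (sequences of samples) as they are produced;
    `sample_rate` is the output rate in Hz. Both come from the caller.
    """
    start = time.perf_counter()
    first_chunk_s = None
    total_samples = 0
    for chunk in chunks:
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start  # time to first audio
        total_samples += len(chunk)
    elapsed_s = time.perf_counter() - start
    audio_s = total_samples / sample_rate
    return {
        "first_chunk_s": first_chunk_s,
        "elapsed_s": elapsed_s,
        "audio_s": audio_s,
        # Real-time factor: below 1.0 means faster than playback.
        "rtf": elapsed_s / audio_s if audio_s else float("inf"),
    }


def fake_stream(n_chunks: int = 5, chunk_samples: int = 4_800, delay_s: float = 0.01):
    """Hypothetical stand-in for a real streaming TTS call: yields silent chunks."""
    for _ in range(n_chunks):
        time.sleep(delay_s)  # simulate per-chunk compute time
        yield [0.0] * chunk_samples


if __name__ == "__main__":
    print(benchmark_stream(fake_stream(), sample_rate=24_000))
```

Repeat each measurement a few times and discard the first run, because model warm-up often inflates the first result.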
Related TTS and speech models
- VoxCPM2 (OpenBMB): Local TTS model · Quality 9.4 · Speed 8.3
- F5-TTS v1.1 (Speech Research, SWivid): Local TTS model · Quality 9.5 · Speed 9.2
- Qwen3 TTS (Alibaba Cloud, Qwen Team): Local TTS model · Quality 9.5 · Speed 8.5
- Zonos v0.1 (Zyphra): Local TTS model · Quality 9.5 · Speed 8.5
- MOSS-TTS-Nano (OpenMOSS / MOSI.AI): Local TTS model · Quality 8.5 · Speed 9.7
- Fish Speech (Fish Audio): Local TTS model · Quality 9 · Speed 8.5
- Higgs Audio v2 (Boson AI): Local TTS model · Quality 9.7 · Speed 7
- Step-Audio 2 Mini (StepFun): Local TTS model · Quality 9.3 · Speed 7.5