Local TTS model
Step-Audio 2 Mini
Open-source multimodal speech LLM that unifies understanding and generation in one model: ASR, TTS, voice conversion and speech dialogue. Offers strong expressive control and handles paralinguistic cues such as emotion and speaking style. Available in Mini (8B) and Full variants.
GPU recommended
text-to-speech generation
3 languages
Apache 2.0
Quality
9.3/10
Speed
7.5/10
Model size
4.8 GB
Voices
Multi-speaker + voice conversion
Can Step-Audio 2 Mini run locally?
Step-Audio 2 Mini can generate speech locally for private voice workflows. Start with pip install step-audio.
Apache 2.0 license. Still verify upstream usage notes before shipping.
pip install step-audio
Upstream source
cloning, dialogue, emotion, streaming, multilingual
Audio profile
Best fit
Step-Audio 2 Mini is best for local voice cloning and expressive speech generation.
Hardware: GPU, Apple Silicon
Model details
Type
Local TTS model
Family
step
Latency
low
Formats
PyTorch, safetensors
Languages
en, zh, ja
Context
Unified speech LLM (ASR + TTS + dialogue)
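Because the weights are distributed as safetensors, you can check a downloaded checkpoint's tensor count, dtypes and size before loading anything into memory. The sketch below reads only the file header with the standard library, assuming the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names are illustrative and are not part of any Step-Audio tooling.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data.

    Assumed layout (safetensors format): an 8-byte little-endian unsigned
    integer N, then N bytes of UTF-8 JSON describing each tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Count tensors, parameters and bytes of tensor data in a header."""
    n_params = 0
    n_bytes = 0
    dtypes = set()
    for info in header.values():
        n_params += prod(info["shape"])  # prod([]) == 1 for scalars
        start, end = info["data_offsets"]  # byte range inside the data section
        n_bytes += end - start
        dtypes.add(info["dtype"])
    return {
        "tensors": len(header),
        "params": n_params,
        "bytes": n_bytes,
        "dtypes": sorted(dtypes),
    }
```

Summing `summarize(read_safetensors_header(shard))` over every shard of a checkpoint gives a parameter count and on-disk size you can compare against the Model size figure listed on this page.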
Install locally
01 Check runtime: Confirm the backend supports PyTorch and safetensors on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install step-audio
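Step 01 can be scripted before installing anything. This is a minimal sketch that only checks the generic PyTorch + safetensors stack and which accelerator PyTorch can see; it does not verify Step-Audio 2 Mini's own requirements (driver versions, memory), which come from the upstream repository instructions. The `check_runtime` name is hypothetical.

```python
import importlib.util


def check_runtime():
    """Report whether torch and safetensors are importable and which device torch sees.

    importlib.util.find_spec is used so the check itself never crashes
    just because a package is missing.
    """
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "safetensors": importlib.util.find_spec("safetensors") is not None,
        "device": "cpu",  # fallback when no accelerator is found
    }
    if report["torch"]:
        import torch

        if torch.cuda.is_available():
            report["device"] = "cuda"
        # torch.backends.mps exists only in newer PyTorch builds (Apple Silicon).
        elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
            report["device"] = "mps"
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

A result of `device: cpu` on a machine you expected to use a GPU usually points to a CPU-only PyTorch build rather than a problem with the model itself.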
Good for
- text-to-speech generation
- local workflows on GPU-equipped machines (recommended)
- cloning, dialogue, emotion
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
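For the latency check, a common yardstick is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean speech is generated faster than it plays. The sketch below assumes a hypothetical `synthesize` callable that wraps whatever inference entry point the upstream package exposes (this page does not document it) and returns raw samples at a known sample rate. It measures latency only; pronunciation and artifacts still need listening tests on your own samples.

```python
import time
from statistics import median


def real_time_factor(synthesize, text, sample_rate, runs=3):
    """Median real-time factor of a TTS callable over several runs.

    `synthesize` is a hypothetical wrapper: it takes text and returns a
    sequence of audio samples at `sample_rate` Hz.
    """
    rtfs = []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        duration = len(samples) / sample_rate  # seconds of generated audio
        if duration == 0:
            raise ValueError("synthesizer returned no audio")
        rtfs.append(elapsed / duration)
    # The median damps a slow first run (e.g. lazy loading or warm-up).
    return median(rtfs)


if __name__ == "__main__":
    def fake_tts(text):
        # Stand-in for a real model: 50 ms of "work", 1 s of silence at 16 kHz.
        time.sleep(0.05)
        return [0.0] * 16000

    print(f"RTF: {real_time_factor(fake_tts, 'Hello there.', 16000):.3f}")
```

Running the same harness on each target setup (CPU, Apple Silicon, GPU) with identical prompts keeps the numbers comparable.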
Related TTS and speech models
Boson AI
Higgs Audio v2
Local TTS model · Q 9.7 · Speed 7
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
Nari Labs
Dia
Local TTS model · Q 9.3 · Speed 7
Microsoft Research
VibeVoice 1.5B
Local TTS model · Q 9.4 · Speed 6.5
Speech Research (SWivid)
F5-TTS v1.1
Local TTS model · Q 9.5 · Speed 9.2
Alibaba Cloud (Qwen Team)
Qwen3 TTS
Local TTS model · Q 9.5 · Speed 8.5
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5