Local TTS model
MaskGCT
Fully non-autoregressive TTS that needs no explicit text-speech alignment or phone-level duration prediction. Reported to reach human parity on naturalness and similarity metrics, with fast inference.
Apple Silicon ready
text-to-speech generation
2 languages
MIT
Quality
9.4/10
Speed
9/10
Model size
2.8 GB
Voices
Reference-based cloning
Can MaskGCT run locally?
MaskGCT can generate speech locally for private voice workflows. Start with pip install maskgct.
MIT license. Still verify upstream usage notes before shipping.
pip install maskgct
Upstream source
cloning, realtime, streaming
Audio profile
Best fit
MaskGCT is best for local voice cloning and expressive speech generation.
Hardware: gpu, apple
Model details
Type
Local TTS model
Family
maskgct
Latency
ultra-low
Formats
pytorch, safetensors
Languages
en, zh
Context
Non-autoregressive, human parity
Install locally
01 Check runtime
Confirm the backend supports pytorch, safetensors on your machine.
02 Install model
Use the upstream command or repository instructions.
03 Test locally
Run a short private audio prompt before moving into production workflows.
pip install maskgct
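The "Check runtime" step can be sketched in Python as below. This is a minimal sketch, not part of MaskGCT itself: the helper name check_runtime is hypothetical, and it assumes the pytorch and safetensors formats listed above correspond to the importable packages torch and safetensors. It only reports what is installed and which device PyTorch would pick; it does not load any model.

```python
import importlib.util


def check_runtime(required=("torch", "safetensors")):
    """Report which required packages are importable and which device torch would use.

    `required` holds assumed package names; adjust them to match the upstream instructions.
    """
    report = {name: importlib.util.find_spec(name) is not None for name in required}

    device = "cpu"  # fallback when torch is missing or no accelerator is found
    if report.get("torch"):
        import torch

        if torch.cuda.is_available():
            device = "cuda"
        elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
            device = "mps"  # Apple Silicon GPU backend
    report["device"] = device
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

If a package shows as False, install it before following the upstream instructions; a device of cpu on an Apple Silicon or GPU machine usually means the installed PyTorch build lacks that backend.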
Good for
- text-to-speech generation
- Apple Silicon ready local workflows
- cloning, realtime, streaming
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
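The latency and benchmarking checks above can be approximated with a small timing harness. This is a sketch under stated assumptions: synthesize is a hypothetical callable that takes text and returns (samples, sample_rate), and fake_synthesize is a stand-in that returns silence so the harness runs without a model. Replace it with a wrapper around whatever inference entry point the upstream repository actually provides.

```python
import statistics
import time


def benchmark(synthesize, prompts, warmup=1):
    """Time a synthesis callable and return latency and real-time factor (RTF) stats.

    `synthesize` is a hypothetical callable: text -> (samples, sample_rate),
    where `samples` is any sequence of mono audio samples.
    """
    for text in prompts[:warmup]:
        synthesize(text)  # warm-up runs are not timed

    latencies, rtfs = [], []
    for text in prompts:
        start = time.perf_counter()
        samples, sample_rate = synthesize(text)
        elapsed = time.perf_counter() - start
        if len(samples) == 0:
            raise ValueError(f"no audio returned for prompt: {text!r}")
        audio_seconds = len(samples) / sample_rate
        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)  # below 1.0 means faster than real time
    return {
        "runs": len(prompts),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs),
    }


def fake_synthesize(text, sample_rate=24000):
    """Stand-in for a real model: one second of silence per 10 characters of text."""
    n = max(1, len(text) // 10) * sample_rate
    return [0.0] * n, sample_rate


if __name__ == "__main__":
    print(benchmark(fake_synthesize, ["A short private test prompt.", "Another one."]))
```

Run it with the same prompts on each target machine (CPU, Apple Silicon, GPU) so the numbers are comparable, and listen to the outputs as well, since timing alone says nothing about pronunciation or artifacts.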
Related TTS and speech models
Speech Research (SWivid)
F5-TTS v1.1
Local TTS model · Q 9.5 · Speed 9.2
Speech Research
F5-TTS
Local TTS model · Q 9.4 · Speed 9
Zyphra
Zonos v0.1
Local TTS model · Q 9.5 · Speed 8.5
Neuphonic
NeuTTS Air
Local TTS model · Q 9 · Speed 9.5
Alibaba FunAudioLLM
CosyVoice 2
Local TTS model · Q 9.3 · Speed 8.8
OpenBMB
VoxCPM2
Local TTS model · Q 9.4 · Speed 8.3
OpenMOSS / MOSI.AI
MOSS-TTS-Nano
Local TTS model · Q 8.5 · Speed 9.7
Fish Audio
Fish Speech
Local TTS model · Q 9 · Speed 8.5