Local speech app
Voicebox
Desktop app and orchestrator for local TTS, not a model itself. It provides a UI studio, voice profile management, and a local API, and generates audio through swappable backends (Qwen3 TTS, Kokoro, Piper, XTTS…). Think of it as a front-end shell that runs on top of your installed TTS models.
CPU friendly
local voice workflow orchestration
30 languages
MIT
Quality
9/10
Speed
9.5/10
Model size
0.05 GB
Voices
Depends on active backend
Can Voicebox run locally?
Yes. Voicebox is a local app layer that coordinates the speech backends installed on your machine. Start by downloading it from github.com/jamiepine/voicebox.
MIT license. Still verify upstream usage notes before shipping.
Download from github.com/jamiepine/voicebox
Upstream source
streaming, realtime, low-latency
Audio profile
Best fit
Voicebox is best when you want a local UI or API layer over multiple speech engines.
Hardware: CPU, GPU, Apple Silicon
Model details
Type
Local speech app
Family
app
Latency
ultra-low
Formats
native-app
Languages
en, multilingual
Context
App layer - orchestrates TTS backends via local API
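The "app layer over swappable backends" idea can be sketched in a few lines of Python. This is an illustration of the general pattern only, not Voicebox's actual code or API: the `Orchestrator` class, its method names, and the two stand-in backends are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical backend signature: text in, raw audio bytes out.
Backend = Callable[[str], bytes]


class Orchestrator:
    """Toy app layer that routes synthesis requests to one active backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._active: Optional[str] = None

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend
        if self._active is None:
            self._active = name  # first registered backend becomes active

    def use(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def synthesize(self, text: str) -> bytes:
        if self._active is None:
            raise RuntimeError("no backend registered")
        return self._backends[self._active](text)


# Stand-in backends (assumptions); a real one would invoke an installed TTS model.
def fake_kokoro(text: str) -> bytes:
    return b"kokoro:" + text.encode("utf-8")


def fake_piper(text: str) -> bytes:
    return b"piper:" + text.encode("utf-8")


app = Orchestrator()
app.register("kokoro", fake_kokoro)
app.register("piper", fake_piper)
app.use("piper")  # swap the active backend without touching calling code
audio = app.synthesize("hello")
```

The calling code only ever sees `synthesize`, which is why voices and output quality depend on whichever backend is active.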
Install locally
01 Check runtime: Confirm the backend supports native-app on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
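For the "Test locally" step, one simple sanity check is to inspect the audio a short prompt produced. The sketch below assumes the active backend can export a mono 16-bit WAV file, which this page does not confirm; `make_test_wav` is a hypothetical stand-in that generates a tone in place of real backend output, and `inspect_wav` uses only the Python standard library.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds: float = 0.5, rate: int = 22050) -> bytes:
    """Stand-in for backend output (assumption): a mono 16-bit 440 Hz tone."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def inspect_wav(data: bytes) -> dict:
    """Report sample rate, channels, duration and peak level of 16-bit mono WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        raw = w.readframes(w.getnframes())
        samples = struct.unpack(f"<{len(raw) // 2}h", raw)
        return {
            "rate": w.getframerate(),
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / w.getframerate(),
            "peak": max((abs(s) for s in samples), default=0),
        }


info = inspect_wav(make_test_wav())
print(info)
```

A duration near zero or a peak of 0 would suggest the backend returned silence or an empty file, which is worth catching before moving into production workflows.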
Good for
- local voice workflow orchestration
- CPU friendly local workflows
- streaming, realtime, low-latency
Watch before shipping
- Validate pronunciation, latency and artifacts with your own voice samples.
- Review the upstream license and acceptable-use notes.
- Benchmark on your target CPU, Apple Silicon or GPU setup.
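The benchmarking item above can be approached with a small timing harness. This is a minimal sketch under stated assumptions: `synthesize` is a hypothetical callable that runs one request against your backend and returns the duration of the produced audio in seconds, and `fake_backend` merely simulates one. The real-time factor (RTF) is elapsed time divided by audio duration, so values below 1 mean faster than realtime.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def benchmark(
    synthesize: Callable[[str], float],
    prompts: Iterable[str],
    runs: int = 3,
) -> Dict[str, float]:
    """Time each prompt `runs` times and summarize latency and real-time factor."""
    latencies, rtfs = [], []
    for prompt in prompts:
        for _ in range(runs):
            start = time.perf_counter()
            audio_seconds = synthesize(prompt)
            elapsed = time.perf_counter() - start
            latencies.append(elapsed)
            if audio_seconds > 0:  # skip RTF for empty output
                rtfs.append(elapsed / audio_seconds)
    if not latencies:
        raise ValueError("no prompts were benchmarked")
    return {
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs) if rtfs else float("nan"),
    }


def fake_backend(text: str) -> float:
    """Simulated backend (assumption): sleeps briefly, reports a made-up duration."""
    time.sleep(0.01)
    return 0.5 + 0.06 * len(text.split())


result = benchmark(fake_backend, ["hello there", "a slightly longer test sentence"], runs=2)
print(result)
```

Run it with your own prompts on each target machine (CPU, Apple Silicon, or GPU) and compare the numbers rather than relying on the catalog scores.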
Related TTS and speech models
hexgrad
Kokoro TTS
Local TTS model · Q 9.2 · Speed 9.8
Kyutai
Moshi
Local TTS model · Q 9 · Speed 9.5
Neuphonic
NeuTTS Air
Local TTS model · Q 9 · Speed 9.5
Microsoft Research
VibeVoice Realtime 0.5B
Local TTS model · Q 9.1 · Speed 9.2
OpenMOSS / MOSI.AI
MOSS-TTS-Nano
Local TTS model · Q 8.5 · Speed 9.7
Speech Research (SWivid)
F5-TTS v1.1
Local TTS model · Q 9.5 · Speed 9.2
Speech Research
F5-TTS
Local TTS model · Q 9.4 · Speed 9
Amphion Team
MaskGCT
Local TTS model · Q 9.4 · Speed 9