Local speech app

Voicebox

Q: Can Voicebox run locally?

Voicebox is listed by LocalClaw as a local speech app option. Hardware fit depends on runtime, model size and backend support.

Voicebox is a desktop app and orchestrator for local TTS, not a model. It provides a UI studio, voice profile management, and a local API, and it generates audio via swappable backends (Qwen3 TTS, Kokoro, Piper, XTTS…). Think of it as a front-end shell that runs on top of your installed TTS models.
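
The "swappable backends" idea can be illustrated with a small registry. This is a minimal sketch of the general pattern only, not Voicebox's actual code: the `Orchestrator` class, its methods, and the `fake_piper` / `fake_kokoro` stand-ins are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: a backend is any callable that turns text into audio bytes.
Backend = Callable[[str], bytes]


class Orchestrator:
    """Illustrative front-end shell that routes text to one of several backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._active: Optional[str] = None

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend
        if self._active is None:
            self._active = name  # first registered backend becomes the default

    def use(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def speak(self, text: str) -> bytes:
        if self._active is None:
            raise RuntimeError("no backend registered")
        return self._backends[self._active](text)


# Stand-in backends (assumptions); real ones would call an installed TTS engine.
def fake_piper(text: str) -> bytes:
    return b"piper:" + text.encode("utf-8")


def fake_kokoro(text: str) -> bytes:
    return b"kokoro:" + text.encode("utf-8")


shell = Orchestrator()
shell.register("piper", fake_piper)
shell.register("kokoro", fake_kokoro)
shell.use("kokoro")
audio = shell.speak("hello")
```

The caller only ever talks to `speak`, so switching engines is a one-line `use(...)` call and the rest of the workflow stays unchanged.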

CPU friendly, local voice workflow orchestration, 30 languages, MIT


Quality: 9/10
Speed: 9.5/10
Model size: 0.05 GB
Voices: Depends on active backend

Can Voicebox run locally?

Voicebox is a local app layer that coordinates installed speech backends. Start by downloading it from github.com/jamiepine/voicebox.

MIT license. Still verify upstream usage notes before shipping.


streaming, realtime, low-latency

Audio profile

Quality

Speed: 9.5
Local: 9.4

Best fit

Voicebox is best when you want a local UI or API layer over multiple speech engines.

Hardware: CPU, GPU, Apple Silicon
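
To see which of these hardware classes a machine falls into, a rough check like the following may help. It is a sketch under stated assumptions: the presence of the `nvidia-smi` tool is used as a crude proxy for an NVIDIA GPU, other GPU vendors are not detected, and the function names are hypothetical.

```python
import platform
import shutil


def classify_hardware(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Map basic platform facts to one of three labels: cpu, gpu, apple."""
    if system == "Darwin" and machine == "arm64":
        return "apple"  # Apple Silicon Mac
    if has_nvidia_smi:
        return "gpu"  # heuristic: NVIDIA driver tools are on PATH
    return "cpu"


def detect_hardware() -> str:
    """Classify the current machine using only the standard library."""
    return classify_hardware(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )


print(detect_hardware())
```

Whether a given backend can actually use the detected hardware still depends on that backend, so treat the label as a starting point.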

Model details

Type: Local speech app
Family: app
Latency: ultra-low
Formats: native-app
Languages: en, multilingual
Context: App layer - orchestrates TTS backends via local API

Install locally

Check runtime: Confirm the backend supports native-app on your machine.

Install model: Use the upstream command or repository instructions.

Test locally: Run a short private audio prompt before moving into production workflows.

Download from github.com/jamiepine/voicebox
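
For the "Test locally" step, one simple sanity check is to confirm that the generated audio is a valid, non-silent WAV file. The sketch below assumes the output is 16-bit PCM WAV, which may not match every backend. It uses a synthetic tone from the hypothetical helper `make_test_wav` as stand-in output, since the real bytes would come from whatever backend is installed.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds: float = 0.25, rate: int = 16000) -> bytes:
    """Build a short 440 Hz mono 16-bit WAV in memory as stand-in output."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def check_wav(data: bytes) -> dict:
    """Return duration, sample rate, and peak amplitude; assumes 16-bit PCM."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        rate = w.getframerate()
        n = w.getnframes()
        raw = w.readframes(n)
    samples = struct.unpack("<" + "h" * (len(raw) // 2), raw)
    peak = max((abs(s) for s in samples), default=0)
    return {"seconds": n / rate, "rate": rate, "peak": peak, "silent": peak == 0}


report = check_wav(make_test_wav())
print(report)
```

A check like this catches empty or malformed output quickly; it says nothing about pronunciation or quality, which still need listening.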

Good for

local voice workflow orchestration
CPU friendly local workflows
streaming, realtime, low-latency

Watch before shipping

Validate pronunciation, latency and artifacts with your own voice samples.
Review the upstream license and acceptable-use notes.
Benchmark on your target CPU, Apple Silicon or GPU setup.
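
A latency benchmark on the target machine can be as small as a timing loop. This is a minimal sketch: `benchmark` and `fake_synth` are hypothetical names, and `fake_synth` only sleeps for 10 ms to simulate work. Replace it with a function that calls your real backend.

```python
import statistics
import time
from typing import Callable


def benchmark(synth: Callable[[str], bytes], text: str, runs: int = 5) -> dict:
    """Time repeated calls to synth and summarize latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        synth(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


def fake_synth(text: str) -> bytes:
    """Stand-in for a real TTS call (assumption); sleeps to simulate work."""
    time.sleep(0.01)
    return text.encode("utf-8")


result = benchmark(fake_synth, "Testing one two three.")
print(result)
```

Reporting the median alongside the maximum helps separate typical latency from one-off stalls such as a first-call model load.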
