Local TTS model

MARS5

MARS5 is a hybrid TTS model that combines autoregressive (AR) and diffusion components to reach near-human quality. It supports fast zero-shot voice cloning from a few seconds of reference audio.

Apple Silicon ready, text-to-speech generation, 1 language, AGPL-3.0

Quality: 9/10
Speed: 7.5/10
Model size: 2.5 GB
Voices: Zero-shot cloning

Can MARS5 run locally?

Yes. MARS5 generates speech on your own machine, which suits private voice workflows. To start, run: pip install mars5-tts

AGPL-3.0 license. Review upstream restrictions before commercial use.

cloning, streaming, realtime

Audio profile

Quality: 9
Speed: 7.5
Local: 8.3

Best fit

MARS5 is best for local voice cloning and expressive speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: mars
Latency: low
Formats: pytorch
Languages: en
Context: AR + Diffusion hybrid architecture

Install locally

01 Check runtime: Confirm the backend supports PyTorch on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.

pip install mars5-tts
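
The runtime check in step 01 can be sketched in a few lines of Python. This is a minimal sketch, not an upstream script: the helper name pick_device is hypothetical, and it only reports which PyTorch backend is visible on the machine. Whether MARS5 itself runs well on each backend should be confirmed against the upstream repository.

```python
import importlib.util
import sys


def pick_device():
    """Return 'cuda', 'mps', or 'cpu'; return None if PyTorch is not installed.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
if device is None:
    print("PyTorch not found; install it before installing the model.")
else:
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, PyTorch device: {device}")
```

A result of "cpu" does not rule out local use, but it is a signal to benchmark before relying on the low-latency rating above.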

Good for

  • text-to-speech generation
  • Apple Silicon ready local workflows
  • cloning, streaming, realtime

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
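
One way to approach the latency and benchmarking checks above is a small timing harness. The sketch below is an assumption-laden illustration: synthesize is a hypothetical callable standing in for whatever function produces audio samples in your setup, and the 24 kHz sample rate used by the stand-in is only a placeholder. It reports wall-clock latency and the real-time factor (RTF), where a value below 1.0 means audio is generated faster than it plays back.

```python
import statistics
import time


def benchmark(synthesize, text, sample_rate, runs=5):
    """Time synthesize(text) and report latency and real-time factor (RTF).

    synthesize is a hypothetical callable that returns a sequence of audio
    samples. RTF = wall time / audio duration.
    """
    latencies, rtfs = [], []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        if len(samples) == 0:
            raise ValueError("synthesize returned no audio samples")
        audio_seconds = len(samples) / sample_rate
        latencies.append(elapsed)
        rtfs.append(elapsed / audio_seconds)
    return {
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs),
    }


def fake_synthesize(text):
    # Stand-in for a real model call: sleeps briefly, then returns
    # one second of silence at an assumed 24 kHz sample rate.
    time.sleep(0.01)
    return [0.0] * 24000


if __name__ == "__main__":
    report = benchmark(fake_synthesize, "Hello from a local TTS test.", 24000, runs=3)
    print(report)
```

Replace fake_synthesize with a wrapper around your actual model call and run it on each target machine, using text lengths and reference voices that match your real workload.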
