Local TTS model

GPT-SoVITS

Zero-shot voice cloning TTS combining GPT and SoVITS. Clone any voice from 5 seconds of audio. Extremely popular in the open-source community with 40K+ GitHub stars.
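Because cloning starts from a short reference clip, it can help to confirm the clip's length before passing it to the model. The following is a minimal sketch using only the Python standard library. The function names and the 5.0-second default are illustrative assumptions taken from the claim above, not part of the GPT-SoVITS codebase, and the check only handles uncompressed WAV files.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum_seconds: float = 5.0) -> bool:
    """Check a reference clip against a minimum length.

    The 5.0 s default mirrors the "5 seconds of audio" figure on this page;
    it is an assumption, not a limit read from the upstream project.
    """
    return wav_duration_seconds(path) >= minimum_seconds
```

For example, is_long_enough("reference.wav") returns True for a clip of five seconds or more, where reference.wav is a placeholder file name.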

Apple Silicon ready, text-to-speech generation, 5 languages, MIT
Quality: 9.1/10
Speed: 7/10
Model size: 2 GB
Voices: Zero-shot cloning from 5s

Can GPT-SoVITS run locally?

GPT-SoVITS can generate speech locally for private voice workflows. Start by cloning the upstream repository: git clone https://github.com/RVC-Boss/GPT-SoVITS

MIT license. Still verify upstream usage notes before shipping.

cloning, multilingual, emotion

Audio profile

Quality: 9.1
Speed: 7
Local: 8.4

Best fit

GPT-SoVITS is best for local voice cloning and expressive speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: gptsovits
Latency: medium
Formats: pytorch
Languages: en, zh, ja, ko, yue
Context: GPT + SoVITS hybrid

Install locally

01 Check runtime: Confirm the backend supports PyTorch on your machine.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.

git clone https://github.com/RVC-Boss/GPT-SoVITS
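
The runtime check in step 01 could be sketched as follows. This is an illustrative helper, not part of GPT-SoVITS: pick_device and detect_device are hypothetical names, and the preference order (CUDA, then Apple Silicon MPS, then CPU) is an assumption. The only PyTorch calls used are torch.cuda.is_available() and torch.backends.mps.is_available().

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name, preferring CUDA, then Apple Silicon (MPS), then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Report which PyTorch backend is usable on this machine."""
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    cuda_ok = torch.cuda.is_available()
    # Older PyTorch builds have no torch.backends.mps module.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(cuda_ok, mps_ok)
```

Calling print(detect_device()) before installing the model shows whether the machine will run on a GPU, on Apple Silicon, or fall back to the CPU.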

Good for

  • text-to-speech generation
  • Apple Silicon ready local workflows
  • voice cloning, multilingual and emotional speech

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
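
One way to approach the benchmarking item above is to measure the real-time factor: processing time divided by the length of the audio produced, where values below 1.0 mean faster than real time. The sketch below assumes a hypothetical synthesize callable that wraps whatever inference entry point you use and returns the duration of the generated audio in seconds; it does not call any GPT-SoVITS API directly.

```python
import time
from statistics import median
from typing import Callable


def real_time_factor(
    synthesize: Callable[[str], float], text: str, runs: int = 3
) -> float:
    """Median ratio of wall-clock time to seconds of audio produced.

    `synthesize` is a hypothetical wrapper around your own inference call;
    it must return the duration of the generated audio in seconds.
    """
    ratios = []
    for _ in range(runs):
        start = time.perf_counter()
        audio_seconds = synthesize(text)
        elapsed = time.perf_counter() - start
        if audio_seconds <= 0:
            raise ValueError("synthesize must return a positive duration")
        ratios.append(elapsed / audio_seconds)
    # The median damps one-off effects such as first-run model loading.
    return median(ratios)
```

Run it with the same text and reference voice on each target machine so that the CPU, Apple Silicon and GPU results are comparable.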
