Local TTS model

OCTAVE 2

Second-generation emotion-aware speech-language model. Generates voice, style and personality from a text description alone, with no reference audio required. Offers fine-grained control over arousal, valence and speaking style. Research-first release.

GPU recommended | text-to-speech generation | 7 languages | Hume Terms (research)

Quality: 9.4/10
Speed: 7.5/10
Model size: 3.2 GB
Voices: Prompt-generated voices + style

Can OCTAVE 2 run locally?

OCTAVE 2 can generate speech locally for private voice workflows. Start with pip install hume.

Hume Terms (research) license. Review upstream restrictions before commercial use.

Tags: emotion, controllable, streaming, multilingual

Audio profile

Quality: 9.4
Speed: 7.5
Local: 8.6

Best fit

OCTAVE 2 is best for multilingual local speech generation.

Hardware: GPU, Apple Silicon

Model details

Type: Local TTS model
Family: octave
Latency: low
Formats: pytorch, api
Languages: en, de, fr, es, it, pt, ja
Context: Describe a speaker in natural language
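
The Context field says the speaker is described in natural language. As a rough sketch of how such a description might be assembled from the controls this page mentions (arousal, valence, speaking style), the helper below builds a prompt string. The `SpeakerSpec` class, the 0 to 1 scales and the word choices are all hypothetical and are not part of the hume package or of OCTAVE 2 itself.

```python
from dataclasses import dataclass


@dataclass
class SpeakerSpec:
    """Hypothetical container for a speaker description (not a hume API type)."""
    persona: str                    # e.g. "a middle-aged radio host"
    arousal: float                  # 0.0 (calm) to 1.0 (energetic), illustrative scale
    valence: float                  # 0.0 (negative) to 1.0 (positive), illustrative scale
    style: str = "conversational"   # free-form speaking style

    def describe(self) -> str:
        """Turn the numeric controls into a plain-language description."""
        for name in ("arousal", "valence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Assumed thresholds: split each scale into three equal bands.
        if self.arousal < 0.34:
            energy = "calm"
        elif self.arousal < 0.67:
            energy = "measured"
        else:
            energy = "energetic"
        if self.valence < 0.34:
            mood = "somber"
        elif self.valence < 0.67:
            mood = "neutral"
        else:
            mood = "warm"
        return f"{self.persona}; {energy} and {mood} tone; {self.style} delivery"


if __name__ == "__main__":
    print(SpeakerSpec("a radio host", arousal=0.9, valence=0.8).describe())
```

The resulting string is only text; how it is passed to the model depends on the upstream instructions.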

Install locally

01 Check runtime: Confirm that your machine can run the pytorch or api backend.
02 Install model: Use the upstream command or repository instructions.
03 Test locally: Run a short private audio prompt before moving into production workflows.
pip install hume
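
The runtime check in step 01 can be sketched with the standard library plus an optional PyTorch import. This is a minimal sketch, assuming the packages are importable as `torch` and `hume`; it only reports what is installed and which accelerators PyTorch can see, and does not load or run OCTAVE 2.

```python
import importlib.util
import platform


def check_runtime() -> dict:
    """Report whether the packages named on this page are importable."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "hume_installed": importlib.util.find_spec("hume") is not None,
        "cuda": False,  # NVIDIA GPU visible to PyTorch
        "mps": False,   # Apple Silicon GPU visible to PyTorch
    }
    if report["torch_installed"]:
        try:
            import torch
        except ImportError:
            # Package is present but cannot be imported (broken install).
            report["torch_installed"] = False
            return report
        report["cuda"] = bool(torch.cuda.is_available())
        # Older PyTorch builds have no MPS backend, so guard the lookup.
        mps = getattr(torch.backends, "mps", None)
        report["mps"] = bool(mps is not None and mps.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_runtime().items():
        print(f"{key}: {value}")
```

If both `cuda` and `mps` come back false, expect CPU-only performance, which this page does not recommend.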

Good for

  • text-to-speech generation
  • local workflows where a GPU is recommended
  • emotion-aware, controllable, streaming output

Watch before shipping

  • Validate pronunciation, latency and artifacts with your own voice samples.
  • Review the upstream license and acceptable-use notes.
  • Benchmark on your target CPU, Apple Silicon or GPU setup.
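
One way to approach the benchmarking item above is a small timing harness that measures latency and real-time factor (processing time divided by audio duration). This is a sketch under stated assumptions: `synthesize` is a hypothetical callable that you supply, wrapping whatever generation call your setup uses and returning the length of the produced audio in seconds. `fake_synthesize` is a stand-in so the harness runs as-is, and its numbers say nothing about OCTAVE 2.

```python
import statistics
import time
from typing import Callable, List


def benchmark(synthesize: Callable[[str], float], prompts: List[str], runs: int = 3) -> dict:
    """Time `synthesize` on each prompt; it must return audio duration in seconds."""
    latencies = []
    rtfs = []
    for prompt in prompts:
        for _ in range(runs):
            start = time.perf_counter()
            audio_seconds = synthesize(prompt)
            elapsed = time.perf_counter() - start
            latencies.append(elapsed)
            if audio_seconds > 0:
                # Real-time factor below 1.0 means faster than playback.
                rtfs.append(elapsed / audio_seconds)
    return {
        "calls": len(latencies),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "median_rtf": statistics.median(rtfs) if rtfs else None,
    }


def fake_synthesize(text: str) -> float:
    """Placeholder synthesizer: pretends each character yields 0.06 s of audio."""
    time.sleep(0.001)
    return len(text) * 0.06


if __name__ == "__main__":
    print(benchmark(fake_synthesize, ["Hello there.", "Guten Tag.", "Bonjour."]))
```

Replace `fake_synthesize` with your own wrapper and run the same prompts on each target machine to compare CPU, Apple Silicon and GPU results.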
