
MaurerAnton/zonos2.cpp


zonos2.cpp — Pure C++ Zonos2 TTS

A nearly 100% C++20 implementation of Zyphra/ZONOS2. Python is used only for the one-time weight extraction and for external DAC decoding.

A 28-layer autoregressive transformer with Mixture of Experts for text-to-speech. Inference has no Python runtime dependency and generates DAC codec tokens directly.

Architecture

UTF-8 bytes → Multi-Embedding → 28-layer Transformer (MoE 3–26)
  → 9-codebook DAC tokens (1024 vocab each)
  → (external DAC → 44.1 kHz PCM)
  • 28 layers, 2048-dim, 128 head_dim, 16 Q / 4 KV (GQA)
  • MoE: 16 experts, top-1, SonicMoE interleaved gate/up
  • EDA router, QK RMSNorm + temperature, headwise gating
  • Interleaved RoPE, SwiGLU FFN, logit softcap (tanh 15.0)
  • Deterministic: same seed + same text = byte-identical tokens
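
Three of the bullets above reduce to very small pieces of arithmetic, sketched below for orientation. This is a minimal illustration, not code from this repository: the function names are hypothetical, the GQA grouping is assumed to be contiguous, the softcap is assumed to take the common `cap * tanh(x / cap)` form, and a plain argmax stands in for the EDA router, whose scoring is not modeled here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Values taken from the architecture list above.
constexpr int   kNumQHeads  = 16;
constexpr int   kNumKVHeads = 4;
constexpr int   kNumExperts = 16;
constexpr float kSoftcap    = 15.0f;

// GQA: several query heads share one KV head (16 / 4 = 4 per group).
// Assumption: groups are contiguous, as is conventional.
constexpr int kv_head_for_query_head(int q_head) {
    return q_head / (kNumQHeads / kNumKVHeads);
}

// Tanh soft cap: squashes a logit smoothly into [-cap, +cap].
// Assumption: "tanh 15.0" means cap * tanh(x / cap) with cap = 15.
inline float softcap(float logit, float cap = kSoftcap) {
    return cap * std::tanh(logit / cap);
}

// Top-1 routing: choose the single expert with the largest router score.
// Hypothetical stand-in: plain argmax, ties resolved to the lowest index.
inline int top1_expert(const std::vector<float>& router_scores) {
    assert(router_scores.size() == static_cast<std::size_t>(kNumExperts));
    int best = 0;
    for (int e = 1; e < kNumExperts; ++e) {
        if (router_scores[e] > router_scores[best]) best = e;
    }
    return best;
}
```

With these definitions, query heads 0 to 3 read KV head 0, heads 4 to 7 read KV head 1, and so on. Small logits pass through the softcap almost unchanged, while large ones saturate near ±15.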

Build

mkdir build && cd build
cmake .. && make -j$(nproc)

Requires: ggml and a C++20 compiler. No Python at runtime.

Usage

1. Extract weights (one-time preprocessing)

python3 scripts/extract_weights.py model.pth weights/

Converts the PyTorch checkpoint (BF16, 15 GB) to raw F32 binaries (~29 GB).
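
The size roughly doubles because each 2-byte BF16 value becomes a 4-byte F32 value. The extraction script itself is Python and may do more than this, but the per-element widening can be sketched in C++ as follows. The function names are illustrative, not part of this repository.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// BF16 is the upper 16 bits of an IEEE-754 binary32 value, so widening is
// exact: put the 16 bits in the high half of a 32-bit word, zero the low half.
inline float bf16_to_f32(std::uint16_t bits) {
    const std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &widened, sizeof out);  // bit copy, avoids aliasing UB
    return out;
}

// Widen a whole tensor; the result occupies twice the bytes of the input.
inline std::vector<float> widen_tensor(const std::vector<std::uint16_t>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint16_t v : src) dst.push_back(bf16_to_f32(v));
    return dst;
}
```

Because the conversion only appends zero mantissa bits, it loses no precision relative to the BF16 checkpoint.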

2. Generate speech

./build/zonos2_cli --model-dir weights/ --seed 42 --output out "Hello world"

Output: DAC codec tokens saved as out.codes.bin.
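
A quick sanity check on generated tokens follows from the architecture: there are 9 codebooks with 1024 entries each, so the token count should be a multiple of 9 and every token should lie in [0, 1024). The sketch below works on tokens already loaded into memory. It makes no claim about the on-disk layout of `out.codes.bin`, which is not documented here, and `count_valid_frames` is a hypothetical helper, not part of the CLI.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t  kNumCodebooks = 9;     // from the architecture section
constexpr std::int32_t kCodebookSize = 1024;  // vocabulary per codebook

// Returns the number of frames if the tokens look sane, std::nullopt otherwise.
// Assumption: tokens are already decoded into 32-bit integers in memory.
inline std::optional<std::size_t>
count_valid_frames(const std::vector<std::int32_t>& tokens) {
    if (tokens.empty() || tokens.size() % kNumCodebooks != 0) {
        return std::nullopt;  // not a whole number of 9-token frames
    }
    for (std::int32_t t : tokens) {
        if (t < 0 || t >= kCodebookSize) return std::nullopt;  // out of vocab
    }
    return tokens.size() / kNumCodebooks;
}
```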

3. Decode to audio (external DAC)

python3 scripts/dac_decode.py out.codes.bin out.wav
# or, if you already have raw mono float32 PCM at 44.1 kHz in out.pcm:
# ffmpeg -f f32le -ar 44100 -ac 1 -i out.pcm out.wav

Verification

# Same seed + same text = same tokens
./build/zonos2_cli --model-dir weights/ --seed 42 --output a "Test"
./build/zonos2_cli --model-dir weights/ --seed 42 --output b "Test"
md5sum a.codes.bin b.codes.bin  # identical
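
Where `md5sum` is unavailable, or the check should run inside a C++ test, the same comparison can be done byte for byte. This is a minimal sketch and `files_identical` is a hypothetical helper, not part of this repository.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files can be opened and hold exactly the same bytes.
inline bool files_identical(const std::string& path_a,
                            const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // a missing or unreadable file counts as a mismatch

    std::istreambuf_iterator<char> it_a(a), it_b(b), end;
    // The four-iterator overload of std::equal also rejects differing lengths.
    return std::equal(it_a, end, it_b, end);
}
```

For example, `files_identical("a.codes.bin", "b.codes.bin")` should return true after the two runs above if generation is deterministic.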

License

Apache 2.0
