
Zonos-v0.1-Hebrew


Quick Start

To use:

  1. Install uv (if not already installed):

    pip install -U uv
  2. Clone and setup:

    git clone https://github.com/notmax123/Zonos-Hebrew.git
    cd Zonos-Hebrew
    uv sync --extra compile
  3. Run the Gradio interface:

    uv run gradio_interface.py
  4. Or run a quick test:

    uv run sample.py

The Gradio interface will be available at http://localhost:7860 with the Hebrew text "שלום לכולם" ("hello everyone") pre-filled.


Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.

Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.
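As a concrete illustration, here is a minimal sketch of passing these controls through make_cond_dict (introduced in the Usage section below). The keyword names and the 8-way emotion vector are assumptions based on the upstream Zonos repository and should be verified against zonos/conditioning.py in this fork:

# Hedged sketch: fine-grained conditioning controls. `model` and `speaker`
# are created as in the Usage section below; keyword names are assumptions
# based on upstream Zonos (verify in zonos/conditioning.py).
from zonos.conditioning import make_cond_dict

cond_dict = make_cond_dict(
    text="שלום לכולם",   # "hello everyone"
    speaker=speaker,      # speaker embedding from a short reference clip
    language="he",
    emotion=[0.6, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1],  # assumed order: happiness, sadness, disgust, fear, surprise, anger, other, neutral
    pitch_std=30.0,       # higher values give more pitch variation
    speaking_rate=15.0,   # higher values give faster speech
    fmax=22050.0,         # maximum frequency in Hz (audio quality)
)
conditioning = model.prepare_conditioning(cond_dict)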

For more details and speech samples, check out our blog.
We also have a hosted version available at playground.zyphra.com/audio.

🤗 Hugging Face Resources

Model Weights

The Hebrew-trained model weights are available on Hugging Face: notmax123/Zonos-Hebrew

To use the Hebrew model weights in your code, simply replace the model loading line:

# Instead of the original Zyphra models:
# model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device)

# Use the Hebrew-trained model:
model = Zonos.from_pretrained("notmax123/Zonos-Hebrew", device=device)

Online Demo

Try the model instantly in your browser without any installation: 🤗 Zonos-Hebrew Space

The Hugging Face Space provides an easy-to-use web interface where you can:

  • Input Hebrew text for speech synthesis
  • Upload reference audio for voice cloning
  • Adjust various parameters like speaking rate and emotions
  • Download the generated audio directly

Perfect for quick experiments or sharing with others who want to try the model without technical setup!


Docker installation

Prebuilt image (Docker Hub)

Use the prebuilt image from Docker Hub:

docker pull maxme123/zonos-hebrew:latest

# Run with GPU and expose Gradio on port 7860
docker run --gpus all --rm -p 7860:7860 --shm-size=2g maxme123/zonos-hebrew:latest
  • Gradio UI: http://localhost:7860
  • The default textbox text is "שלום לכולם" ("hello everyone").
  • Optional public link:

docker run --gpus all --rm -e GRADIO_SHARE=1 -p 7860:7860 --shm-size=2g maxme123/zonos-hebrew:latest

Build locally (this repository)

docker build -t maxme123/zonos-hebrew:latest .
docker run --gpus all --rm -p 7860:7860 --shm-size=2g maxme123/zonos-hebrew:latest

Docker Compose (optional)

docker compose up       # uses the included docker-compose.yml
# or
docker compose up -d    # run detached
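For reference, a minimal compose file equivalent to the run command above might look like the following. This is an assumed shape for illustration only; prefer the docker-compose.yml that ships with the repository:

# Assumed minimal docker-compose.yml (illustrative; the included file is authoritative)
services:
  zonos-hebrew:
    image: maxme123/zonos-hebrew:latest
    ports:
      - "7860:7860"
    shm_size: "2g"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]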

GPU prerequisites

  • Install NVIDIA drivers and nvidia-container-toolkit
  • Verify GPU visibility:
nvidia-smi
docker run --gpus all --rm nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 nvidia-smi

Zonos follows a straightforward architecture: text normalization and phonemization via eSpeak, followed by DAC token prediction through a transformer or hybrid backbone. An overview of the architecture can be seen below.

[Architecture overview diagram]
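To make the first stage concrete, here is a sketch of the phonemization step using the phonemizer package's eSpeak backend. This illustrates the text-to-phonemes idea only; the actual pipeline in zonos/conditioning.py may differ in detail:

# Sketch of the phonemization stage only (text -> phonemes). The real
# Zonos pipeline lives in zonos/conditioning.py and may differ in detail.
from phonemizer import phonemize

phonemes = phonemize(
    "שלום לכולם",    # "hello everyone"
    language="he",    # eSpeak's Hebrew voice
    backend="espeak",
    strip=True,
)
print(phonemes)  # phoneme string of the kind the token-prediction backbone consumes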

Usage

Python

import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict
from zonos.utils import DEFAULT_DEVICE as device


# Load the Hebrew-trained weights from Hugging Face
model = Zonos.from_pretrained("notmax123/Zonos-Hebrew", device=device)

# Build a speaker embedding from a short reference clip (voice cloning)
wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)

# Fix the seed for reproducible output
torch.manual_seed(421)

# Condition on Hebrew text ("Jerusalem, good day") and the speaker embedding
cond_dict = make_cond_dict(text="ירושלים יום טוב", speaker=speaker, language="he")
conditioning = model.prepare_conditioning(cond_dict)

# Generate DAC token codes, then decode them back to audio
codes = model.generate(conditioning)
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)

This should produce a sample.wav file in your project root directory.

For repeated sampling we highly recommend using the Gradio interface instead, as the minimal example needs to load the model every time it is run.

Gradio interface (recommended)

uv run gradio_interface.py
# python gradio_interface.py
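If you prefer to stay in Python for repeated sampling, you can instead load the model once and loop over prompts. A minimal sketch reusing the names from the usage example above (output file names are illustrative):

# Sketch: load once, generate many. Reuses model, speaker, make_cond_dict,
# and torchaudio from the usage example above.
texts = ["שלום לכולם", "ירושלים יום טוב"]  # "hello everyone", "Jerusalem, good day"
for i, text in enumerate(texts):
    cond = model.prepare_conditioning(make_cond_dict(text=text, speaker=speaker, language="he"))
    codes = model.generate(cond)
    wavs = model.autoencoder.decode(codes).cpu()
    torchaudio.save(f"sample_{i}.wav", wavs[0], model.autoencoder.sampling_rate)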

Features

  • Zero-shot TTS with voice cloning: Input desired text and a 10-30s speaker sample to generate high-quality TTS output
  • Audio prefix inputs: Add text plus an audio prefix for even richer speaker matching. Audio prefixes can be used to elicit behaviours such as whispering, which can otherwise be challenging to replicate when cloning from speaker embeddings alone (see the sketch after this list)
  • Multilingual support: Zonos-v0.1 supports English, Japanese, Chinese, French, and German; this fork adds Hebrew
  • Audio quality and emotion control: Zonos offers fine-grained control of many aspects of the generated audio. These include speaking rate, pitch, maximum frequency, audio quality, and various emotions such as happiness, anger, sadness, and fear.
  • Fast: our model runs with a real-time factor of ~2x on an RTX 4090 (i.e. it generates 2 seconds of audio per 1 second of compute time)
  • Gradio WebUI: Zonos comes packaged with an easy-to-use Gradio interface to generate speech
  • Simple installation and deployment: Zonos can be installed and deployed simply using the Dockerfile packaged with our repository.
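The audio-prefix workflow can be sketched as follows. The preprocess/encode calls and the audio_prefix_codes argument follow the pattern in upstream Zonos's gradio_interface.py and should be verified against this fork; the prefix file path is illustrative:

# Sketch: condition generation on an audio prefix (e.g. a whispered clip).
# API names follow upstream Zonos's gradio_interface.py; verify locally.
# `model`, `device`, and `conditioning` are set up as in the Usage section.
import torchaudio

prefix_wav, prefix_sr = torchaudio.load("assets/whisper_prefix.wav")  # illustrative path
prefix_wav = prefix_wav.mean(0, keepdim=True)                     # downmix to mono
prefix_wav = model.autoencoder.preprocess(prefix_wav, prefix_sr)  # resample to the model rate
prefix_codes = model.autoencoder.encode(prefix_wav.unsqueeze(0).to(device))

codes = model.generate(conditioning, audio_prefix_codes=prefix_codes)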

Installation

System requirements

  • Operating System: Linux (preferably Ubuntu 22.04/24.04), macOS
  • GPU: 6GB+ VRAM, Hybrid additionally requires a 3000-series or newer Nvidia GPU

Note: Zonos can also run on CPU provided there is enough free RAM. However, this will be a lot slower than running on a dedicated GPU, and likely won't be sufficient for interactive use.

For experimental Windows support, check out this fork.

See also the Docker installation section above.

System dependencies

Zonos depends on the eSpeak library for phonemization. You can install it on Ubuntu with the following command:

apt install -y espeak-ng # For Ubuntu
# brew install espeak-ng # For MacOS

Python dependencies

We highly recommend using a recent version of uv for installation. If you don't have uv installed, you can install it via pip: pip install -U uv.

Installing into a new uv virtual environment (recommended)

uv sync
uv sync --extra compile # optional but needed to run the hybrid

Installing into the system/activated environment using uv

uv pip install -e .
uv pip install -e .[compile] # optional but needed to run the hybrid

Installing into the system/activated environment using pip

pip install -e .
pip install --no-build-isolation -e .[compile] # optional but needed to run the hybrid

Confirm that it's working

For convenience we provide a minimal example to check that the installation works:

uv run sample.py
# python sample.py

