This project adds high-quality Text-to-Speech capabilities to ComfyUI using the Orpheus TTS model. Create natural-sounding voices with emotional expressions, multilingual support, and audio effects.
- 🎙️ High-quality, natural-sounding speech synthesis
- 🎭 Support for emotional expressions and paralinguistic elements
- 👥 Multiple voice options (tara, leah, jess, leo, dan, mia, zac, zoe, etc.)
- 📝 Long text handling with automatic chunking for consistent output
- 🎛️ Professional audio effects:
  - Pitch shifting (-12 to +12 semitones)
  - Speed adjustment (0.5x to 2.0x speed)
  - Volume control with anti-clipping protection
  - Audio normalization option
  - Reverb with adjustable room size and amount
  - Echo with configurable delay and decay
- 🌐 Optional support for private Hugging Face models
- 💻 Cross-platform: Works on Windows, Linux/WSL, and macOS
Clone this repository into your ComfyUI's `custom_nodes` directory:
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-Orpheus-TTS.git
pip install torch numpy soundfile transformers huggingface_hub nltk snac
For WSL 2, you may need to install directly from the GitHub repository:
pip install git+https://github.com/hubertsiuzdak/snac.git
- Download SoX for Windows from the official SourceForge page:
  - Download the `.exe` installer (e.g., `sox-14.4.2-win32.exe`)
- Run the installer:
  - Follow the installation prompts
  - Important: Note the installation directory (default is usually `C:\Program Files (x86)\sox-14-4-2\`)
- No need to add SoX to PATH: the extension uses the direct path to SoX
sudo apt-get update
sudo apt-get install sox
brew install sox
After installing all required components, restart ComfyUI to load the extension.
To access private models on Hugging Face, create a file named `hf_config.json` in the extension directory and insert your Hugging Face token:
{
"token": "YOUR_HUGGING_FACE_TOKEN_HERE"
}
- Save the file and restart ComfyUI
Your token will be used to authenticate with Hugging Face when downloading models. This is only required if you're using private models or if you need higher rate limits.
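As a rough illustration of how such a config file could be read, here is a minimal sketch. The helper name `load_hf_token` is hypothetical and is not taken from the extension's source; the only things assumed from this README are the file name `hf_config.json` and the `token` key.

```python
import json
from pathlib import Path
from typing import Optional


def load_hf_token(config_path: Path) -> Optional[str]:
    """Return the token stored in hf_config.json, or None if it is unavailable.

    Hypothetical helper for illustration; the extension may read the file differently.
    """
    if not config_path.is_file():
        return None  # no config file: fall back to anonymous access
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None  # malformed JSON is treated like a missing file
    if not isinstance(config, dict):
        return None
    token = config.get("token")
    # Ignore empty values and the unedited placeholder from the template.
    if not token or token == "YOUR_HUGGING_FACE_TOKEN_HERE":
        return None
    return token
```

A value returned this way could then be passed as the `token` argument that the `huggingface_hub` download functions accept; returning `None` keeps public models working without any configuration.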
Loads the required models for Orpheus TTS.
Inputs:
- `snac_model_path` (optional): Path to SNAC model (default: "hubertsiuzdak/snac_24khz")
- `orpheus_model_path` (optional): Path to Orpheus model (default: "canopylabs/orpheus-3b-0.1-ft")
Outputs:
- `model`: Model reference to be passed to the generate node
Generates speech from text input.
Inputs:
- `model`: Model reference from the loader node
- `text`: The text to convert to speech
- `voice`: Voice style to use (tara, leah, jess, leo, dan, mia, zac, zoe, etc.)
- `language` (optional): Language for multilingual output (en, fr, es, etc.)
- `max_chunk_size` (optional): Maximum chunk size for long text processing
Outputs:
- `audio`: Audio data to be passed to preview or effects nodes
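To make the idea behind `max_chunk_size` concrete, the following sketch groups whole sentences into chunks that stay under a size limit. It is an assumption-laden illustration: the README does not say whether the limit is measured in characters or tokens (characters are assumed here), the default of 200 is made up, and a simple regular expression stands in for whatever sentence splitter the extension actually uses (the `nltk` dependency suggests it may rely on that library instead).

```python
import re
from typing import List


def chunk_text(text: str, max_chunk_size: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chunk_size characters.

    Illustrative only: a single sentence longer than the limit is kept intact
    rather than being cut in the middle.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed offset keeps each chunk a natural unit of speech, which is one plausible way to keep the generated audio consistent across chunk joins.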
Applies high-quality audio processing to the generated speech.
Inputs:
- `audio`: Audio data from the generate node
- `pitch_shift`: Semitone adjustment (-12 to +12)
- `speed_factor`: Playback speed modifier (0.5x to 2.0x)
- `sox_path` (optional): Custom path to SoX executable
- `gain_db` (optional): Volume adjustment in decibels
- `use_limiter` (optional): Enable/disable limiter for positive gain
- `normalize_audio` (optional): Enable/disable audio normalization
- `add_reverb` (optional): Enable/disable reverb effect
- `reverb_amount` (optional): Reverb intensity
- `reverb_room_scale` (optional): Size of virtual space
- `add_echo` (optional): Enable/disable echo effect
- `echo_delay` (optional): Time between echo repetitions
- `echo_decay` (optional): How quickly echo fades
Outputs:
- `audio`: Processed audio data
You can add expressive elements to the speech by inserting these tags:
- `<laugh>` - Natural laughter
- `<chuckle>` - Light, subtle laughter
- `<sigh>` - Exhaling with emotion
- `<cough>` - Clearing throat
- `<sniffle>` - Subtle nasal sound
- `<groan>` - Low, grumbling sound
- `<yawn>` - Tired exhale
- `<gasp>` - Sudden intake of breath
The Element Position dropdown provides different ways to add these paralinguistic elements to your text:
- None - No automatic element insertion. You can manually type the element tags in your text where desired.
  - Example: `I can't believe it! <laugh> That's amazing!`
- Append - Automatically adds the selected element at the end of your text.
  - Input: `"That's amazing!"`
  - Output: `"That's amazing! <laugh>"`
- Prepend - Automatically adds the selected element at the beginning of your text.
  - Input: `"I need to get back to work."`
  - Output: `"<sigh> I need to get back to work."`
- Pipe - Replaces pipe characters (|) in your text with the selected element. This gives you precise control over element placement.
  - Input: `"I can't believe it! | That's the funniest thing | I've heard all day."`
  - Element: `laugh`
  - Output: `"I can't believe it! <laugh> That's the funniest thing <laugh> I've heard all day."`
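The Element Position modes above can be sketched in a few lines. This is only an illustration of the described behavior, not the extension's actual code; the function name `apply_element` and its parameter names are hypothetical.

```python
def apply_element(text: str, element: str, position: str = "none") -> str:
    """Insert a paralinguistic tag according to the chosen position mode.

    Hypothetical helper that mirrors the None / Append / Prepend / Pipe
    options described in this README.
    """
    tag = f"<{element}>"
    mode = position.lower()
    if mode == "append":
        return f"{text} {tag}"
    if mode == "prepend":
        return f"{tag} {text}"
    if mode == "pipe":
        # Replace every pipe with the tag and normalise the spacing around it.
        parts = [part.strip() for part in text.split("|")]
        return f" {tag} ".join(parts)
    # "none": leave the text untouched so manually typed tags pass through.
    return text
```

For instance, `apply_element("That's amazing!", "laugh", "append")` reproduces the Append example shown in the list.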
I can't believe it! <laugh> That's the funniest thing I've heard all day.
<sigh> But now I need to get back to work.
Input: "Did you hear that? | It's hilarious! | I can't stop laughing!"
Element: laugh
Result: "Did you hear that? <laugh> It's hilarious! <laugh> I can't stop laughing!"
<gasp> What was that? <pause> Did you hear something? <sigh> Maybe I'm just tired.
- Gain Control: Use `gain_db` to increase or decrease volume without distortion
  - Positive values (0 to +20 dB): Increase volume with automatic clipping prevention
  - Negative values (-20 to 0 dB): Decrease volume
  - For best results with multiple effects, set gain last in your workflow
- Normalization: Enable `normalize_audio` to automatically balance levels
  - Great for ensuring consistent volume across different voice samples
  - Applied before other effects for best results
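For readers unfamiliar with decibel values, the sketch below shows the standard conversion from a gain in dB to a linear amplitude factor (factor = 10^(dB/20)) and a very simple form of clipping prevention. The function names are hypothetical, samples are assumed to be floats in the range -1.0 to 1.0, and the "limiter" here is plain peak rescaling, which is a stand-in and probably simpler than what the extension or SoX actually does.

```python
from typing import List


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: List[float], gain_db: float, use_limiter: bool = True) -> List[float]:
    """Scale samples by gain_db and optionally keep the peak within [-1.0, 1.0].

    Illustrative only: peak rescaling is used in place of a real limiter.
    """
    factor = db_to_linear(gain_db)
    scaled = [s * factor for s in samples]
    peak = max((abs(s) for s in scaled), default=0.0)
    if use_limiter and peak > 1.0:
        # Pull the whole signal down so the loudest sample sits exactly at 1.0.
        scaled = [s / peak for s in scaled]
    return scaled
```

Under this conversion, +6 dB roughly doubles the amplitude and -6 dB roughly halves it, which helps when choosing values within the -20 to +20 dB range.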
Reverb adds a sense of space to your audio. Here are some suggested settings:
- Small Room: reverb_amount = 20, reverb_room_scale = 25
- Medium Room: reverb_amount = 40, reverb_room_scale = 50
- Large Hall: reverb_amount = 70, reverb_room_scale = 80
- Cathedral: reverb_amount = 90, reverb_room_scale = 95
Echo creates repeating sound reflections. Good settings to try:
- Subtle Echo: echo_delay = 0.3, echo_decay = 0.3
- Moderate Echo: echo_delay = 0.5, echo_decay = 0.5
- Canyon Echo: echo_delay = 1.0, echo_decay = 0.7
- Phone Call: pitch_shift = 0, speed_factor = 1.0, add_reverb = True, reverb_amount = 10, reverb_room_scale = 10
- Radio Announcer: pitch_shift = -2, speed_factor = 0.9, add_reverb = True, reverb_amount = 20, gain_db = 3
- Stadium Announcement: pitch_shift = 0, speed_factor = 1.0, add_reverb = True, reverb_amount = 60, add_echo = True, echo_delay = 0.8
- Child Voice: pitch_shift = 4, speed_factor = 1.1, gain_db = 2
- Deep Voice: pitch_shift = -4, speed_factor = 0.9, gain_db = -2
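The presets above can be captured as plain dictionaries, and the sketch below shows one way such settings might be translated into SoX effect arguments. The mapping is an assumption, not the extension's documented behavior: it uses the SoX `pitch` effect (which takes cents, so semitones are multiplied by 100), `tempo` for speed, `reverb` (reverberance, HF damping, room scale), `echo` (gain-in, gain-out, delay in milliseconds, decay) and `gain`. The fallback values (HF damping of 50, room scale of 100, echo gains of 0.8 and 0.9, echo decay of 0.5) are illustrative choices, and `build_sox_args` is a hypothetical name.

```python
from typing import Dict, List, Mapping, Union

Setting = Union[int, float, bool]

# Preset values copied from the list above.
PRESETS: Dict[str, Dict[str, Setting]] = {
    "Phone Call": {"pitch_shift": 0, "speed_factor": 1.0, "add_reverb": True,
                   "reverb_amount": 10, "reverb_room_scale": 10},
    "Radio Announcer": {"pitch_shift": -2, "speed_factor": 0.9, "add_reverb": True,
                        "reverb_amount": 20, "gain_db": 3},
    "Stadium Announcement": {"pitch_shift": 0, "speed_factor": 1.0, "add_reverb": True,
                             "reverb_amount": 60, "add_echo": True, "echo_delay": 0.8},
    "Child Voice": {"pitch_shift": 4, "speed_factor": 1.1, "gain_db": 2},
    "Deep Voice": {"pitch_shift": -4, "speed_factor": 0.9, "gain_db": -2},
}


def build_sox_args(settings: Mapping[str, Setting]) -> List[str]:
    """Translate node-style settings into a list of SoX effect arguments (assumed mapping)."""
    args: List[str] = []
    pitch = settings.get("pitch_shift", 0)
    if pitch:
        args += ["pitch", str(int(round(pitch * 100)))]  # semitones to cents
    speed = settings.get("speed_factor", 1.0)
    if speed != 1.0:
        args += ["tempo", str(speed)]
    if settings.get("add_reverb", False):
        args += ["reverb", str(settings.get("reverb_amount", 50)), "50",
                 str(settings.get("reverb_room_scale", 100))]
    if settings.get("add_echo", False):
        delay_ms = int(round(settings.get("echo_delay", 0.5) * 1000))  # seconds to ms
        args += ["echo", "0.8", "0.9", str(delay_ms), str(settings.get("echo_decay", 0.5))]
    gain = settings.get("gain_db", 0)
    if gain:
        args += ["gain", str(gain)]  # gain goes last in the chain
    return args
```

The resulting list could be appended to a command such as `sox input.wav output.wav` (file names are placeholders); gain is emitted last to match the tip about setting gain at the end of the chain.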
- Add "Orpheus TTS Model Loader"
- Add "Orpheus TTS Generate"
- Connect the model loader's output to the generate node's input
- Enter your text and select voice options
- Connect to "Preview Audio" node to hear the result
- Add "Orpheus TTS Model Loader"
- Add "Orpheus TTS Generate"
- Add "Orpheus Audio Effects"
- Connect in sequence: Model Loader → Generate → Audio Effects → Preview Audio
- Adjust pitch shift and speed factor sliders
This extension has been tested and works on:
- Windows 10/11
- Linux (including WSL 2 on Windows)
- macOS
Different environments may require specific setup steps:
- SoX is automatically located in standard installation directories
- If installed elsewhere, provide the full path in the effects node
- Use `pip install git+https://github.com/hubertsiuzdak/snac.git` to ensure compatibility
- SoX is automatically located through the system PATH
- Install SoX via Homebrew for best compatibility
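The lookup behavior described above (a user-supplied path, then the system PATH, then a standard Windows install directory) might look roughly like the following. The order of the checks and the name `find_sox` are assumptions for illustration; only the default Windows path is taken from this README.

```python
import os
import shutil
from typing import Optional

# Default install location mentioned in this README.
WINDOWS_DEFAULT_SOX = r"C:\Program Files (x86)\sox-14-4-2\sox.exe"


def find_sox(custom_path: Optional[str] = None) -> Optional[str]:
    """Return a usable path to the SoX executable, or None if it cannot be found.

    Hypothetical helper; the extension's real search order may differ.
    """
    # 1. An explicit path from the effects node wins if it points at a file.
    if custom_path and os.path.isfile(custom_path):
        return custom_path
    # 2. Otherwise look for "sox" on the system PATH (typical on Linux and macOS).
    found = shutil.which("sox")
    if found:
        return found
    # 3. On Windows, fall back to the standard installer location.
    if os.name == "nt" and os.path.isfile(WINDOWS_DEFAULT_SOX):
        return WINDOWS_DEFAULT_SOX
    return None
```

Returning `None` instead of raising lets the caller decide whether to skip the effects or show an installation hint.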
If you encounter issues with SoX:
- Verify the SoX path in the "Orpheus Audio Effects" node:
  - Default: `C:\Program Files (x86)\sox-14-4-2\sox.exe`
  - If your installation is in a different location, provide the full path to `sox.exe`
- Check if SoX is installed correctly:
  - Open Command Prompt
  - Run `"C:\Program Files (x86)\sox-14-4-2\sox.exe" --version`
  - If you get an error, reinstall SoX
- Verify SoX installation: `sox --version`
- If SoX is not found, install it: `sudo apt-get update && sudo apt-get install sox`
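The same version check can be scripted, which is handy when diagnosing a setup without opening a terminal. This is a small sketch with a hypothetical function name; it only relies on the `--version` flag used in the steps above.

```python
import shutil
import subprocess
from typing import Optional


def sox_version(executable: str = "sox") -> Optional[str]:
    """Return the output of `sox --version`, or None if SoX cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH and not a valid explicit path
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # Some builds may print the version on stderr, so check both streams.
    return result.stdout.strip() or result.stderr.strip()
```

A `None` result corresponds to the "If SoX is not found" case and suggests installing or reinstalling SoX.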
This extension uses the following models:
- Orpheus TTS: A powerful text-to-speech model developed by Canopy AI
- SNAC Codec: A high-quality neural audio codec for voice synthesis
This project uses models with their own licenses:
- Orpheus Model: Canopy AI's Orpheus-TTS
- SNAC Model: Hubert Siuzdak's model
Please consult these licenses for usage terms and restrictions.
- Original Orpheus TTS implementation by Canopy AI
- SoX audio processing library: SoX - Sound eXchange
- ComfyUI