Commit 8eea4d1

committed
v3.2.3
1 parent 2080cad commit 8eea4d1

9 files changed

+451
-556
lines changed

CHANGELOG.md

Lines changed: 66 additions & 0 deletions
@@ -5,6 +5,72 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.2.3] - 2025-06-27
+
+### 🔄 Enhanced OpenAI API Compatibility
+
+This release consolidates the OpenAI-compatible API endpoints and introduces intelligent auto-combine functionality.
+
+### ✨ Added
+
+- **Auto-Combine Parameter**: New optional `auto_combine` parameter in `/v1/audio/speech` endpoint (default: `true`)
+- **Intelligent Text Handling**: Automatically detects long text and combines audio chunks when `auto_combine=true`
+- **Enhanced Error Messages**: Better error handling for long text when auto-combine is disabled
+- **Response Headers**: Added `X-Auto-Combine` and `X-Chunks-Combined` headers for transparency
+
+### 🔄 Changed
+
+- **Unified Endpoint**: Combined `/v1/audio/speech` and `/v1/audio/speech-combined` into single endpoint
+- **Backward Compatibility**: Maintains full OpenAI API compatibility while adding TTSFM-specific features
+- **Default Behavior**: Long text is now automatically split and combined by default (can be disabled)
+
+### 🗑️ Removed
+
+- **Deprecated Endpoint**: Removed `/v1/audio/speech-combined` endpoint (functionality moved to main endpoint)
+- **Legacy Web Options**: Removed confusing batch processing options from web interface for cleaner UX
+- **Complex UI Elements**: Simplified playground interface to focus on auto-combine
+
+### 🧹 Streamlined Web Experience
+
+- **User-Focused Design**: Web interface now emphasizes auto-combine as the primary approach
+- **Developer Features Preserved**: All advanced functionality remains in Python package
+- **Clear Separation**: Web for users, Python package for developers
+
+### 📋 Migration Guide
+
+- **No Breaking Changes**: Existing API calls continue to work unchanged
+- **Long Text**: Now automatically handled by default - no need to use separate endpoint
+- **Disable Auto-Combine**: Add `"auto_combine": false` to request body to get original behavior
+
+## [3.2.3] - 2025-06-27
+
+### 🔄 Enhanced OpenAI API Compatibility
+
+This release consolidates the OpenAI-compatible API endpoints and introduces intelligent auto-combine functionality.
+
+### ✨ Added
+
+- **Auto-Combine Parameter**: New optional `auto_combine` parameter in `/v1/audio/speech` endpoint (default: `true`)
+- **Intelligent Text Handling**: Automatically detects long text and combines audio chunks when `auto_combine=true`
+- **Enhanced Error Messages**: Better error handling for long text when auto-combine is disabled
+- **Response Headers**: Added `X-Auto-Combine` and `X-Chunks-Combined` headers for transparency
+
+### 🔄 Changed
+
+- **Unified Endpoint**: Combined `/v1/audio/speech` and `/v1/audio/speech-combined` into single endpoint
+- **Backward Compatibility**: Maintains full OpenAI API compatibility while adding TTSFM-specific features
+- **Default Behavior**: Long text is now automatically split and combined by default (can be disabled)
+
+### 🗑️ Removed
+
+- **Deprecated Endpoint**: Removed `/v1/audio/speech-combined` endpoint (functionality moved to main endpoint)
+
+### 📋 Migration Guide
+
+- **No Breaking Changes**: Existing API calls continue to work unchanged
+- **Long Text**: Now automatically handled by default - no need to use separate endpoint
+- **Disable Auto-Combine**: Add `"auto_combine": false` to request body to get original behavior
+
 ## [3.2.2] - 2025-06-26

 ### 🎵 Combined Audio Functionality
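
The migration note in the changelog says to add `"auto_combine": false` to the request body, and the "Added" list names two response headers. A minimal sketch of both sides follows, using only the standard library. The helper names `build_speech_request` and `combine_info` are hypothetical, and the header values are returned as raw strings because this diff does not show their format.

```python
import json
from typing import Mapping, Optional
from urllib import request


def build_speech_request(
    text: str,
    auto_combine: Optional[bool] = None,
    base_url: str = "http://localhost:8000/v1",
) -> request.Request:
    """Build (but do not send) a POST to /v1/audio/speech."""
    payload = {"model": "gpt-4o-mini-tts", "input": text, "voice": "alloy"}
    # Leaving the key out keeps the server default (true, per the changelog).
    if auto_combine is not None:
        payload["auto_combine"] = auto_combine
    return request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def combine_info(headers: Mapping[str, str]) -> dict:
    """Return the two auto-combine headers as raw strings (None if absent)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "auto_combine": lowered.get("x-auto-combine"),
        "chunks_combined": lowered.get("x-chunks-combined"),
    }


# Opt out of auto-combine, as the migration guide describes.
req = build_speech_request("Hello", auto_combine=False)
print(req.full_url, json.loads(req.data))
```

To send the request against a running server, pass `req` to `urllib.request.urlopen` and hand the response's `headers` to `combine_info`.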

README.md

Lines changed: 102 additions & 25 deletions
@@ -24,7 +24,7 @@ TTSFM provides both synchronous and asynchronous Python clients for text-to-spee
 - 🔧 **CLI Tool** - Command-line interface for quick TTS generation
 - 📦 **Type Hints** - Full type annotation support for better IDE experience
 - 🛡️ **Error Handling** - Comprehensive exception hierarchy with retry logic
-- 🔄 **Batch Processing** - Generate multiple audio files concurrently
+- **Auto-Combine** - Automatically handles long text with seamless audio combining
 - 📊 **Text Validation** - Automatic text length validation and splitting

 ## 📦 Installation
@@ -131,25 +131,30 @@ async def generate_speech():
 asyncio.run(generate_speech())
 ```

-#### Batch Processing
+#### Long Text Processing (Python Package)
+
+For developers who need fine-grained control over text splitting:

 ```python
-from ttsfm import AsyncTTSClient, TTSRequest, Voice
+from ttsfm import TTSClient, Voice, AudioFormat

-async def batch_generate():
-    requests = [
-        TTSRequest(input="First text", voice=Voice.ALLOY),
-        TTSRequest(input="Second text", voice=Voice.ECHO),
-        TTSRequest(input="Third text", voice=Voice.NOVA),
-    ]
+# Create client
+client = TTSClient()

-    async with AsyncTTSClient() as client:
-        responses = await client.generate_speech_batch(requests)
+# Generate speech from long text (creates separate files for each chunk)
+responses = client.generate_speech_long_text(
+    text="Very long text that exceeds 4096 characters...",
+    voice=Voice.ALLOY,
+    response_format=AudioFormat.MP3,
+    max_length=2000,
+    preserve_words=True
+)

-        for i, response in enumerate(responses):
-            response.save_to_file(f"batch_output_{i}")
+# Save each chunk as separate files
+for i, response in enumerate(responses, 1):
+    response.save_to_file(f"part_{i:03d}")  # Saves as part_001.mp3, part_002.mp3, etc.

-asyncio.run(batch_generate())
+print(f"Generated {len(responses)} audio files from long text")
 ```

 #### OpenAI Python Client Compatibility
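
The `generate_speech_long_text` call in the hunk above takes `max_length` and `preserve_words`, but this diff does not show how the package splits text. The sketch below illustrates one plausible word-preserving strategy. `split_text` is a hypothetical helper written for illustration, not TTSFM's implementation.

```python
from typing import List


def split_text(text: str, max_length: int = 2000, preserve_words: bool = True) -> List[str]:
    """Split text into chunks of at most max_length characters."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not preserve_words:
        # Hard cut every max_length characters, ignoring word boundaries.
        return [text[i:i + max_length] for i in range(0, len(text), max_length)]

    chunks: List[str] = []
    current = ""
    # str.split() collapses runs of whitespace, so chunks are re-joined
    # with single spaces.
    for word in text.split():
        # A single word longer than max_length has to be cut anyway.
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


parts = split_text("one two three four five six", max_length=10)
print(parts)
```

Each chunk stays within `max_length`, so every piece could be sent as its own request. Splitting at sentence boundaries would be a natural refinement.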
@@ -173,6 +178,45 @@ response = client.audio.speech.create(
 response.stream_to_file("output.mp3")
 ```

+#### Auto-Combine Feature for Long Text
+
+TTSFM automatically handles long text (>4096 characters) with the new auto-combine feature:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="not-needed",
+    base_url="http://localhost:8000/v1"
+)
+
+# Long text is automatically split and combined into a single audio file
+long_article = """
+Your very long article or document content here...
+This can be thousands of characters long and TTSFM will
+automatically split it into chunks, generate audio for each,
+and combine them into a single seamless audio file.
+""" * 100  # Make it really long
+
+# This works seamlessly - no manual splitting needed!
+response = client.audio.speech.create(
+    model="gpt-4o-mini-tts",
+    voice="nova",
+    input=long_article,
+    # auto_combine=True is the default
+)
+
+response.stream_to_file("long_article.mp3")  # Single combined file!
+
+# Disable auto-combine for strict OpenAI compatibility
+response = client.audio.speech.create(
+    model="gpt-4o-mini-tts",
+    voice="nova",
+    input="Short text only",
+    extra_body={"auto_combine": False},  # Non-OpenAI field goes in extra_body; will error if text > 4096 chars
+)
+```
+
 ### 🖥️ Command Line Interface

 ```bash
@@ -364,7 +408,7 @@ When running the Docker container, these endpoints are available:
 ### OpenAI-Compatible API

 ```bash
-# Generate speech
+# Generate speech (short text)
 curl -X POST http://localhost:8000/v1/audio/speech \
   -H "Content-Type: application/json" \
   -d '{
@@ -375,13 +419,46 @@ curl -X POST http://localhost:8000/v1/audio/speech \
   }' \
   --output speech.mp3

+# Generate speech from long text with auto-combine (default behavior)
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini-tts",
+    "input": "This is a very long text that exceeds the 4096 character limit...",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "auto_combine": true
+  }' \
+  --output long_speech.mp3
+
+# Generate speech from long text without auto-combine (will return error if text > 4096 chars)
+curl -X POST http://localhost:8000/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini-tts",
+    "input": "Your text here...",
+    "voice": "alloy",
+    "response_format": "mp3",
+    "auto_combine": false
+  }' \
+  --output speech.mp3
+
 # List models
 curl http://localhost:8000/v1/models

 # Health check
 curl http://localhost:8000/api/health
 ```

+#### **New Parameter: `auto_combine`**
+
+TTSFM extends the OpenAI API with an optional `auto_combine` parameter:
+
+- **`auto_combine`** (boolean, optional, default: `true`)
+  - When `true`: Automatically splits long text (>4096 chars) into chunks, generates audio for each chunk, and combines them into a single seamless audio file
+  - When `false`: Returns an error if text exceeds the 4096 character limit (standard OpenAI behavior)
+  - **Benefits**: No need to manually manage text splitting or audio file merging for long content
+
 ## 🐳 Docker Deployment

 ### Quick Start
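
The `auto_combine` semantics described in the hunk above reduce to a small decision rule. The sketch below restates that rule in Python. `plan_speech`, `SpeechPlan`, and `TextTooLongError` are hypothetical names chosen for illustration and do not come from the TTSFM source; the chunk count is a lower bound that assumes chunks are filled to the 4096-character limit.

```python
from dataclasses import dataclass
from math import ceil

MAX_CHARS = 4096  # per-request limit quoted in the README


class TextTooLongError(ValueError):
    """Hypothetical error for over-limit input when auto_combine is off."""


@dataclass
class SpeechPlan:
    chunk_count: int  # number of synthesis requests needed
    combined: bool    # whether the chunks are merged into one file


def plan_speech(text: str, auto_combine: bool = True, limit: int = MAX_CHARS) -> SpeechPlan:
    """Decide how a request would be handled under the documented rules."""
    if len(text) <= limit:
        # Short text takes the normal single-request path either way.
        return SpeechPlan(chunk_count=1, combined=False)
    if not auto_combine:
        # Documented behavior: over-limit text is rejected with an error.
        raise TextTooLongError(f"input is {len(text)} characters; limit is {limit}")
    # Lower bound on the chunk count; boundary-aware splitting may need more.
    return SpeechPlan(chunk_count=ceil(len(text) / limit), combined=True)


print(plan_speech("word " * 2000))
```

With `auto_combine=False`, the same input raises `TextTooLongError`, which mirrors the error response described for the endpoint.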
@@ -522,7 +599,7 @@ docker run -p 8000:8000 ttsfm:local

 - **Latency**: ~1-3 seconds for typical text (depends on openai.fm service)
 - **Throughput**: Supports concurrent requests with async client
-- **Text Limits**: Up to 4096 characters per request (configurable)
+- **Text Limits**: No fixed limit with auto-combine; text over 4096 characters is split into chunks and combined automatically
 - **Audio Quality**: High-quality synthesis comparable to OpenAI

 ### Optimization Tips
@@ -555,16 +632,16 @@ for text in texts:

 See [CHANGELOG.md](CHANGELOG.md) for detailed version history.

-### Latest Changes (v3.2.2)
+### Latest Changes (v3.2.3)

-- 🎵 **Combined Audio**: Generate single audio files from long text (no more chunk management!)
-- 🧠 **Intelligent Splitting**: Smart text splitting at sentence/word boundaries
-- 🔗 **Seamless Combination**: Professional audio merging with multiple fallback methods
-- 🤖 **OpenAI Compatible**: New `/v1/audio/speech-combined` endpoint
-- 📊 **Rich Metadata**: Detailed processing information in response headers
-- 🚀 **Performance Optimized**: Concurrent processing and memory efficiency
-- 🌍 **Unicode Support**: Full international text support
-- 🧪 **Comprehensive Testing**: Complete test suite with performance benchmarks
+- **Auto-Combine by Default**: Long text is now automatically split and combined into single audio files
+- 🔄 **Unified API Endpoint**: Single `/v1/audio/speech` endpoint handles both short and long text intelligently
+- 🎛️ **Configurable Behavior**: New `auto_combine` parameter (default: `true`) for full control
+- 🤖 **Enhanced OpenAI Compatibility**: Drop-in replacement with intelligent long-text handling
+- 📊 **Rich Response Headers**: `X-Auto-Combine`, `X-Chunks-Combined`, and processing metadata
+- 🧹 **Streamlined Web Interface**: Removed legacy batch processing for cleaner user experience
+- 📖 **Simplified Documentation**: Web docs emphasize modern auto-combine approach
+- 🎮 **Enhanced Playground**: Clean interface focused on auto-combine functionality

 ## 🤝 Support & Community

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "ttsfm"
-version = "3.2.2"
+version = "3.2.3"
 description = "Text-to-Speech API Client with OpenAI compatibility"
 readme = "README.md"
 license = "MIT"
