234 changes: 234 additions & 0 deletions examples/online_serving/stable_audio/README.md
@@ -0,0 +1,234 @@
# Stable Audio Online Serving

Generate audio from text prompts using Stable Audio models via an OpenAI-compatible API endpoint.

## Features

- **OpenAI-compatible API**: Use the `/v1/audio/speech` endpoint
- **Flexible control**: Adjust audio length, guidance scale, and inference steps
- **Quality control**: Use negative prompts to avoid unwanted characteristics
- **Reproducible**: Set a random seed for deterministic generation

## Quick Start

### 1. Start the Server

```bash
vllm-omni serve stabilityai/stable-audio-open-1.0 \
--host 0.0.0.0 \
--port 8000 \
--gpu-memory-utilization 0.9 \
--trust-remote-code \
--enforce-eager \
--omni
```

### 2. Generate Audio

#### Using curl

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "The sound of a cat purring",
"audio_length": 10.0
}' --output cat.wav
```

#### Using Python Client

```bash
python stable_audio_client.py \
--text "The sound of a cat purring" \
--audio_length 10.0 \
--output cat.wav
```
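
The same request can also be sent from Python with only the standard library. This is a minimal sketch, assuming the server from step 1 is listening on `localhost:8000`. The helper names `build_speech_request` and `generate_audio` are illustrative and not part of vLLM-Omni, and the 600-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: the server from step 1 is reachable at this address.
API_URL = "http://localhost:8000/v1/audio/speech"


def build_speech_request(prompt, audio_length=10.0, url=API_URL):
    """Build the POST request that mirrors the curl example."""
    payload = {"input": prompt, "audio_length": audio_length}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_audio(prompt, output_path, audio_length=10.0, timeout=600):
    """Send the request and write the returned audio bytes to output_path."""
    request = build_speech_request(prompt, audio_length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(output_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

With the server running, `generate_audio("The sound of a cat purring", "cat.wav")` should write the response body to `cat.wav` and return its size in bytes.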

#### Using Bash Script

```bash
bash curl_examples.sh
```

## API Reference

### Endpoint

```
POST /v1/audio/speech
```

### Request Body

```json
{
"input": "Text description of the audio",
"audio_length": 10.0,
"audio_start": 0.0,
"negative_prompt": "Low quality",
"guidance_scale": 7.0,
"num_inference_steps": 100,
"seed": 42,
"response_format": "wav"
}
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input` | string | **required** | Text prompt describing the audio to generate |
| `audio_length` | float | 10.0 | Audio duration in seconds (max ~47s for stable-audio-open-1.0) |
| `audio_start` | float | 0.0 | Audio start time in seconds |
| `negative_prompt` | string | null | Text describing what to avoid in generation |
| `guidance_scale` | float | 7.0 | Classifier-free guidance scale (higher = more adherence to prompt) |
| `num_inference_steps` | int | 100 | Number of denoising steps (higher = better quality, slower) |
| `seed` | int | null | Random seed for reproducibility |
| `response_format` | string | "wav" | Output format: wav, mp3, flac, pcm |
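
A request body can be assembled from these parameters so that unset options are left out and the server defaults apply. The sketch below is one possible client-side helper, not the server's own validation: `build_payload` is an illustrative name, the 47-second cap is the approximate limit quoted in the table, and the checks are assumptions about what a sensible request looks like.

```python
MAX_AUDIO_LENGTH_S = 47.0  # approximate limit quoted for stable-audio-open-1.0
RESPONSE_FORMATS = {"wav", "mp3", "flac", "pcm"}


def build_payload(prompt, audio_length=None, audio_start=None,
                  negative_prompt=None, guidance_scale=None,
                  num_inference_steps=None, seed=None, response_format=None):
    """Return a JSON-serializable request body, omitting unset options."""
    if not prompt or not prompt.strip():
        raise ValueError("input is required and must not be empty")
    if audio_length is not None and not 0 < audio_length <= MAX_AUDIO_LENGTH_S:
        raise ValueError(f"audio_length must be in (0, {MAX_AUDIO_LENGTH_S}]")
    if num_inference_steps is not None and num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")

    optional = {
        "audio_length": audio_length,
        "audio_start": audio_start,
        "negative_prompt": negative_prompt,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "seed": seed,
        "response_format": response_format,
    }
    payload = {"input": prompt}
    # Only send options the caller set explicitly.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Rejecting an over-long `audio_length` on the client avoids waiting on a request that the model cannot satisfy.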

### Response

Returns audio data in the requested format (default: WAV).

## Usage Examples

### Basic Generation

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "The sound of ocean waves"
}' --output ocean.wav
```

### Custom Duration

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "A dog barking",
"audio_length": 5.0
}' --output dog_5s.wav
```

### High Quality with Negative Prompt

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "A piano playing a gentle melody",
"audio_length": 10.0,
"negative_prompt": "Low quality, distorted, noisy",
"guidance_scale": 8.0,
"num_inference_steps": 150
}' --output piano_hq.wav
```

### Reproducible Generation

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Thunder and rain sounds",
"audio_length": 15.0,
"seed": 42
}' --output thunder.wav
```
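
To check that a fixed seed reproduces the output on a given setup, two responses generated from the same request body can be compared byte for byte. This is a minimal sketch: `audio_digest` and `same_output` are illustrative names, and bit-identical output across different GPUs, drivers, or library versions is an assumption this README does not guarantee.

```python
import hashlib


def audio_digest(audio_bytes):
    """Return a SHA-256 hex digest of the raw audio bytes."""
    return hashlib.sha256(audio_bytes).hexdigest()


def same_output(first, second):
    """True if two responses contain exactly the same bytes."""
    return audio_digest(first) == audio_digest(second)
```

For example, read two files produced with `"seed": 42` in binary mode and pass their contents to `same_output`.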

### Quick Generation (Fewer Steps)

For faster generation with slightly lower quality:

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Birds chirping in a forest",
"audio_length": 8.0,
"num_inference_steps": 50
}' --output birds_quick.wav
```

## Python Client Examples

### Simple Generation

```bash
python stable_audio_client.py \
--text "The sound of a cat purring"
```

### Custom Parameters

```bash
python stable_audio_client.py \
--text "Thunder and rain" \
--audio_length 15.0 \
--negative_prompt "Low quality" \
--guidance_scale 7.0 \
--num_inference_steps 100 \
--seed 42 \
--output thunder.wav
```

### Different Output Format

```bash
python stable_audio_client.py \
--text "Guitar playing" \
--response_format mp3 \
--output guitar.mp3
```

## Tips

1. **Audio Length**: Keep `audio_length` under about 47 seconds for `stable-audio-open-1.0`.
2. **Quality vs Speed**:
   - 50 steps: Fast, decent quality
   - 100 steps: Good balance (default)
   - 150+ steps: High quality, slower
3. **Guidance Scale**:
   - Lower (3-5): More creative/varied
   - Default (7): Good balance
   - Higher (10+): More literal to prompt
4. **Negative Prompts**: Use them to steer away from unwanted traits such as "Low quality", "distorted", or "noisy".
5. **Seeds**: Reuse the same seed to get reproducible results.

## Performance

| Inference Steps | Quality | Speed | Use Case |
|----------------|---------|-------|----------|
| 50 | Good | Fast | Quick previews |
| 100 (default) | Very Good | Medium | Production |
| 150+ | Excellent | Slow | Final/critical audio |
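
The step counts in the table above can be wrapped in named presets so callers do not hard-code numbers. This is a small sketch: the preset names and the `with_preset` helper are illustrative, and only the step counts come from the table.

```python
# Step counts taken from the table; the preset names are illustrative.
STEP_PRESETS = {"preview": 50, "production": 100, "final": 150}


def with_preset(payload, preset):
    """Return a copy of payload with num_inference_steps set from a preset."""
    if preset not in STEP_PRESETS:
        raise ValueError(
            f"unknown preset {preset!r}; choose from {sorted(STEP_PRESETS)}"
        )
    return {**payload, "num_inference_steps": STEP_PRESETS[preset]}
```

The helper copies the request body rather than mutating it, so one base payload can be reused for a quick preview and a final render.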

## Troubleshooting

### Server not responding
- Check if server is running: `curl http://localhost:8000/health`
- Check server logs for errors
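
Model loading can take a while after startup, so a script may want to poll the health endpoint before sending requests. This is a minimal sketch, assuming `/health` answers with HTTP 200 once the server is ready. `is_healthy` and `wait_until_healthy` are illustrative names, and the attempt count and delay are arbitrary.

```python
import time
import urllib.request

# Assumption: same host and port as in the curl check above.
HEALTH_URL = "http://localhost:8000/health"


def is_healthy(url=HEALTH_URL, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


def wait_until_healthy(attempts=30, delay=2.0, probe=is_healthy):
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `probe` argument exists so the retry loop can be exercised without a running server.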

### Audio quality issues
- Increase `num_inference_steps` (e.g., 150)
- Add negative prompts: `"Low quality, distorted, noisy"`
- Increase `guidance_scale` for more prompt adherence

### Generation timeout
- Reduce `num_inference_steps`
- Reduce `audio_length`
- Check GPU memory with `nvidia-smi`

### Wrong audio length
- Ensure `audio_length` is within model limits (~47s max)
- Adjust `audio_start` if trimming is needed
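
The duration of a returned WAV file can be measured directly and compared with the requested `audio_length`. The sketch below uses Python's `wave` module and assumes the response is PCM-encoded WAV. Other encodings, such as floating-point WAV, may raise `wave.Error`. The name `wav_duration_seconds` is illustrative.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the duration of PCM WAV data in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

For a saved file, read it in binary mode and pass the bytes to `wav_duration_seconds`.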

## See Also

- [Offline Inference Example](../../offline_inference/text_to_audio/README.md)
- [Stable Audio Model Card](https://huggingface.co/stabilityai/stable-audio-open-1.0)
- [vLLM-Omni Documentation](https://github.com/vllm-project/vllm-omni)
54 changes: 54 additions & 0 deletions examples/online_serving/stable_audio/curl_examples.sh
@@ -0,0 +1,54 @@
#!/bin/bash
# Examples of using Stable Audio with curl via the /v1/audio/speech endpoint

# Example 1: Simple request with default parameters
echo "Example 1: Simple request with default parameters"
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "The sound of an audience clapping and cheering in a stadium"
}' --output stadium.wav

# Example 2: Request with custom audio_length
echo "Example 2: Custom audio length (5 seconds)"
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "The sound of a dog barking",
"audio_length": 5.0
}' --output dog_5s.wav

# Example 3: Request with negative prompt for quality control
echo "Example 3: With negative prompt"
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "A piano playing a gentle melody",
"audio_length": 10.0,
"negative_prompt": "Low quality, distorted, noisy"
}' --output piano.wav

# Example 4: Full control with all parameters
echo "Example 4: Full control (custom length, guidance, steps, seed)"
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Thunder and rain sounds",
"audio_length": 15.0,
"negative_prompt": "Low quality",
"guidance_scale": 7.0,
"num_inference_steps": 100,
"seed": 42
}' --output thunder_rain.wav

# Example 5: Quick generation with fewer steps (faster but lower quality)
echo "Example 5: Quick generation (fewer steps)"
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Ocean waves crashing on a beach",
"audio_length": 8.0,
"num_inference_steps": 50
}' --output ocean.wav

echo "All examples completed!"