diff --git a/docs/tutorials/audio.md b/docs/tutorials/audio.md
new file mode 100644
index 000000000..03df1c52d
--- /dev/null
+++ b/docs/tutorials/audio.md
@@ -0,0 +1,96 @@
+
+
+# Profile Audio Language Models with AIPerf
+
+AIPerf supports benchmarking Audio Language Models that process audio inputs with optional text prompts.
+
+This guide covers profiling audio models using OpenAI-compatible chat completions endpoints with vLLM.
+
+---
+
+## Start a vLLM Server
+
+Launch the vLLM server with Qwen2-Audio-7B-Instruct:
+
+
+```bash
+docker pull vllm/vllm-openai:latest
+docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
+  --model Qwen/Qwen2-Audio-7B-Instruct \
+  --trust-remote-code
+```
+
+
+
+Verify the server is ready:
+
+
+```bash
+curl -s http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen2-Audio-7B-Instruct",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "max_tokens": 10
+  }' | jq
+```
+
+
+---
+
+## Profile with Synthetic Audio
+
+AIPerf can generate synthetic audio for benchmarking:
+
+
+```bash
+aiperf profile \
+  --model Qwen/Qwen2-Audio-7B-Instruct \
+  --endpoint-type chat \
+  --audio-length-mean 5.0 \
+  --audio-format wav \
+  --audio-sample-rates 16 \
+  --streaming \
+  --url localhost:8000 \
+  --request-count 20 \
+  --concurrency 4
+```
+
+
+To add text prompts alongside audio, include `--synthetic-input-tokens-mean 100`.
+
+## Profile with Custom Input File
+
+Create a JSONL file with audio data and optional text prompts.
+
+
+```bash
+cat <<EOF > inputs.jsonl
+{"texts": ["Transcribe this audio."], "audios": ["wav,UklGRiIFAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0Yf4EAAD..."]}
+{"texts": ["What is being said in this recording?"], "audios": ["mp3,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2LjEwMAAAAAAAAAAA..."]}
+{"texts": ["Summarize the main points from this audio."], "audios": ["wav,UklGRooGAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YWY..."]}
+EOF
+```
+
+
+The audio data format is `{format},{base64_encoded_audio_data}`, where:
+- `format`: Either `wav` or `mp3`
+- `base64_encoded_audio_data`: Base64-encoded audio file content
+
+Run AIPerf using the custom input file:
+
+
+```bash
+aiperf profile \
+  --model Qwen/Qwen2-Audio-7B-Instruct \
+  --endpoint-type chat \
+  --input-file inputs.jsonl \
+  --custom-dataset-type single_turn \
+  --streaming \
+  --url localhost:8000 \
+  --request-count 3
+```
\ No newline at end of file
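
The `{format},{base64_encoded_audio_data}` strings in `inputs.jsonl` are impractical to paste by hand, so it may help to generate them from audio files on disk. The following is a minimal Python sketch under stated assumptions, not part of the patch and not an AIPerf API: `encode_audio` and `write_inputs` are hypothetical helper names, and the format prefix is assumed to match the file extension (`wav` or `mp3`, the two values the tutorial lists).

```python
import base64
import json
from pathlib import Path

# Formats the tutorial lists for entries in the "audios" field.
SUPPORTED_FORMATS = {"wav", "mp3"}


def encode_audio(path):
    """Return '{format},{base64}' for one audio file.

    Assumption: the format prefix is taken from the file extension.
    """
    path = Path(path)
    fmt = path.suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"{fmt},{payload}"


def write_inputs(pairs, out_path="inputs.jsonl"):
    """Write one JSON object per (prompt, audio_path) pair, one per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for prompt, audio_path in pairs:
            record = {"texts": [prompt], "audios": [encode_audio(audio_path)]}
            out.write(json.dumps(record) + "\n")
```

For example, `write_inputs([("Transcribe this audio.", "clip.wav")])` would write a one-line `inputs.jsonl` in the shape shown in the tutorial, where `clip.wav` stands in for a local recording. Using `json.dumps` instead of a heredoc avoids quoting problems when prompts contain quotes or backslashes.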