
LLM API Benchmark Tool

Overview

The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoints across different concurrency levels. This tool provides in-depth insights into API throughput, generation speed, and token processing capabilities.

Key Features

  • 🚀 Dynamic Concurrency Testing
  • 📊 Comprehensive Performance Metrics
  • 🔍 Flexible Configuration
  • 📝 Markdown Result Reporting
  • 🌐 Compatible with Any OpenAI-Like API
  • 📏 Arbitrary-Length Dynamic Input Prompts

Performance Metrics Measured

  1. Generation Throughput

    • Measures tokens generated per second
    • Calculated at each tested concurrency level (see the sketch after this list)
  2. Prompt Throughput

    • Analyzes input-token processing speed
    • Helps gauge the API's prompt-handling efficiency
  3. Time to First Token (TTFT)

    • Measures initial response latency
    • Reports both minimum and maximum TTFT
    • Critical for understanding real-time responsiveness
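
The tool's internal implementation may differ, but the following minimal Go sketch shows how these metrics are conventionally derived for one concurrency level: throughput is total tokens divided by the wall-clock duration of the whole round, and TTFT is tracked per request. All type and field names here are illustrative, not the tool's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// Result holds per-request measurements from one benchmark round.
type Result struct {
	PromptTokens     int           // input tokens sent
	CompletionTokens int           // output tokens received
	TTFT             time.Duration // time until the first token arrived
}

// summarize aggregates one concurrency level. Throughput divides total
// tokens by the wall-clock time of the whole round, not per-request time.
func summarize(results []Result, wallClock time.Duration) {
	var promptTok, genTok int
	minTTFT, maxTTFT := results[0].TTFT, results[0].TTFT
	for _, r := range results {
		promptTok += r.PromptTokens
		genTok += r.CompletionTokens
		if r.TTFT < minTTFT {
			minTTFT = r.TTFT
		}
		if r.TTFT > maxTTFT {
			maxTTFT = r.TTFT
		}
	}
	secs := wallClock.Seconds()
	fmt.Printf("Generation Throughput: %.2f tokens/s\n", float64(genTok)/secs)
	fmt.Printf("Prompt Throughput:     %.2f tokens/s\n", float64(promptTok)/secs)
	fmt.Printf("Min TTFT: %.2fs  Max TTFT: %.2fs\n", minTTFT.Seconds(), maxTTFT.Seconds())
}

func main() {
	// Two concurrent requests completing within a 9s wall-clock window.
	results := []Result{
		{PromptTokens: 45, CompletionTokens: 512, TTFT: 80 * time.Millisecond},
		{PromptTokens: 45, CompletionTokens: 512, TTFT: 90 * time.Millisecond},
	}
	summarize(results, 9*time.Second)
}
```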

Example Output

Input Tokens: 45
Output Tokens: 512
Test Model: Qwen2.5-7B-Instruct-AWQ
Latency: 2.20 ms
| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|------------------------------|--------------|--------------|
| 1           | 58.49                            | 846.81                       | 0.05         | 0.05         |
| 2           | 114.09                           | 989.94                       | 0.08         | 0.09         |
| 4           | 222.62                           | 1193.99                      | 0.11         | 0.15         |
| 8           | 414.35                           | 1479.76                      | 0.11         | 0.24         |
| 16          | 752.26                           | 1543.29                      | 0.13         | 0.47         |
| 32          | 653.94                           | 1625.07                      | 0.14         | 0.89         |

Note how generation throughput scales almost linearly up to 16 concurrent requests and then regresses at 32, while prompt throughput keeps climbing; the point where generation throughput stops scaling is a useful indicator of the endpoint's effective serving capacity.

Usage

Minimal Configuration

Linux:

./llmapibenchmark_linux_amd64 --base-url https://your-api-endpoint.com/v1

Windows:

llmapibenchmark_windows_amd64.exe --base-url https://your-api-endpoint.com/v1

Full Configuration

Linux:

./llmapibenchmark_linux_amd64 \
  --base-url https://your-api-endpoint.com/v1 \
  --api-key YOUR_API_KEY \
  --model gpt-3.5-turbo \
  --concurrency 1,2,4,8,16 \
  --max-tokens 512 \
  --num-words 513 \
  --prompt "Your custom prompt here" \
  --format json

Windows:

llmapibenchmark_windows_amd64.exe ^
  --base-url https://your-api-endpoint.com/v1 ^
  --api-key YOUR_API_KEY ^
  --model gpt-3.5-turbo ^
  --concurrency 1,2,4,8,16 ^
  --max-tokens 512 ^
  --num-words 513 ^
  --prompt "Your custom prompt here" ^
  --format json
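
For scripting, the machine-readable output can be captured with ordinary shell redirection (the redirection is standard shell behavior, not a feature of the tool):

./llmapibenchmark_linux_amd64 --base-url https://your-api-endpoint.com/v1 --format json > results.json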

Command-Line Parameters

| Parameter     | Short | Description                                 | Default                                           | Required |
|---------------|-------|---------------------------------------------|---------------------------------------------------|----------|
| --base-url    | -u    | Base URL of the LLM API endpoint            | Empty (must be specified)                         | Yes      |
| --api-key     | -k    | API authentication key                      | None                                              | No       |
| --model       | -m    | Specific model to test                      | Automatically discovers the first available model | No       |
| --concurrency | -c    | Comma-separated concurrency levels to test  | 1,2,4,8,16,32,64,128                              | No       |
| --max-tokens  | -t    | Maximum tokens to generate per request      | 512                                               | No       |
| --num-words   | -n    | Number of words for the random input prompt | 0                                                 | No       |
| --prompt      | -p    | Text prompt for generating responses        | A long story                                      | No       |
| --format      | -f    | Output format (json, yaml)                  | "" (CLI table and Markdown file)                  | No       |
| --help        | -h    | Show help message                           | false                                             | No       |
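
The short forms can be used in place of the long ones; for example, this is equivalent to spelling out --base-url, --concurrency, and --max-tokens:

./llmapibenchmark_linux_amd64 -u https://your-api-endpoint.com/v1 -c 1,4,16 -t 256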

Output

The tool provides output in multiple formats, controlled by the --format flag.

Default (CLI Table and Markdown File)

If no format is specified, the tool generates:

  1. Real-time console results: A table is displayed in the terminal with live updates.
  2. Markdown file: A detailed report is saved to API_Throughput_{ModelName}.md.

Markdown File Columns:

  • Concurrency: Number of concurrent requests
  • Generation Throughput: Tokens generated per second
  • Prompt Throughput: Input token processing speed
  • Min TTFT: Minimum time to first token
  • Max TTFT: Maximum time to first token
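
Based on these columns and the example output above, the saved report contains a table along these lines (this layout is a sketch, not the tool's verbatim output):

| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|------------------------------|--------------|--------------|
| 1           | 58.49                            | 846.81                       | 0.05         | 0.05         |
| 2           | 114.09                           | 989.94                       | 0.08         | 0.09         |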

JSON Output (--format json)

When using the --format json flag, the results are printed to the console in JSON format.
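
For downstream processing, the JSON results can be unmarshalled in Go. The struct tags below are assumptions for illustration; check them against the tool's actual JSON output before relying on them:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// BenchmarkResult mirrors one row of the results table. The JSON field
// names are hypothetical placeholders, not the tool's confirmed schema.
type BenchmarkResult struct {
	Concurrency          int     `json:"concurrency"`
	GenerationThroughput float64 `json:"generation_throughput"`
	PromptThroughput     float64 `json:"prompt_throughput"`
	MinTTFT              float64 `json:"min_ttft"`
	MaxTTFT              float64 `json:"max_ttft"`
}

func main() {
	// Read results previously saved with: --format json > results.json
	data, err := os.ReadFile("results.json")
	if err != nil {
		log.Fatal(err)
	}
	var results []BenchmarkResult
	if err := json.Unmarshal(data, &results); err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		fmt.Printf("c=%d: %.2f gen tok/s, %.2f prompt tok/s\n",
			r.Concurrency, r.GenerationThroughput, r.PromptThroughput)
	}
}
```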

YAML Output (--format yaml)

When using the --format yaml flag, the results are printed to the console in YAML format.

Best Practices

  • Test with various prompt lengths and complexities
  • Compare different models
  • Monitor for consistent performance
  • Be mindful of API rate limits
  • Use --num-words (-n) to control the length of the randomly generated input prompt

Limitations

  • Requires active API connection
  • Results may vary based on network conditions
  • Does not simulate real-world complex scenarios

Disclaimer

This tool is intended for performance analysis and should be used responsibly, in compliance with your API provider's usage policies.
