
LLM API Benchmark Tool

Overview

The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoints across different concurrency levels. This tool provides in-depth insights into API throughput, generation speed, and token processing capabilities.

Key Features

  • 🚀 Dynamic Concurrency Testing
  • 📊 Comprehensive Performance Metrics
  • 🔍 Flexible Configuration
  • 📝 Markdown Result Reporting
  • 🌐 Compatible with Any OpenAI-Like API
  • 📏 Arbitrary-Length Dynamic Input Prompts

Performance Metrics Measured

  1. Generation Throughput

    • Measures tokens generated per second
    • Calculated at each tested concurrency level (see the sketch after this list)
  2. Prompt Throughput

    • Analyzes input-token processing speed
    • Helps gauge the API's prompt-handling efficiency
  3. Time to First Token (TTFT)

    • Measures initial response latency
    • Reports both minimum and maximum TTFT
    • Critical for understanding real-time responsiveness
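
The tool's internal implementation may differ, but the following minimal Go sketch shows how these metrics are conventionally derived for one concurrency level: throughput is total tokens divided by the wall-clock duration of the whole round, and TTFT is tracked per request. All type and field names here are illustrative, not the tool's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// Result holds per-request measurements from one benchmark round.
type Result struct {
	PromptTokens     int           // input tokens sent
	CompletionTokens int           // output tokens received
	TTFT             time.Duration // time until the first token arrived
}

// summarize aggregates one concurrency level. Throughput divides total
// tokens by the wall-clock time of the whole round, not per-request time.
func summarize(results []Result, wallClock time.Duration) {
	var promptTok, genTok int
	minTTFT, maxTTFT := results[0].TTFT, results[0].TTFT
	for _, r := range results {
		promptTok += r.PromptTokens
		genTok += r.CompletionTokens
		if r.TTFT < minTTFT {
			minTTFT = r.TTFT
		}
		if r.TTFT > maxTTFT {
			maxTTFT = r.TTFT
		}
	}
	secs := wallClock.Seconds()
	fmt.Printf("Generation Throughput: %.2f tokens/s\n", float64(genTok)/secs)
	fmt.Printf("Prompt Throughput:     %.2f tokens/s\n", float64(promptTok)/secs)
	fmt.Printf("Min TTFT: %.2fs  Max TTFT: %.2fs\n", minTTFT.Seconds(), maxTTFT.Seconds())
}

func main() {
	// Two concurrent requests completing within a 9s wall-clock window.
	results := []Result{
		{PromptTokens: 45, CompletionTokens: 512, TTFT: 80 * time.Millisecond},
		{PromptTokens: 45, CompletionTokens: 512, TTFT: 90 * time.Millisecond},
	}
	summarize(results, 9*time.Second)
}
```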

Example Output

Input Tokens: 45
Output Tokens: 512
Test Model: Qwen2.5-7B-Instruct-AWQ
Latency: 2.20 ms
| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|------------------------------|--------------|--------------|
| 1           | 58.49                            | 846.81                       | 0.05         | 0.05         |
| 2           | 114.09                           | 989.94                       | 0.08         | 0.09         |
| 4           | 222.62                           | 1193.99                      | 0.11         | 0.15         |
| 8           | 414.35                           | 1479.76                      | 0.11         | 0.24         |
| 16          | 752.26                           | 1543.29                      | 0.13         | 0.47         |
| 32          | 653.94                           | 1625.07                      | 0.14         | 0.89         |

Note how generation throughput scales almost linearly up to 16 concurrent requests and then regresses at 32, while prompt throughput keeps climbing; the point where generation throughput stops scaling is a useful indicator of the endpoint's effective serving capacity.

Usage

Minimal Configuration

Linux:

./llmapibenchmark_linux_amd64 --base-url https://your-api-endpoint.com/v1

Windows:

llmapibenchmark_windows_amd64.exe --base-url https://your-api-endpoint.com/v1

Full Configuration

Linux:

./llmapibenchmark_linux_amd64 \
  --base-url https://your-api-endpoint.com/v1 \
  --api-key YOUR_API_KEY \
  --model gpt-3.5-turbo \
  --concurrency 1,2,4,8,16 \
  --max-tokens 512 \
  --num-words 513 \
  --prompt "Your custom prompt here" \
  --format json

Windows:

llmapibenchmark_windows_amd64.exe ^
  --base-url https://your-api-endpoint.com/v1 ^
  --api-key YOUR_API_KEY ^
  --model gpt-3.5-turbo ^
  --concurrency 1,2,4,8,16 ^
  --max-tokens 512 ^
  --num-words 513 ^
  --prompt "Your custom prompt here" ^
  --format json
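
For scripting, the machine-readable output can be captured with ordinary shell redirection (the redirection is standard shell behavior, not a feature of the tool):

./llmapibenchmark_linux_amd64 --base-url https://your-api-endpoint.com/v1 --format json > results.json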

Command-Line Parameters

| Parameter     | Short | Description                                 | Default                                           | Required |
|---------------|-------|---------------------------------------------|---------------------------------------------------|----------|
| --base-url    | -u    | Base URL of the LLM API endpoint            | Empty (must be specified)                         | Yes      |
| --api-key     | -k    | API authentication key                      | None                                              | No       |
| --model       | -m    | Specific model to test                      | Automatically discovers the first available model | No       |
| --concurrency | -c    | Comma-separated concurrency levels to test  | 1,2,4,8,16,32,64,128                              | No       |
| --max-tokens  | -t    | Maximum tokens to generate per request      | 512                                               | No       |
| --num-words   | -n    | Number of words for the random input prompt | 0                                                 | No       |
| --prompt      | -p    | Text prompt for generating responses        | A long story                                      | No       |
| --format      | -f    | Output format (json, yaml)                  | "" (CLI table and Markdown file)                  | No       |
| --help        | -h    | Show help message                           | false                                             | No       |
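
The short forms can be used in place of the long ones; for example, this is equivalent to spelling out --base-url, --concurrency, and --max-tokens:

./llmapibenchmark_linux_amd64 -u https://your-api-endpoint.com/v1 -c 1,4,16 -t 256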

Output

The tool provides output in multiple formats, controlled by the --format flag.

Default (CLI Table and Markdown File)

If no format is specified, the tool generates:

  1. Real-time console results: A table is displayed in the terminal with live updates.
  2. Markdown file: A detailed report is saved to API_Throughput_{ModelName}.md.

Markdown File Columns:

  • Concurrency: Number of concurrent requests
  • Generation Throughput: Tokens generated per second
  • Prompt Throughput: Input token processing speed
  • Min TTFT: Minimum time to first token
  • Max TTFT: Maximum time to first token
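
Based on these columns and the example output above, the saved report contains a table along these lines (this layout is a sketch, not the tool's verbatim output):

| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|------------------------------|--------------|--------------|
| 1           | 58.49                            | 846.81                       | 0.05         | 0.05         |
| 2           | 114.09                           | 989.94                       | 0.08         | 0.09         |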

JSON Output (--format json)

When using the --format json flag, the results are printed to the console in JSON format.
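
For downstream processing, the JSON results can be unmarshalled in Go. The struct tags below are assumptions for illustration; check them against the tool's actual JSON output before relying on them:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// BenchmarkResult mirrors one row of the results table. The JSON field
// names are hypothetical placeholders, not the tool's confirmed schema.
type BenchmarkResult struct {
	Concurrency          int     `json:"concurrency"`
	GenerationThroughput float64 `json:"generation_throughput"`
	PromptThroughput     float64 `json:"prompt_throughput"`
	MinTTFT              float64 `json:"min_ttft"`
	MaxTTFT              float64 `json:"max_ttft"`
}

func main() {
	// Read results previously saved with: --format json > results.json
	data, err := os.ReadFile("results.json")
	if err != nil {
		log.Fatal(err)
	}
	var results []BenchmarkResult
	if err := json.Unmarshal(data, &results); err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		fmt.Printf("c=%d: %.2f gen tok/s, %.2f prompt tok/s\n",
			r.Concurrency, r.GenerationThroughput, r.PromptThroughput)
	}
}
```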

YAML Output (--format yaml)

When using the --format yaml flag, the results are printed to the console in YAML format.

Best Practices

  • Test with various prompt lengths and complexities
  • Compare different models
  • Monitor for consistent performance
  • Be mindful of API rate limits
  • Use --num-words (-n) to control the length of the randomly generated input prompt

Limitations

  • Requires active API connection
  • Results may vary based on network conditions
  • Does not simulate real-world complex scenarios

Disclaimer

This tool is intended for performance analysis and should be used responsibly, in compliance with your API provider's usage policies.
