A high-performance load testing tool for OpenAI-compatible LLM APIs, written in Rust.
- OpenAI-compatible API support: Works with any OpenAI-compatible endpoint (vLLM, TGI, Ollama, etc.)
- Streaming metrics: Measures Time to First Token (TTFT), Inter-token Latency (ITL), and Time per Output Token (TPOT)
- TEE signature verification: Verifies that chat completions come from a genuine Trusted Execution Environment (TEE) and tracks verification latency
- Audio input/output testing: Test multimodal models with audio input (transcription) and audio output (TTS)
- Image generation testing: Benchmark image generation endpoints with metrics tracking
- Built-in scenarios: Pre-configured benchmarks included in the binary
- Provider comparison: Test the same model across multiple providers and compare results
- Detailed statistics: P50, P90, P95, P99, P100 percentiles for all metrics
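The streaming metrics listed above can be derived from per-token arrival times. The sketch below uses commonly cited definitions (TTFT as the delay to the first token, ITL as the gaps between consecutive tokens, TPOT as decode time divided by the number of tokens after the first). The tool's exact formulas may differ, and `StreamMetrics` and `stream_metrics` are hypothetical names, not part of the genai_benchmark API.

```rust
use std::time::Duration;

/// Latency metrics for one streamed response (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct StreamMetrics {
    /// Time to First Token.
    pub ttft: Duration,
    /// Inter-token Latency: one gap per pair of consecutive tokens.
    pub itl: Vec<Duration>,
    /// Time per Output Token; `None` when only one token arrived.
    pub tpot: Option<Duration>,
}

/// `arrivals` holds each token's arrival time measured from the moment the
/// request was sent, in non-decreasing order. Returns `None` if no token arrived.
pub fn stream_metrics(arrivals: &[Duration]) -> Option<StreamMetrics> {
    let first = *arrivals.first()?;
    let last = *arrivals.last()?;
    // Gaps between consecutive tokens.
    let itl: Vec<Duration> = arrivals
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0]))
        .collect();
    // Assumed definition: decode time spread over the tokens after the first.
    let tpot = if arrivals.len() > 1 {
        Some(last.saturating_sub(first) / (arrivals.len() as u32 - 1))
    } else {
        None
    };
    Some(StreamMetrics { ttft: first, itl, tpot })
}
```

For tokens arriving at 200, 250, 310, and 350 ms, this yields a TTFT of 200 ms, ITL samples of 50, 60, and 40 ms, and a TPOT of 50 ms.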
genai-benchmark list
genai-benchmark describe near-vs-bedrock

export NEARAI_API_KEY=your-key
export AWS_BEARER_TOKEN_BEDROCK=your-token
genai-benchmark run near-vs-bedrock

You can also use a .env file in the current directory to set environment variables:
# .env
NEARAI_API_KEY=your-key
AWS_BEARER_TOKEN_BEDROCK=your-token

genai-benchmark export near-vs-bedrock > my-benchmark.yaml
# Edit my-benchmark.yaml
genai-benchmark scenario my-benchmark.yaml

To enable TEE signature verification for a provider, add verify: true to the provider configuration in your scenario YAML:

providers:
  - name: "NEAR AI"
    base_url: "https://cloud-api.near.ai/v1"
    api_key: "${NEARAI_API_KEY}"
    verify: true # Enable TEE signature verification

When enabled, the benchmark will:
- Fetch the TEE signature for each chat completion from /signature/{chat_id}
- Track verification success/failure rates
- Measure and report verification latency separately
- Include verification time in the total request duration metrics
The verification results are displayed in the benchmark output with separate latency statistics.
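The bookkeeping described above (success and failure counts, plus a separate latency sample per verification) might look roughly like the following. This is a sketch, not the tool's implementation: `VerifyStats`, `record`, and `signature_url` are hypothetical names, the closure stands in for the real HTTP fetch and signature check, and joining /signature/{chat_id} onto the provider's base URL is an assumption.

```rust
use std::time::{Duration, Instant};

/// Builds the signature URL for a completion.
/// Assumption: the path is resolved relative to the provider's base URL.
pub fn signature_url(base_url: &str, chat_id: &str) -> String {
    format!("{}/signature/{}", base_url.trim_end_matches('/'), chat_id)
}

/// Running totals for signature verification (hypothetical type).
#[derive(Debug, Default)]
pub struct VerifyStats {
    pub success: u64,
    pub failure: u64,
    /// One latency sample per verification attempt, kept apart from request latency.
    pub latencies: Vec<Duration>,
}

impl VerifyStats {
    /// Runs `verify` (a stand-in for fetching and checking the signature),
    /// records the outcome and its latency, and returns the latency so the
    /// caller can add it to the request's total duration.
    pub fn record<F: FnOnce() -> bool>(&mut self, verify: F) -> Duration {
        let start = Instant::now();
        let ok = verify();
        let elapsed = start.elapsed();
        if ok {
            self.success += 1;
        } else {
            self.failure += 1;
        }
        self.latencies.push(elapsed);
        elapsed
    }

    /// Fraction of attempts that verified; `None` before any attempt.
    pub fn success_rate(&self) -> Option<f64> {
        let total = self.success + self.failure;
        if total == 0 {
            None
        } else {
            Some(self.success as f64 / total as f64)
        }
    }
}
```

Returning the elapsed time from `record` lets the caller report verification latency on its own while still folding it into the total request duration.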
Test multimodal models like Qwen3-Omni with audio:
# Test audio input (transcription)
genai-benchmark run audio-input
# Test audio output (text-to-speech)
genai-benchmark run audio-output
# Test both audio input and output
genai-benchmark run multimodal

Or use CLI flags:
# Add test audio to chat requests
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen3-Omni-30B-A3B-Instruct --audio-input --verify
# Enable audio output (sets modalities: ["text", "audio"])
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen3-Omni-30B-A3B-Instruct --audio-output --verify

Benchmark image generation endpoints:
# Run the built-in image generation scenario
genai-benchmark run image-generation
# Or use CLI flags
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen-Image-2512 --image-generation --image-size 1024x1024 --verify

Multiple scenarios are provided to test different aspects of image generation throughput:
# Quick test (5 images, basic metrics)
genai-benchmark run image-generation
# High-throughput stress test (100 images, high concurrency)
genai-benchmark run image-generation-stress
# Sustained load test (200 images over extended period)
genai-benchmark run image-generation-sustained
# Smaller images for performance comparison (512x512)
genai-benchmark run image-generation-512
# Batch generation (4 images per request)
genai-benchmark run image-generation-batch

Image generation metrics include:
- Total images generated
- Total/average image data size
- Mean and P95 generation time
- Images per second (throughput)
- Data throughput (MB/s)
- TEE signature verification status
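As a rough illustration of how the timing and throughput figures above relate to raw measurements, the sketch below summarizes a run from per-image generation times, total bytes received, and wall-clock duration. The names `percentile`, `summarize`, and `ImageSummary` are hypothetical, the percentile uses the nearest-rank method (the tool may interpolate instead), and 1 MB is assumed to be 1,000,000 bytes.

```rust
use std::time::Duration;

/// Nearest-rank percentile of `samples` for `p` in (0, 100]; `None` when empty.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: the smallest 1-based rank covering at least p percent of samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Summary of an image generation run (hypothetical type).
pub struct ImageSummary {
    pub images: usize,
    pub mean_time: Duration,
    pub p95_time: Duration,
    pub images_per_sec: f64,
    pub mb_per_sec: f64,
}

/// `times`: per-image generation times; `total_bytes`: image data received;
/// `wall`: wall-clock duration of the whole run (shorter than the sum of
/// `times` when requests run concurrently).
pub fn summarize(times: &[Duration], total_bytes: u64, wall: Duration) -> Option<ImageSummary> {
    let p95_time = percentile(times, 95.0)?; // also rejects an empty run
    let secs = wall.as_secs_f64();
    if secs == 0.0 {
        return None;
    }
    let total: Duration = times.iter().sum();
    Some(ImageSummary {
        images: times.len(),
        mean_time: total / times.len() as u32,
        p95_time,
        images_per_sec: times.len() as f64 / secs,
        // Assumption: 1 MB = 1,000,000 bytes.
        mb_per_sec: total_bytes as f64 / 1_000_000.0 / secs,
    })
}
```

Throughput is divided by wall-clock time and not by the summed generation times, so concurrent requests raise images per second even when individual latencies stay the same.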
Multi-phase benchmarks allow testing cache effectiveness with warmup and query phases:
# List available multi-phase scenarios
genai-benchmark list
# Run built-in multi-phase scenarios
genai-benchmark run same-doc-qa # Same document QA benchmark
genai-benchmark run multi-round-qa # Multi-round conversation QA
genai-benchmark run rag # RAG with quality metrics
genai-benchmark run long-doc-qa # Long document QA
genai-benchmark run multi-doc-qa # Multi-document QA
# Run a custom multi-phase scenario file
genai-benchmark multi-phase-scenario my-scenario.yaml

Multi-phase scenarios support:
- Warmup phase: Prime the cache with initial requests
- Query phase: Measure cache hit rates and performance
- Cache metrics: Track cache effectiveness across providers
- Quality metrics: F1 and ROUGE-L scores for answer quality
- Provider comparison: Compare with and without cache systems
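For readers unfamiliar with the quality metrics named above, here is a minimal sketch of token-level F1 and ROUGE-L. It assumes lowercased whitespace tokenization and the balanced F-measure (precision and recall weighted equally). The tool's own tokenization and normalization may differ, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Lowercased whitespace tokens (assumed normalization).
fn tokens(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Harmonic mean of precision and recall given the number of matched tokens.
fn f_measure(matched: usize, pred_len: usize, ref_len: usize) -> f64 {
    if matched == 0 {
        return 0.0;
    }
    let p = matched as f64 / pred_len as f64;
    let r = matched as f64 / ref_len as f64;
    2.0 * p * r / (p + r)
}

/// Token-overlap F1: matches are counted as a multiset intersection,
/// so word order does not matter.
pub fn token_f1(prediction: &str, reference: &str) -> f64 {
    let (pred, refr) = (tokens(prediction), tokens(reference));
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for t in &refr {
        *counts.entry(t.as_str()).or_insert(0) += 1;
    }
    let mut matched = 0;
    for t in &pred {
        if let Some(c) = counts.get_mut(t.as_str()) {
            if *c > 0 {
                *c -= 1;
                matched += 1;
            }
        }
    }
    f_measure(matched, pred.len(), refr.len())
}

/// ROUGE-L F-measure: matches are the longest common subsequence of tokens,
/// so word order does matter.
pub fn rouge_l(prediction: &str, reference: &str) -> f64 {
    let (pred, refr) = (tokens(prediction), tokens(reference));
    // prev[j]: LCS length of the prediction tokens seen so far and refr[..j].
    let mut prev = vec![0usize; refr.len() + 1];
    for p in &pred {
        let mut cur = vec![0usize; refr.len() + 1];
        for (j, r) in refr.iter().enumerate() {
            cur[j + 1] = if p == r { prev[j] + 1 } else { prev[j + 1].max(cur[j]) };
        }
        prev = cur;
    }
    f_measure(prev[refr.len()], pred.len(), refr.len())
}
```

The two scores diverge when the right words appear in the wrong order: for the prediction "mat the on" against the reference "on the mat", token F1 is 1.0 while ROUGE-L is 1/3.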
One-liner install:
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/nearai/genai-benchmark/releases/latest/download/genai-benchmark-installer.sh | sh

Or download pre-built binaries from Releases, or build from source:
cargo install --path .

use genai_benchmark::{BenchmarkConfig, run_benchmark, load_dataset, DatasetConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = BenchmarkConfig {
        name: Some("My Test".to_string()),
        base_url: "https://api.example.com/v1".to_string(),
        api_key: "your-key".to_string(),
        model: "gpt-4".to_string(),
        max_tokens: 256,
        concurrency: 5,
        rps: 10.0,
        timeout_secs: 300,
    };

    let dataset = DatasetConfig::Synthetic { seed: Some(42) };
    let prompts = load_dataset(&dataset, 100).await?;
    let result = run_benchmark(&config, prompts, 100).await?;
    genai_benchmark::print_result(&result);

    Ok(())
}

MIT