GenAI Benchmark

A high-performance load testing tool for OpenAI-compatible LLM APIs, written in Rust.

Features

OpenAI-compatible API support: Works with any OpenAI-compatible endpoint (vLLM, TGI, Ollama, etc.)
Streaming metrics: Measures Time to First Token (TTFT), Inter-token Latency (ITL), and Time per Output Token (TPOT)
TEE signature verification: Verify chat completions come from genuine Trusted Execution Environments (TEE) with signature verification and latency tracking
Audio input/output testing: Test multimodal models with audio input (transcription) and audio output (TTS)
Image generation testing: Benchmark image generation endpoints with metrics tracking
Built-in scenarios: Pre-configured benchmarks included in the binary
Provider comparison: Test the same model across multiple providers and compare results
Detailed statistics: P50, P90, P95, P99, P100 percentiles for all metrics

Quick Start

List available scenarios

genai-benchmark list

Describe a scenario (see config and required env vars)

genai-benchmark describe near-vs-bedrock

Run a scenario

export NEARAI_API_KEY=your-key
export AWS_BEARER_TOKEN_BEDROCK=your-token
genai-benchmark run near-vs-bedrock

You can also use a .env file in the current directory to set environment variables:

# .env
NEARAI_API_KEY=your-key
AWS_BEARER_TOKEN_BEDROCK=your-token

Export and customize a scenario

genai-benchmark export near-vs-bedrock > my-benchmark.yaml
# Edit my-benchmark.yaml
genai-benchmark scenario my-benchmark.yaml

TEE Signature Verification

To enable TEE signature verification for a provider, add verify: true to the provider configuration in your scenario YAML:

providers:
  - name: "NEAR AI"
    base_url: "https://cloud-api.near.ai/v1"
    api_key: "${NEARAI_API_KEY}"
    verify: true  # Enable TEE signature verification

When enabled, the benchmark will:

Fetch the TEE signature for each chat completion from /signature/{chat_id}
Track verification success/failure rates
Measure and report verification latency separately
Include verification time in the total request duration metrics

The verification results are displayed in the benchmark output with separate latency statistics.

Audio Input/Output Testing

Test multimodal models like Qwen3-Omni with audio:

# Test audio input (transcription)
genai-benchmark run audio-input

# Test audio output (text-to-speech)
genai-benchmark run audio-output

# Test both audio input and output
genai-benchmark run multimodal

Or use CLI flags:

# Add test audio to chat requests
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen3-Omni-30B-A3B-Instruct --audio-input --verify

# Enable audio output (sets modalities: ["text", "audio"])
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen3-Omni-30B-A3B-Instruct --audio-output --verify

Image Generation Testing

Benchmark image generation endpoints:

# Run the built-in image generation scenario
genai-benchmark run image-generation

# Or use CLI flags
genai-benchmark --base-url https://cloud-api.near.ai/v1 --model Qwen/Qwen-Image-2512 --image-generation --image-size 1024x1024 --verify

Image Generation Performance Scenarios

Multiple scenarios are provided to test different aspects of image generation throughput:

# Quick test (5 images, basic metrics)
genai-benchmark run image-generation

# High-throughput stress test (100 images, high concurrency)
genai-benchmark run image-generation-stress

# Sustained load test (200 images over extended period)
genai-benchmark run image-generation-sustained

# Smaller images for performance comparison (512x512)
genai-benchmark run image-generation-512

# Batch generation (4 images per request)
genai-benchmark run image-generation-batch

Image generation metrics include:

Total images generated
Total/average image data size
Mean and P95 generation time
Images per second (throughput)
Data throughput (MB/s)
TEE signature verification status

Multi-Phase Benchmarks

Multi-phase benchmarks allow testing cache effectiveness with warmup and query phases:

# List available multi-phase scenarios
genai-benchmark list

# Run built-in multi-phase scenarios
genai-benchmark run same-doc-qa      # Same document QA benchmark
genai-benchmark run multi-round-qa   # Multi-round conversation QA
genai-benchmark run rag              # RAG with quality metrics
genai-benchmark run long-doc-qa      # Long document QA
genai-benchmark run multi-doc-qa     # Multi-document QA

# Run a custom multi-phase scenario file
genai-benchmark multi-phase-scenario my-scenario.yaml

Multi-phase scenarios support:

Warmup phase: Prime the cache with initial requests
Query phase: Measure cache hit rates and performance
Cache metrics: Track cache effectiveness across providers
Quality metrics: F1 and ROUGE-L scores for answer quality
Provider comparison: Compare with and without cache systems

Installation

One-liner install:

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/nearai/genai-benchmark/releases/latest/download/genai-benchmark-installer.sh | sh

Or download pre-built binaries from Releases, or build from source:

cargo install --path .

Library Usage

use genai_benchmark::{BenchmarkConfig, run_benchmark, load_dataset, DatasetConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = BenchmarkConfig {
        name: Some("My Test".to_string()),
        base_url: "https://api.example.com/v1".to_string(),
        api_key: "your-key".to_string(),
        model: "gpt-4".to_string(),
        max_tokens: 256,
        concurrency: 5,
        rps: 10.0,
        timeout_secs: 300,
    };

    let dataset = DatasetConfig::Synthetic { seed: Some(42) };
    let prompts = load_dataset(&dataset, 100).await?;

    let result = run_benchmark(&config, prompts, 100).await?;
    genai_benchmark::print_result(&result);

    Ok(())
}

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.github/workflows		.github/workflows
datasets		datasets
examples		examples
scenarios		scenarios
src		src
tests		tests
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
dist-workspace.toml		dist-workspace.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAI Benchmark

Features

Quick Start

List available scenarios

Describe a scenario (see config and required env vars)

Run a scenario

Export and customize a scenario

TEE Signature Verification

Audio Input/Output Testing

Image Generation Testing

Image Generation Performance Scenarios

Multi-Phase Benchmarks

Installation

Library Usage

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GenAI Benchmark

Features

Quick Start

List available scenarios

Describe a scenario (see config and required env vars)

Run a scenario

Export and customize a scenario

TEE Signature Verification

Audio Input/Output Testing

Image Generation Testing

Image Generation Performance Scenarios

Multi-Phase Benchmarks

Installation

Library Usage

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages