---
title: Run Your Own Benchmarks
description: Step-by-step guide to benchmarking Hyperion Gateway locally using the bundled C benchmark suite.
---
We believe in radical transparency when it comes to performance. That's why the exact benchmarking tools we use are built directly into the Hyperion stack.
You don't need external testing rigs or bloated JMeter setups: the Hyperion Gateway Docker image ships with `cbenchmark`, a lightning-fast, multi-threaded C benchmarking tool built on libcurl, pre-compiled and ready to go.
By running the benchmark inside the gateway container, we eliminate host operating system networking variations and Docker proxy overheads, giving you the purest measurement of the Gateway's internal routing and parsing latency.
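For a sense of what a tool like this does under the hood, here is a minimal sketch of a multi-threaded libcurl timing loop: each thread owns its own easy handle and records the wall-clock duration of every request it sends. It is purely illustrative and is not the bundled cbenchmark source; the target URL, thread count, and per-thread request count are placeholder assumptions.

```c
/*
 * Minimal sketch of a multi-threaded libcurl timing loop.
 * Illustrative only -- NOT the bundled cbenchmark source.
 * Build: gcc -O2 -pthread bench_sketch.c -lcurl -o bench_sketch
 */
#include <curl/curl.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define THREADS   4      /* analogous to the CONCURRENCY argument (assumed mapping) */
#define REQ_PER_T 250    /* requests issued by each thread (placeholder) */
#define TARGET    "http://localhost:8080/v1/chat/completions"  /* placeholder URL */

/* Microseconds elapsed between two monotonic timestamps. */
static double elapsed_us(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

/* Discard response bodies so timings are not skewed by writing to stdout. */
static size_t sink(void *data, size_t size, size_t nmemb, void *userdata) {
    (void)data; (void)userdata;
    return size * nmemb;
}

static void *worker(void *arg) {
    double *latencies = arg;   /* this thread's slice of the sample array */
    CURL *handle = curl_easy_init();
    curl_easy_setopt(handle, CURLOPT_URL, TARGET);
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, sink);

    for (int i = 0; i < REQ_PER_T; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        curl_easy_perform(handle);   /* one full request/response round trip */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        latencies[i] = elapsed_us(t0, t1);
    }
    curl_easy_cleanup(handle);
    return NULL;
}

int main(void) {
    static double samples[THREADS][REQ_PER_T];
    pthread_t tids[THREADS];

    curl_global_init(CURL_GLOBAL_DEFAULT);   /* must happen before any threads start */
    for (int t = 0; t < THREADS; t++)
        pthread_create(&tids[t], NULL, worker, samples[t]);
    for (int t = 0; t < THREADS; t++)
        pthread_join(tids[t], NULL);
    curl_global_cleanup();

    printf("collected %d samples\n", THREADS * REQ_PER_T);
    return 0;
}
```

Giving each thread its own easy handle is the standard way to drive libcurl from multiple threads, since a single easy handle must not be used concurrently.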
Before running benchmarks, ensure you have:
- Docker & Docker Compose installed.
- The Hyperion repo cloned locally.
Navigate to the root of the project and start the infrastructure. In addition to the standard stack, make sure to start the mock-openai service, which the gateway will proxy requests to:
```bash
# In the root terminal
docker compose up -d
docker compose up mock-openai -d
```

We've provided a helper shell script that automatically prepares the test environment (flushing caches, seeding test API keys), builds the latest C binary, and executes it inside the container.
Navigate to the tools directory and run it:
```bash
cd gateway/tools

# Ensure the required environment variables are set, such as the cache master secret
# Usage: ./benchmark.sh [TOTAL_REQUESTS] [CONCURRENCY]
./benchmark.sh 10000 10
```

When the test finishes, you will see console output breaking down the latency percentiles.
```text
--- Gateway Overhead (dispatch-json_parse-json_marshal) ---
Average: 10.6045us
Median:  5.0000us
p95:     14.0000us
p99:     125.0000us
```
This is the most critical block of the output.
Gateway Overhead specifically measures the time elapsed from the moment the Gateway receives the HTTP payload to the moment it begins dispatching the request to the upstream mock provider.
- It includes: JSON Unmarshaling, Routing Logic, Auth Checks, and JSON Marshaling.
- Why it matters: A p95 of 14µs means that for 95% of your requests, Hyperion adds only 14 millionths of a second on top of querying OpenAI or Anthropic directly. This shows the caching and routing layers add effectively zero latency to your stack.
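If you want to sanity-check numbers like these yourself, percentile summaries are conventionally produced by sorting the per-request samples and indexing into the sorted array (the nearest-rank method). The sketch below illustrates that calculation in the same output format; it is not the cbenchmark implementation, and the sample values are made up.

```c
/* Sketch of a nearest-rank percentile report over latency samples (microseconds).
 * Illustrative only -- not the cbenchmark implementation; the data below is toy data. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Value below which roughly p percent of the sorted samples fall. */
static double percentile(const double *sorted, size_t n, double p) {
    size_t idx = (size_t)(p / 100.0 * (double)(n - 1));
    return sorted[idx];
}

int main(void) {
    double samples[] = {3, 4, 5, 5, 6, 7, 9, 14, 80, 125};   /* toy latencies in us */
    size_t n = sizeof samples / sizeof samples[0];
    double sum = 0;

    qsort(samples, n, sizeof samples[0], cmp_double);
    for (size_t i = 0; i < n; i++) sum += samples[i];

    printf("Average: %.4fus\n", sum / (double)n);
    printf("Median:  %.4fus\n", percentile(samples, n, 50));
    printf("p95:     %.4fus\n", percentile(samples, n, 95));
    printf("p99:     %.4fus\n", percentile(samples, n, 99));
    return 0;
}
```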
Test the absolute minimum processing overhead without Go-scheduler contention:

```bash
./benchmark.sh 10000 1
```

Test the optimal balance of massive throughput while maintaining sub-2ms p99 latency:

```bash
./benchmark.sh 10000 10
```

Push the lightweight goroutine workers to find the throughput ceiling of your local hardware:

```bash
./benchmark.sh 50000 100
```