---
title: Run Your Own Benchmarks
description: Step-by-step guide to benchmarking Hyperion Gateway locally using the bundled C benchmark suite.
---
We believe in radical transparency when it comes to performance. That's why the exact benchmarking tools we use are built directly into the Hyperion stack.
You don't need external testing rigs or bloated JMeter setups: the Hyperion Gateway Docker image ships with `cbenchmark`, a lightning-fast, multi-threaded C benchmarking tool built on libcurl, pre-compiled and ready to go.
By running the benchmark inside the gateway container, we eliminate host operating system networking variations and Docker proxy overheads, giving you the purest measurement of the Gateway's internal routing and parsing latency.
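For a sense of what a tool like this does under the hood, here is a minimal sketch of a multi-threaded libcurl timing loop: each thread owns its own easy handle and records the wall-clock duration of every request it sends. It is purely illustrative and is not the bundled cbenchmark source; the target URL, thread count, and per-thread request count are placeholder assumptions.

```c
/*
 * Minimal sketch of a multi-threaded libcurl timing loop.
 * Illustrative only -- NOT the bundled cbenchmark source.
 * Build: gcc -O2 -pthread bench_sketch.c -lcurl -o bench_sketch
 */
#include <curl/curl.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define THREADS   4      /* analogous to the CONCURRENCY argument (assumed mapping) */
#define REQ_PER_T 250    /* requests issued by each thread (placeholder) */
#define TARGET    "http://localhost:8080/v1/chat/completions"  /* placeholder URL */

/* Microseconds elapsed between two monotonic timestamps. */
static double elapsed_us(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

/* Discard response bodies so timings are not skewed by writing to stdout. */
static size_t sink(void *data, size_t size, size_t nmemb, void *userdata) {
    (void)data; (void)userdata;
    return size * nmemb;
}

static void *worker(void *arg) {
    double *latencies = arg;   /* this thread's slice of the sample array */
    CURL *handle = curl_easy_init();
    curl_easy_setopt(handle, CURLOPT_URL, TARGET);
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, sink);

    for (int i = 0; i < REQ_PER_T; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        curl_easy_perform(handle);   /* one full request/response round trip */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        latencies[i] = elapsed_us(t0, t1);
    }
    curl_easy_cleanup(handle);
    return NULL;
}

int main(void) {
    static double samples[THREADS][REQ_PER_T];
    pthread_t tids[THREADS];

    curl_global_init(CURL_GLOBAL_DEFAULT);   /* must happen before any threads start */
    for (int t = 0; t < THREADS; t++)
        pthread_create(&tids[t], NULL, worker, samples[t]);
    for (int t = 0; t < THREADS; t++)
        pthread_join(tids[t], NULL);
    curl_global_cleanup();

    printf("collected %d samples\n", THREADS * REQ_PER_T);
    return 0;
}
```

Giving each thread its own easy handle is the standard way to drive libcurl from multiple threads, since a single easy handle must not be used concurrently.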
Before running benchmarks, ensure you have:
- Docker & Docker Compose installed.
- The Hyperion repo cloned locally.
Navigate to the root of the project and start the infrastructure. In addition to the standard stack, make sure to start the mock-openai service, which the gateway will proxy requests to:
```bash
# In the root terminal
docker compose up -d
docker compose up mock-openai -d
```

We've provided a helper shell script that automatically prepares the test environment (flushing caches, seeding test API keys), builds the latest C binary, and executes it inside the container.
Navigate to the tools directory and run it:
```bash
cd gateway/tools

# Ensure the required environment variables are set, such as the cache master secret
# Usage: ./benchmark.sh [TOTAL_REQUESTS] [CONCURRENCY]
./benchmark.sh 10000 10
```

When the test finishes, you will see console output breaking down the latency percentiles.
```text
--- Gateway Overhead (dispatch-json_parse-json_marshal) ---
Average: 10.6045us
Median:  5.0000us
p95:     14.0000us
p99:     125.0000us
```
This is the most critical block of the output.
Gateway Overhead specifically measures the time elapsed from the moment the Gateway receives the HTTP payload to the moment it begins dispatching the request to the upstream mock provider.
- It includes: JSON Unmarshaling, Routing Logic, Auth Checks, and JSON Marshaling.
- Why it matters: A p95 of 14µs means that for 95% of your requests, Hyperion adds only 14 millionths of a second on top of querying OpenAI or Anthropic directly. This shows the caching and routing layers add effectively zero latency to your stack.
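If you want to sanity-check numbers like these yourself, percentile summaries are conventionally produced by sorting the per-request samples and indexing into the sorted array (the nearest-rank method). The sketch below illustrates that calculation in the same output format; it is not the cbenchmark implementation, and the sample values are made up.

```c
/* Sketch of a nearest-rank percentile report over latency samples (microseconds).
 * Illustrative only -- not the cbenchmark implementation; the data below is toy data. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Value below which roughly p percent of the sorted samples fall. */
static double percentile(const double *sorted, size_t n, double p) {
    size_t idx = (size_t)(p / 100.0 * (double)(n - 1));
    return sorted[idx];
}

int main(void) {
    double samples[] = {3, 4, 5, 5, 6, 7, 9, 14, 80, 125};   /* toy latencies in us */
    size_t n = sizeof samples / sizeof samples[0];
    double sum = 0;

    qsort(samples, n, sizeof samples[0], cmp_double);
    for (size_t i = 0; i < n; i++) sum += samples[i];

    printf("Average: %.4fus\n", sum / (double)n);
    printf("Median:  %.4fus\n", percentile(samples, n, 50));
    printf("p95:     %.4fus\n", percentile(samples, n, 95));
    printf("p99:     %.4fus\n", percentile(samples, n, 99));
    return 0;
}
```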
Test the absolute minimum processing overhead without Go-scheduler contention:

```bash
./benchmark.sh 10000 1
```

Test the optimal balance of massive throughput while maintaining sub-2ms p99 latency:

```bash
./benchmark.sh 10000 10
```

Push the lightweight goroutine workers to find the throughput ceiling of your local hardware:

```bash
./benchmark.sh 50000 100
```