@brandonpelfrey brandonpelfrey commented Jan 27, 2026

Summary

Adds support for distributing benchmark requests across multiple inference server endpoints. Users can now specify multiple --url flags to enable load balancing across servers:

aiperf profile --model llama \
    --url http://server1:8000 \
    --url http://server2:8000 \
    --url http://server3:8000 \
    --request-rate 30

This enables horizontal scaling scenarios such as multi-GPU benchmarking on a single node (multiple containers, each serving a different GPU) or distributed inference across multiple servers. Requests are distributed according to a configurable --url-strategy (currently only round_robin is available). The URL index is assigned when the TimingManager issues each credit, travels with that credit to the worker, and the transport layer uses it to select the URL for the request. Server metrics are collected from all configured endpoints.
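The issuance-to-transport flow described above can be sketched as follows. This is a minimal, hypothetical model, not the aiperf code: only the names Credit, url_index, url_index_callback, EndpointInfo, base_urls, base_url and get_url(index) come from this PR; the class bodies, the credit_id field and the use of itertools.cycle as a stand-in for the round-robin sampler are assumptions for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical, simplified stand-ins for the Credit struct, CreditIssuer and
# EndpointInfo described in this PR. Only the names come from the PR text;
# the bodies are illustrative.
@dataclass(frozen=True)
class Credit:
    credit_id: int
    url_index: Optional[int] = None  # None means "no multi-URL routing"


class CreditIssuer:
    def __init__(self, url_index_callback: Optional[Callable[[], int]] = None) -> None:
        # The callback is consulted once per credit, at issuance time.
        self._url_index_callback = url_index_callback
        self._next_id = 0

    def issue(self) -> Credit:
        url_index = None
        if self._url_index_callback is not None:
            url_index = self._url_index_callback()
        credit = Credit(credit_id=self._next_id, url_index=url_index)
        self._next_id += 1
        return credit


@dataclass(frozen=True)
class EndpointInfo:
    base_urls: tuple[str, ...]

    @property
    def base_url(self) -> str:
        # Backward-compatible accessor: the first configured URL.
        return self.base_urls[0]

    def get_url(self, index: Optional[int]) -> str:
        # Worker/transport side: resolve the index carried by the credit.
        return self.base_url if index is None else self.base_urls[index]


endpoint = EndpointInfo(base_urls=("http://server1:8000", "http://server2:8000"))
# itertools.cycle stands in for the round-robin sampler's next_url_index().
issuer = CreditIssuer(url_index_callback=itertools.cycle(range(len(endpoint.base_urls))).__next__)
urls = [endpoint.get_url(issuer.issue().url_index) for _ in range(4)]
print(urls)
```

Keeping the index (rather than the URL string) on the credit keeps the message small and leaves the final URL lookup to the transport, which already holds the endpoint configuration.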

Key Changes

  • EndpointConfig: url field becomes urls list with backward-compatible url property
  • URLSamplingStrategyFactory + RoundRobinURLSampler: Extensible factory pattern for URL selection strategies with thread-safe round-robin implementation
  • Credit system: Added url_index field to propagate URL selection from TimingManager through to workers
  • ServerMetricsManager: Collects metrics from all configured URLs (deduplicated)
  • Single URL usage remains unchanged for backward compatibility
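The backward-compatible url property and the deduplicated metrics endpoints listed above could look roughly like this. It is a hedged sketch: the PR names the urls field and the url property, but the dataclass form, the string-tolerant validator and the unique_urls helper are hypothetical simplifications (the real config is a CLI/pydantic-style model).

```python
from dataclasses import dataclass


@dataclass
class EndpointConfig:
    # Hypothetical simplification of the PR's EndpointConfig: only the
    # `urls` field and the `url` property are named in the PR.
    urls: list[str]

    def __post_init__(self) -> None:
        # Tolerate a bare string so older single-URL call sites keep working.
        if isinstance(self.urls, str):
            self.urls = [self.urls]
        if not self.urls:
            raise ValueError("at least one URL is required")

    @property
    def url(self) -> str:
        # Backward-compatible accessor: code that read `config.url` still
        # gets a single string (the first configured URL).
        return self.urls[0]


def unique_urls(urls: list[str]) -> list[str]:
    # Order-preserving de-duplication (dict keys keep insertion order), so a
    # URL passed twice is only scraped once for server metrics.
    return list(dict.fromkeys(urls))


cfg = EndpointConfig(urls=["http://server1:8000", "http://server2:8000", "http://server1:8000"])
legacy = EndpointConfig(urls="http://server1:8000")
print(cfg.url, unique_urls(cfg.urls), legacy.urls)
```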

Testing

Functional testing was conducted against two aiperf mock servers. The exported records show requests routed to the two servers in round-robin order.

aiperf-mock-server --port 8000 &
aiperf-mock-server --port 8001 &
...

$ aiperf profile --url localhost:8000 localhost:8001 --model Qwen/Qwen3-0.6B
...

$ cat profile_export.jsonl | jq '.trace_data.request_headers.Host'
"localhost:8000"
"localhost:8001"
"localhost:8000"
"localhost:8001"
"localhost:8000"
"localhost:8001"
"localhost:8000"
"localhost:8001"
"localhost:8000"
"localhost:8001"
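The same check can be scripted instead of eyeballed. The sketch below reads the field the jq command extracts (.trace_data.request_headers.Host) from each line of profile_export.jsonl. The helper names are hypothetical, and the strict-rotation test assumes the export lists records in the order they were issued; if records are written in completion order, the per-host counts are the more robust check.

```python
import json
from collections import Counter
from pathlib import Path


def hosts_in_order(path: Path) -> list[str]:
    # Same field the jq command extracts: .trace_data.request_headers.Host
    hosts: list[str] = []
    with path.open() as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                hosts.append(record["trace_data"]["request_headers"]["Host"])
    return hosts


def is_round_robin(hosts: list[str], n_servers: int) -> bool:
    # Strict rotation: the first n_servers hosts are distinct and every later
    # host repeats the one n_servers positions earlier.
    head = hosts[:n_servers]
    if len(set(head)) != len(head):
        return False
    return all(hosts[i] == hosts[i - n_servers] for i in range(n_servers, len(hosts)))


def host_counts(hosts: list[str]) -> Counter:
    # Order-independent check: with round-robin, per-host counts differ by at most one.
    return Counter(hosts)
```

For the run above, hosts = hosts_in_order(Path("profile_export.jsonl")) followed by is_round_robin(hosts, n_servers=2) and host_counts(hosts) would confirm the alternating pattern.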

cc @ajcasagrande

Summary by CodeRabbit

Release Notes

  • New Features

    • Multi-URL load balancing: distribute requests across multiple endpoints by passing multiple URLs to the --url parameter; control the distribution strategy with the --url-strategy option (default: round-robin)
  • Documentation

    • Added comprehensive Multi-URL Load Balancing reference section to timing modes documentation
    • Updated CLI options documentation to reflect new URL configuration capabilities

Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>

github-actions bot commented Jan 27, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497

Last updated for commit: 8ab1c78

@github-actions github-actions bot added the feat label Jan 27, 2026

coderabbitai bot commented Jan 27, 2026

Walkthrough

Implements multi-URL load balancing support across the benchmarking framework. Introduces a URLSelectionStrategy enum with round-robin distribution, replacing single-endpoint configuration with multi-endpoint support. Adds URL sampling protocols, factory registration, and thread-safe request routing via indices throughout timing and transport layers.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation | docs/benchmark_modes/timing-modes-reference.md, docs/cli_options.md | Added CLI documentation for --url (list support) and --url-strategy options with round-robin default; expanded base URL descriptions to support multiple endpoints. |
| Configuration & Enums | src/aiperf/common/config/endpoint_config.py, src/aiperf/common/enums/plugin_enums.py, src/aiperf/common/enums/__init__.py | Introduced URLSelectionStrategy enum with ROUND_ROBIN member; changed EndpointConfig.url: str → urls: list[str] with validator; added backward-compatible url property and url_selection_strategy field. |
| Protocols & Factories | src/aiperf/common/protocols.py, src/aiperf/common/factories.py | Added URLSamplingStrategyProtocol defining next_url_index() interface; created URLSamplingStrategyFactory for strategy instantiation; updated ZMQProxyFactory signature. |
| Core Models | src/aiperf/common/models/model_endpoint_info.py, src/aiperf/common/models/record_models.py | Changed EndpointInfo.base_url: str → base_urls: list[str] with accessors (base_url property, get_url(index) method); added `url_index: int`… |
| Credit & Routing | src/aiperf/credit/issuer.py, src/aiperf/credit/structs.py | Extended CreditIssuer with url_index_callback parameter; added optional url_index field to Credit struct to propagate multi-URL metadata. |
| Timing & URL Sampling | src/aiperf/timing/config.py, src/aiperf/timing/phase/runner.py, src/aiperf/timing/phase_orchestrator.py, src/aiperf/timing/url_samplers.py | Added urls and url_selection_strategy fields to TimingConfig; implemented RoundRobinURLSampler with thread-safe index rotation; wired URL sampler through the PhaseOrchestrator → PhaseRunner → CreditIssuer pipeline. |
| Transport & Worker | src/aiperf/transports/aiohttp_transport.py, src/aiperf/workers/worker.py, src/aiperf/server_metrics/manager.py | Updated URL construction to use endpoint_info.get_url(request_info.url_index) for load-balanced selection; propagated url_index from credit through request info; changed endpoint aggregation to handle multiple URLs. |
| Unit Tests | tests/unit/common/config/test_endpoint_config.py, tests/unit/common/models/test_endpoint_info.py, tests/unit/timing/test_timing_config.py, tests/unit/timing/test_url_samplers.py | Added comprehensive test coverage for multi-URL initialization, list/property semantics, strategy selection, backward compatibility, round-robin distribution, thread safety, and factory registration. |
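A thread-safe round-robin sampler behind a strategy factory, as summarized in the Timing & URL Sampling and Protocols & Factories rows, might be sketched as below. The class, protocol, enum and method names follow the PR; the enum value "round_robin", the registry dict and the create_url_sampler function are hypothetical stand-ins for URLSamplingStrategyFactory, not its actual API.

```python
import threading
from collections import Counter
from enum import Enum
from typing import Callable, Protocol


class URLSelectionStrategy(str, Enum):
    ROUND_ROBIN = "round_robin"  # value assumed from the PR's "round_robin"


class URLSamplingStrategyProtocol(Protocol):
    def next_url_index(self) -> int: ...


class RoundRobinURLSampler:
    """Yields 0, 1, ..., n-1, 0, 1, ...; a lock makes read-and-advance atomic."""

    def __init__(self, num_urls: int) -> None:
        if num_urls < 1:
            raise ValueError("num_urls must be >= 1")
        self._num_urls = num_urls
        self._next = 0
        self._lock = threading.Lock()

    def next_url_index(self) -> int:
        with self._lock:
            index = self._next
            self._next = (index + 1) % self._num_urls
        return index


# Hypothetical stand-in for URLSamplingStrategyFactory: strategy enum -> constructor.
_SAMPLERS: dict[URLSelectionStrategy, Callable[[int], URLSamplingStrategyProtocol]] = {
    URLSelectionStrategy.ROUND_ROBIN: RoundRobinURLSampler,
}


def create_url_sampler(strategy: URLSelectionStrategy, num_urls: int) -> URLSamplingStrategyProtocol:
    try:
        constructor = _SAMPLERS[strategy]
    except KeyError:
        raise ValueError(f"unsupported URL selection strategy: {strategy}") from None
    return constructor(num_urls)


# Four threads draw concurrently; because each draw is atomic, the 1200
# indices still split evenly across the 3 URLs.
sampler = create_url_sampler(URLSelectionStrategy.ROUND_ROBIN, num_urls=3)
drawn: list[int] = []


def draw_many() -> None:
    for _ in range(300):
        drawn.append(sampler.next_url_index())  # list.append is atomic in CPython


threads = [threading.Thread(target=draw_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(Counter(drawn))
```

Adding another strategy in this sketch means registering one more enum member and constructor; callers that only depend on next_url_index() do not change.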

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Poem

🐰 Hop along through URLs galore,
Round-robin routes we now explore,
Load balancing with grace and care,
Threads spinning without despair,
One config file distributes fair! 🌐

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: Support multiple --url Endpoints' clearly and concisely summarizes the main feature being added: support for multiple URL endpoints with the --url flag, which is the primary objective of this changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50%, which meets the required threshold of 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 95.08197% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aiperf/timing/phase_orchestrator.py | 60.00% | 1 Missing and 1 partial ⚠️ |
| src/aiperf/server_metrics/manager.py | 80.00% | 0 Missing and 1 partial ⚠️ |

@ajcasagrande ajcasagrande self-requested a review January 27, 2026 17:05

@ajcasagrande ajcasagrande left a comment

Great work! The design is clean and well integrated in current aiperf style. I have some items to change but not too much. Overall, well designed. Thanks!

@ajcasagrande ajcasagrande left a comment

LGTM. Great job!

@ajcasagrande ajcasagrande enabled auto-merge (squash) January 27, 2026 21:41
@ajcasagrande ajcasagrande merged commit 6b500c5 into ai-dynamo:main Jan 27, 2026
17 checks passed