feat: Support multiple --url Endpoints #606
Conversation
Try out this PR
Quick install:
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497
Recommended with a virtual environment (using uv):
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497
Walkthrough: Implements multi-URL load balancing support across the benchmarking framework.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
🚥 Pre-merge checks: ✅ 3 passed
Force-pushed from d3fab61 to 9534605.
ajcasagrande left a comment
Great work! The design is clean and well integrated with the current aiperf style. I have some items to change, but not too many. Overall, well designed. Thanks!
ajcasagrande left a comment
LGTM. Great job!
Summary
Adds support for distributing benchmark requests across multiple inference server endpoints. Users can now specify multiple --url flags to
enable load balancing across servers:
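The invocation below is an illustrative sketch: only the repeated --url flag, the --url-strategy option, and the round_robin value come from this PR; the profile subcommand, model name, and ports are placeholder assumptions that may differ in your setup.

```bash
# Sketch only: model name and ports are placeholders; the repeated --url flags
# and --url-strategy are the new options added by this PR.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --url http://localhost:8001 \
  --url-strategy round_robin
```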
This enables horizontal scaling scenarios like multi-GPU benchmarking on a single node (multiple containers, each serving a different GPU) or
distributed inference across multiple servers. Requests are distributed using a configurable --url-strategy (currently round_robin). The URL
assignment happens at credit issuance time in the TimingManager, flows through the credit to workers, and the transport layer selects the
appropriate URL for each request. Server metrics are collected from all configured endpoints.
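As a sketch of the single-node, multi-GPU scenario above (the container image, model name, GPU indices, and port mappings are hypothetical placeholders; nothing here besides the repeated --url usage is defined by this PR), one serving container per GPU could be exposed on its own host port:

```bash
# Hypothetical setup: one serving container per GPU, each published on its own host port.
docker run -d --gpus device=0 -p 8000:8000 my-inference-image --model my-model
docker run -d --gpus device=1 -p 8001:8000 my-inference-image --model my-model
```

aiperf can then be pointed at both ports with repeated --url flags, as in the example above, and requests will alternate between the two GPUs in round-robin fashion.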
Key Changes
Testing
Functional testing was conducted against two aiperf mock servers; it shows requests being properly routed to the two servers in round-robin fashion.
cc @ajcasagrande
Summary by CodeRabbit
Release Notes
New Features
Distribute benchmark requests across multiple server endpoints by passing the --url parameter multiple times; control the distribution strategy with the --url-strategy option (default: round-robin)
Documentation