feat: Support multiple --url Endpoints #606
Conversation
Try out this PR
Quick install:
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497
Recommended with a virtual environment (using uv):
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8ab1c786dfacc3e39caec276b8b110935c244497
Walkthrough: Implements multi-URL load balancing support across the benchmarking framework.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
🚥 Pre-merge checks: ✅ 3 passed
Force-pushed from d3fab61 to 9534605.
ajcasagrande left a comment
Great work! The design is clean and well integrated with the current aiperf style. I have some items to change, but not too many. Overall, well designed. Thanks!
ajcasagrande left a comment
LGTM. Great job!
Summary
Adds support for distributing benchmark requests across multiple inference server endpoints. Users can now specify multiple --url flags to
enable load balancing across servers:
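The invocation below is an illustrative sketch: only the repeated --url flag, the --url-strategy option, and the round_robin value come from this PR; the profile subcommand, model name, and ports are placeholder assumptions that may differ in your setup.

```bash
# Sketch only: model name and ports are placeholders; the repeated --url flags
# and --url-strategy are the new options added by this PR.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --url http://localhost:8001 \
  --url-strategy round_robin
```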
This enables horizontal scaling scenarios like multi-GPU benchmarking on a single node (multiple containers, each serving a different GPU) or
distributed inference across multiple servers. Requests are distributed using a configurable --url-strategy (currently round_robin). The URL
assignment happens at credit issuance time in the TimingManager, flows through the credit to workers, and the transport layer selects the
appropriate URL for each request. Server metrics are collected from all configured endpoints.
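As a sketch of the single-node, multi-GPU scenario above (the container image, model name, GPU indices, and port mappings are hypothetical placeholders; nothing here besides the repeated --url usage is defined by this PR), one serving container per GPU could be exposed on its own host port:

```bash
# Hypothetical setup: one serving container per GPU, each published on its own host port.
docker run -d --gpus device=0 -p 8000:8000 my-inference-image --model my-model
docker run -d --gpus device=1 -p 8001:8000 my-inference-image --model my-model
```

aiperf can then be pointed at both ports with repeated --url flags, as in the example above, and requests will alternate between the two GPUs in round-robin fashion.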
Key Changes
Testing
Functional testing was conducted against two aiperf mock servers; it shows requests being properly routed to the two servers in round-robin fashion.
cc @ajcasagrande
Summary by CodeRabbit
Release Notes
New Features
Distribute benchmark requests across multiple server endpoints by passing the --url parameter multiple times; control the distribution strategy with the --url-strategy option (default: round-robin)
Documentation