fix(sa-bench): shard high-concurrency loadgen across frontends#154

Open
YAMY1234 wants to merge 4 commits into
NVIDIA:main from
YAMY1234:yangminl/sa-bench-sharded-frontends

YAMY1234 (Collaborator) commented May 13, 2026

Summary

  • Add repeated --api-url support to SA-Bench and round-robin requests across multiple frontend endpoints while sharing one aiohttp session.
  • Teach bench.sh to populate direct frontend targets from SA_BENCH_API_URLS or /logs/nginx.conf when SA_BENCH_SHARD_FRONTENDS=true.
  • Pass benchmark.env through benchmark runners so recipes can enable sharded SA-Bench without using custom commands.
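The first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the function names `parse_api_urls` and `assign_round_robin` and the `frontend-N:8000` URLs are hypothetical, and the shared aiohttp session that the PR keeps is left out so the sketch needs only the standard library.

```python
import argparse
import itertools


def parse_api_urls(argv):
    """Collect every --api-url occurrence into a list (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    # action="append" lets the flag be repeated; each occurrence adds one entry.
    parser.add_argument("--api-url", dest="api_urls", action="append", required=True)
    return parser.parse_args(argv).api_urls


def assign_round_robin(api_urls, num_requests):
    """Return the target URL for each request index, cycling through api_urls."""
    targets = itertools.cycle(api_urls)
    return [next(targets) for _ in range(num_requests)]


# Hypothetical endpoints, used only for illustration.
urls = parse_api_urls([
    "--api-url", "http://frontend-0:8000",
    "--api-url", "http://frontend-1:8000",
    "--api-url", "http://frontend-2:8000",
])
plan = assign_round_robin(urls, 7)
```

With three endpoints and seven requests, the plan wraps around twice, so the first endpoint receives three requests and the other two receive two each.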

Why

High-concurrency disaggregated serving runs can exhaust ephemeral ports when a single load generator connects to one host:port tuple. Sharding request traffic across the deployed frontend endpoints avoids that client-side collapse without changing backend topology.
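A rough back-of-the-envelope check of this argument, under stated assumptions: the client has a single source IP, each in-flight request holds its own connection, and the kernel uses the Linux default ephemeral range of 32768-60999 (the actual range on the benchmark nodes is not given in this PR). The helper names are hypothetical.

```python
import math

# Assumption: Linux default net.ipv4.ip_local_port_range (32768-60999).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999
# One source IP can hold at most this many connections to one destination host:port.
PORTS_PER_DESTINATION = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1


def connections_per_endpoint(concurrency, num_endpoints):
    """Worst-case concurrent connections to one endpoint under even round-robin."""
    return math.ceil(concurrency / num_endpoints)


def fits(concurrency, num_endpoints):
    """True if the per-endpoint connection count stays within the port budget."""
    return connections_per_endpoint(concurrency, num_endpoints) <= PORTS_PER_DESTINATION


single = connections_per_endpoint(32768, 1)
sharded = connections_per_endpoint(32768, 9)
```

Under these assumptions a single destination would need 32768 source ports against a budget of 28232, while nine endpoints need about 3641 each. Sockets lingering in TIME_WAIT would tighten the budget further, which the sketch ignores.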

For conc=32768:
Before: [image]
After: [image]

Validation

  • uv run ruff check src/srtctl/benchmarks/base.py src/srtctl/benchmarks/custom.py src/srtctl/benchmarks/scripts/sa-bench/backend_request_func.py src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py
  • uv run python -m py_compile src/srtctl/benchmarks/scripts/sa-bench/backend_request_func.py src/srtctl/benchmarks/scripts/sa-bench/benchmark_serving.py
  • uv run pytest tests/test_benchmarks.py tests/test_frontends.py
  • CoreWeave validation: job 3487, 6P1D DEP24, conc=36864, sharded SA-Bench, completed 110592/110592 with error=0 and api_urls=9.
  • CoreWeave validation: job 3528, 4P1D DEP24, conc=32768, sharded SA-Bench, completed 98304/98304 with error=0 and api_urls=9.

YAMY1234 added 3 commits May 13, 2026 16:05
Support direct multi-frontend request sharding to avoid single-destination ephemeral port exhaustion at high concurrency while keeping shared aiohttp sessions and nginx keepalive tuning.
Honor benchmark.env for all benchmark runners so sa-bench recipes can enable frontend sharding through environment flags.
Keep the sharded SA-Bench client changes passing the repository lint rules before opening the PR.
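The target resolution described in the summary (environment flags first, then the nginx config) might look roughly like the sketch below. The PR implements this in bench.sh; this Python version is only an illustration. The function name `resolve_api_urls`, the comma-separated format of `SA_BENCH_API_URLS`, and the `http://` scheme prepended to nginx upstream entries are all assumptions, not details confirmed by the PR.

```python
import re

# Matches "server host:port" entries of an nginx upstream block.
# Requiring host:port skips "server {" blocks and "server_name" directives.
_UPSTREAM_SERVER_RE = re.compile(r"^\s*server\s+([\w.\-]+:\d+)", re.MULTILINE)


def resolve_api_urls(env, nginx_conf_text):
    """Return direct frontend URLs, or an empty list if sharding is disabled."""
    if env.get("SA_BENCH_SHARD_FRONTENDS", "").lower() != "true":
        return []
    explicit = env.get("SA_BENCH_API_URLS", "")
    if explicit.strip():
        # Assumption: the variable holds a comma-separated list of URLs.
        return [url.strip() for url in explicit.split(",") if url.strip()]
    # Fallback: derive targets from the upstream servers in the nginx config.
    # Assumption: frontends speak plain HTTP.
    return [f"http://{addr}" for addr in _UPSTREAM_SERVER_RE.findall(nginx_conf_text)]
```

An explicit URL list takes precedence here so that a recipe can override whatever the generated nginx config contains; whether bench.sh applies the same precedence is not stated in the PR text.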
codecov-commenter commented May 13, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@404ad25). Learn more about missing BASE report.

Files with missing lines                    Patch %   Lines
src/srtctl/cli/mixins/benchmark_stage.py    81.81%    2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #154   +/-   ##
=======================================
  Coverage        ?   65.10%           
=======================================
  Files           ?       67           
  Lines           ?     8228           
  Branches        ?        0           
=======================================
  Hits            ?     5357           
  Misses          ?     2871           
  Partials        ?        0           

YAMY1234 marked this pull request as ready for review May 15, 2026 17:19