
feat: peak gen throughput metric in sa-bench + server-side node metrics CSV export #93

Open
zhengd-nv wants to merge 9 commits into NVIDIA:main from zhengd-nv:node-metrics

@zhengd-nv

Summary

This PR adds two complementary gen-throughput measurement capabilities:

Client-side: peak gen throughput in sa-bench (benchmark_serving.py)

Adds a peak_output_tokens_per_s metric to align with sglang bench_serving.py's peak gen throughput reporting.

  • Records start_time and per-chunk text_chunks on each RequestFuncOutput across all backends (OpenAI completions, chat completions, TRT-LLM, Dynamo).
  • After the benchmark completes, reconstructs per-chunk absolute arrival times from start_time + ttft + cumulative ITL. Because sa-bench ITL is per SSE chunk (not per token), each chunk's text is tokenized to get an accurate token count.
  • Buckets token arrivals into 1-second windows, applies a 10-sample moving-average smoothing, and reports the peak as peak_output_tokens_per_s.
  • Printed alongside output_throughput and included in the JSON result.
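
The steps above can be sketched roughly as follows. This is an illustrative sketch, not the code in benchmark_serving.py: the function name, the tuple layout of `requests`, and the edge handling of the moving average (trailing window, shrunk to the series length when the run is shorter than the window) are all assumptions. Token counts per chunk are passed in directly here, whereas the PR obtains them by tokenizing each chunk's text.

```python
from collections import Counter


def peak_output_tokens_per_s(requests, window=10):
    """Hypothetical helper: peak smoothed output tokens/s.

    requests: iterable of (start_time, ttft, itl_list, chunk_token_counts),
    where chunk_token_counts[i] is the token count of the i-th SSE chunk.
    """
    requests = list(requests)
    if not requests:
        return 0.0
    # Bucket relative to the earliest request start (an assumption).
    t0 = min(start for start, _, _, _ in requests)
    buckets = Counter()
    for start, ttft, itls, token_counts in requests:
        # First chunk arrives at start + ttft; each later chunk adds one ITL.
        arrival = start + ttft
        arrivals = [arrival]
        for itl in itls:
            arrival += itl
            arrivals.append(arrival)
        for t, n_tokens in zip(arrivals, token_counts):
            buckets[int(t - t0)] += n_tokens  # 1-second window index
    if not buckets:
        return 0.0
    # Dense per-second series, with zeros for seconds that saw no tokens.
    series = [buckets.get(s, 0) for s in range(max(buckets) + 1)]
    # Trailing moving average over `window` samples; report its maximum.
    w = min(window, len(series))
    running = sum(series[:w])
    best = running
    for i in range(w, len(series)):
        running += series[i] - series[i - w]
        best = max(best, running)
    return best / w


# Made-up sample data: two requests that overlap in the second 1-second window.
sample = [
    (0.0, 0.5, [1.0, 1.0], [2, 3, 1]),
    (0.0, 1.2, [0.5], [4, 2]),
]
print(peak_output_tokens_per_s(sample, window=2))
```

With smoothing, the reported peak is lower than the busiest single second, which is the point of the moving average: it damps one-off bursts caused by chunk batching.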

Server-side: per-node batch metrics CSV export (analysis/srtlog)

Adds analysis/srtlog/export_node_metrics.py to extract batch-level metrics from prefill/decode Slurm logs and write them to CSV. Because the server-side data captures both running_req and gen_throughput per batch step, it shows how gen throughput varies with concurrency, which the client-side metric cannot provide.

  • One CSV per node named {node}_{worker_type}_{worker_id}.csv; columns cover all batch fields (token usage, queue depth, throughput, etc.).
  • A gen_throughput.csv summary groups by running_req and reports count/mean/median of gen_throughput.
  • Can be run standalone: python -m analysis.srtlog.export_node_metrics <run_path>
  • Integrated into the postprocess pipeline via benchmark.export_node_metrics: true in the job config; runs in an ephemeral venv to avoid polluting the container environment.
  • NodeAnalyzer.parse_run_logs() now also scans <run_path>/logs/ (matching the actual srt-slurm job output layout).
  • RunMetadata.format_date() handles additional timestamp formats (%Y-%m-%d %H:%M:%S[.%f]).
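
As a rough illustration of the gen_throughput.csv summary described above, the grouping could look like the sketch below. It uses only the standard library and is not the code in export_node_metrics.py; the function names and the output column names (`count`, `mean`, `median`) are assumptions, and the input rows are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median


def summarize_gen_throughput(rows):
    """Group batch rows by running_req; report count/mean/median of gen_throughput."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["running_req"])].append(float(row["gen_throughput"]))
    return [
        {"running_req": key, "count": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in sorted(groups.items())
    ]


def summary_to_csv(summary):
    """Render the summary as CSV text (hypothetical column layout)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["running_req", "count", "mean", "median"])
    writer.writeheader()
    writer.writerows(summary)
    return buf.getvalue()


# Made-up batch rows, shaped like what csv.DictReader would yield.
rows = [
    {"running_req": "2", "gen_throughput": "100.0"},
    {"running_req": "2", "gen_throughput": "200.0"},
    {"running_req": "1", "gen_throughput": "50.0"},
]
print(summary_to_csv(summarize_gen_throughput(rows)))
```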

Usage

  • Run sa-bench against a live endpoint; confirm Peak output token throughput (tok/s) appears in output and the value is plausible relative to Output token throughput. Verify peak_output_tokens_per_s is present in the JSON result file.
  • Run python -m analysis.srtlog.export_node_metrics <run_path> on a completed job directory; confirm per-node CSVs and gen_throughput.csv are created under logs/node_metrics/.
  • Set benchmark.export_node_metrics: true in a job config and confirm CSVs are written automatically at the end of a sweep.
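
The first check can be scripted along these lines. This is a sketch under assumptions: `check_peak_metric` is a hypothetical helper, the `output_throughput` key is assumed to be the JSON name of the existing output throughput metric, and the sample values are made up. The ratio is only a plausibility hint, since a smoothed peak is not strictly guaranteed to exceed the run-wide average.

```python
import json


def check_peak_metric(result):
    """Return peak / average output throughput for a parsed sa-bench result dict.

    Raises KeyError if peak_output_tokens_per_s is absent.
    """
    if "peak_output_tokens_per_s" not in result:
        raise KeyError("peak_output_tokens_per_s missing from result")
    peak = float(result["peak_output_tokens_per_s"])
    # Assumed key name for the run-wide average output throughput.
    average = float(result["output_throughput"])
    return peak / average if average > 0 else float("inf")


# Made-up result content; in practice, json.load() the result file instead.
sample = json.loads('{"peak_output_tokens_per_s": 300.0, "output_throughput": 200.0}')
print(check_peak_metric(sample))
```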

@codecov-commenter

codecov-commenter commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 27.90698% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@1372a10).

Files with missing lines                      Patch %   Lines
src/srtctl/cli/mixins/postprocess_stage.py    26.19%    31 Missing ⚠️

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #93   +/-   ##
=======================================
  Coverage        ?   70.35%           
=======================================
  Files           ?       60           
  Lines           ?     6595           
  Branches        ?        0           
=======================================
  Hits            ?     4640           
  Misses          ?     1955           
  Partials        ?        0           

@ishandhanani
Collaborator

ping me on slack when ready to merge

@zhengd-nv zhengd-nv marked this pull request as ready for review April 29, 2026 07:40

3 participants