Feature/statistics #42
Merged
Pull request overview
This PR adds an opt-in benchmarking/stats pipeline to MMIRAGE to record per-shard runtime, throughput, token counts, and GPU utilization, and exposes the results via CLI and documentation.
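The following is a minimal sketch of a per-shard record that covers these metrics, with duration formatting and a throughput helper. It is named after the `ShardStats` class this PR adds, but it is hypothetical. Every field, the `H:MM:SS` format, and the choice of output tokens per second as the throughput measure are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


def format_duration(seconds: float) -> str:
    """Render a duration as H:MM:SS (hypothetical format)."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


@dataclass
class ShardStatsSketch:
    # Hypothetical fields; not the PR's actual ShardStats layout.
    runtime_s: float
    input_tokens: int
    output_tokens: int
    model_load_s: Optional[float] = None
    gpu_util_samples: list[float] = field(default_factory=list)

    @property
    def output_tokens_per_s(self) -> float:
        # Guard against a zero runtime (e.g. an empty shard).
        return self.output_tokens / self.runtime_s if self.runtime_s > 0 else 0.0

    @property
    def mean_gpu_util(self) -> Optional[float]:
        if not self.gpu_util_samples:
            return None
        return sum(self.gpu_util_samples) / len(self.gpu_util_samples)

    def to_dict(self) -> dict:
        """Serialize for embedding under a "stats" key in status.json."""
        payload = asdict(self)
        payload["runtime"] = format_duration(self.runtime_s)
        payload["output_tokens_per_s"] = round(self.output_tokens_per_s, 2)
        payload["mean_gpu_util"] = self.mean_gpu_util
        return payload
```

For example, a 125-second shard that emits 5,000 output tokens serializes with `runtime` set to `"0:02:05"` and `output_tokens_per_s` set to `40.0`.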
Changes:
- Add a `--stats` flag to local `run` and SLURM `submit` (and retry flows) to enable stats collection via `MMIRAGE_COLLECT_STATS`.
- Record per-shard `stats` into shard `status.json` (runtime/throughput, GPU util polling, token counts, model load time) and add a new `mmirage stats` command to report per-shard + aggregate JSON.
- Add documentation and a DataTrove-compatible benchmark config (`configs/config_benchmark_datatrove.yaml`).
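The `--stats` plumbing reduces to setting one environment variable that shard code checks. The sketch below assumes the value `1`, which matches what the SLURM path exports, and it uses `argparse` for illustration. `env_for_run`, `build_sbatch_script`, and `stats_enabled` are hypothetical names. The real `cli.py` may use a different CLI framework.

```python
import argparse
import os
from typing import Mapping, Optional

STATS_ENV_VAR = "MMIRAGE_COLLECT_STATS"


def stats_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Shard-side check: collect stats only when the variable is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get(STATS_ENV_VAR) == "1"


def env_for_run(argv: list[str], base_env: Mapping[str, str]) -> dict[str, str]:
    """Local run (hypothetical): translate --stats into a child-process env."""
    parser = argparse.ArgumentParser(prog="mmirage-sketch")
    parser.add_argument("--stats", action="store_true",
                        help="collect per-shard benchmark stats")
    args = parser.parse_args(argv)
    env = dict(base_env)
    if args.stats:
        env[STATS_ENV_VAR] = "1"
    return env


def build_sbatch_script(command: str, collect_stats: bool = False) -> str:
    """SLURM submit (hypothetical): export the variable in the batch script."""
    lines = ["#!/bin/bash", "#SBATCH --job-name=mmirage-shard"]
    if collect_stats:
        lines.append(f"export {STATS_ENV_VAR}=1")
    lines.append(command)
    return "\n".join(lines) + "\n"
```

The dict returned by `env_for_run` would be passed as `env=` to `subprocess.run`. This lets the flag reach shard workers without mutating the parent process's environment.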
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/mmirage/shard_utils.py | Introduces ShardStats, duration formatting, and a background nvidia-smi poller; persists stats into shard status payloads. |
| src/mmirage/shard_process.py | Enables opt-in GPU polling + token/load-time capture and writes stats on shard success. |
| src/mmirage/core/process/processors/llm/llm_processor.py | Tracks cumulative token counts and measures engine init time; supports forwarding extra engine kwargs. |
| src/mmirage/core/process/processors/llm/config.py | Adds extra_engine_args to allow passing additional SGLang Engine kwargs from YAML. |
| src/mmirage/core/process/mapper.py | Aggregates token counts and model load time across processors for shard-level stats. |
| src/mmirage/cli.py | Adds --stats to relevant commands and introduces a stats subcommand emitting JSON. |
| src/mmirage/cli_utils/status.py | Adds collect_bench_stats() to aggregate shard stats across runs; wires --stats into retry submission. |
| src/mmirage/cli_utils/slurm.py | Plumbs collect_stats through sbatch generation to export MMIRAGE_COLLECT_STATS=1. |
| README.md | Documents benchmarking workflow, metrics, and reference benchmark links. |
| configs/config_benchmark_datatrove.yaml | Adds a DataTrove-compatible throughput benchmark configuration. |
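The background `nvidia-smi` poller listed for `shard_utils.py` can be sketched as a daemon thread that samples utilization until the shard finishes. `GpuPoller`, `parse_utilization`, and `query_gpu_utilization` are hypothetical names. Only the `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` query is a real interface, and the one-second default interval is an assumption.

```python
import subprocess
import threading
from typing import Callable, Optional


def parse_utilization(output: str) -> list[float]:
    """Parse one utilization percentage per GPU line."""
    return [float(line) for line in output.splitlines() if line.strip()]


def query_gpu_utilization() -> list[float]:
    """Take one nvidia-smi sample covering every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


class GpuPoller:
    """Context manager that samples GPU utilization on a background thread."""

    def __init__(self, interval_s: float = 1.0,
                 sample_fn: Callable[[], list[float]] = query_gpu_utilization):
        self.interval_s = interval_s
        self.sample_fn = sample_fn
        self.samples: list[float] = []
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self.samples.extend(self.sample_fn())
            except (OSError, subprocess.CalledProcessError, ValueError):
                # nvidia-smi missing, failing, or printing "[N/A]": skip the sample.
                pass
            # Event.wait acts as an interruptible sleep, so stop is prompt.
            self._stop.wait(self.interval_s)

    def __enter__(self) -> "GpuPoller":
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def mean(self) -> Optional[float]:
        return sum(self.samples) / len(self.samples) if self.samples else None
```

`sample_fn` is injectable, so the poller can be exercised without a GPU. Failed samples are skipped, which means a node without a driver yields `mean() is None` instead of failing the shard.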
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- fabnemEPFL requested changes (May 8, 2026)
- fabnemEPFL requested changes (May 11, 2026)
- fabnemEPFL approved these changes (May 11, 2026)
This pull request introduces a new benchmarking feature for MMIRAGE that enables detailed per-shard performance tracking, including GPU utilization and throughput metrics. The changes add a `--stats` flag to the CLI for both local and SLURM runs, update documentation, and provide a DataTrove-compatible benchmark configuration. The most important changes are grouped below:

Benchmarking and Performance Tracking:
- Added a `--stats` flag to the `run` and `submit` commands, which enables GPU utilization polling and throughput tracking during shard execution. This is controlled via the `MMIRAGE_COLLECT_STATS` environment variable. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
- Added a `stats` CLI command to print per-shard and aggregate benchmark statistics in JSON format, using a new `collect_bench_stats` utility. [1] [2] [3] [4]

Documentation Updates:
- Updated `README.md` with a new section on benchmarking shard performance, including example commands, sample output, and explanations of key metrics.
- … `README.md`.

Configuration and Compatibility:
- Added `configs/config_benchmark_datatrove.yaml` for running a DataTrove-compatible throughput benchmark, with detailed instructions and settings matching the DataTrove inference benchmark.

These changes give users tools to collect, inspect, and compare detailed runtime and hardware utilization statistics. This supports performance analysis and benchmarking against reference workloads such as the DataTrove inference benchmark.
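The aggregate half of the `stats` command can be sketched as a walk over shard `status.json` files that keeps only the entries with a `stats` key. The directory layout, the `runtime_s`/`output_tokens` keys, and the function names are assumptions. This is not the actual `collect_bench_stats` implementation.

```python
import json
from pathlib import Path


def aggregate_shard_stats(payloads: dict[str, dict]) -> dict:
    """Combine per-shard "stats" entries into per-shard rows plus an aggregate."""
    shards = [
        {"shard": name, **payload["stats"]}
        for name, payload in sorted(payloads.items())
        if payload.get("stats")  # shards run without --stats carry no entry
    ]
    total_runtime = sum(s["runtime_s"] for s in shards)
    total_output = sum(s["output_tokens"] for s in shards)
    return {
        "shards": shards,
        "aggregate": {
            "num_shards": len(shards),
            "total_runtime_s": total_runtime,
            "total_output_tokens": total_output,
            # Normalized by summed shard time, not wall-clock time.
            "output_tokens_per_shard_s": (
                total_output / total_runtime if total_runtime else 0.0
            ),
        },
    }


def collect_bench_stats_sketch(root: Path) -> dict:
    """Read every status.json below root, keyed by its shard directory."""
    payloads = {
        str(path.parent.relative_to(root)): json.loads(path.read_text())
        for path in root.rglob("status.json")
    }
    return aggregate_shard_stats(payloads)
```

A call such as `print(json.dumps(collect_bench_stats_sketch(Path("outputs")), indent=2))` would emit the per-shard plus aggregate JSON. Dividing by summed shard runtime gives throughput per unit of shard time. Wall-clock throughput for shards running in parallel would need the earliest start and latest end timestamps instead.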