docs(paper): publish 9-channel benchmark results by ternaus · Pull Request #65 · albumentations-team/benchmark

ternaus · 2026-05-11T14:33:24Z

Add curated 9-channel result snapshots, include 9-channel regimes in generated paper data, and render a 9-channel summary table in the README.

Summary by Sourcery

Publish 9-channel (image-9ch) benchmark snapshots and integrate them into the paper pipeline, README tables, and summary insights, while preparing configs and scaffolding for upcoming video benchmarks and improving GCP result syncing.

New Features:

Add curated 9-channel CPU/GPU micro and DataLoader benchmark snapshots and wire them into generated paper data and summary tables.
Render scenario benchmark tables in the README for RGB, 9-channel, and video regimes based on aggregated result CSVs.
Introduce production configs and test coverage for video (16-frame) CPU/GPU micro and DataLoader scenarios, plus a helper script to launch GCP jobs in the first available GPU zone.

Enhancements:

Extend paper data generation to handle all regimes via generic winner-count tables and new regime labels for 9-channel and video benchmarks.
Summarize unsupported and early-stopped 9-channel rows in the supplement and add a remaining-supplement checklist for 9-channel and video runs.
Broaden public paper-data artifacts to include per-regime CSV and pivot tables and update insights to reference wildcard pivot paths.
Improve GCP detached bootstrap to rsync results before writing terminal markers and adjust logging expectations in tests.

Tests:

Update configuration model tests for new 9-channel and video configs, including gcs_uri and clip_length handling, and add a regression test ensuring GCP bootstrap sync ordering.

Add remaining production video and high-memory 9-channel cloud configs, document the supplement run plan, and make detached GCP runs sync results before writing terminal markers.

Replace the 9-channel coverage summary with generated transform-by-regime tables for RGB and 9-channel data, and add video regime wiring for future published snapshots.

sourcery-ai · 2026-05-11T14:33:32Z

Reviewer's Guide

Adds curated 9-channel (image-9ch) benchmark snapshots to the paper dataset, wires them into the figure/table generation pipeline (including new README scenario tables), extends regime handling to 9-channel and video scenarios, and improves GCP job orchestration and result syncing for paper runs.

File-Level Changes

Change	Details	Files
Publish 9-channel benchmark results into summary and insights docs, including winner counts and coverage tables.	Extend docs/paper_data/summary.md with 9-channel winner-count sections for micro and DataLoader regimes plus empty stubs for video regimes. Add 9-channel rows to the Coverage Summary and CPU-vs-GPU comparison tables. Update unsupported_and_early_stopped docs to include 9-channel kornia/torchvision failure and early-stop reasons.	`docs/paper_data/summary.md` `docs/paper_data/unsupported_and_early_stopped.md`
Generate README scenario benchmark tables from aggregated CSVs for RGB, 9-channel, and video regimes.	Introduce README_SCENARIO_TABLES and _readme_transform_tables_markdown() to derive per-transform best implementation per regime from all_results.csv. Patch README.md between markers using _write_figure_markdown to insert generated scenario tables (RGB and 9-channel filled, video placeholder when no data).	`README.md` `scripts/paper/generate_figures_and_insights.py`
Broaden regime support (9ch + video) across paper helpers and generation pipeline, and include new per-regime pivot/summary CSVs in the public data sync.	Extend REGIME_LABELS and REGIME_ORDER to include image9ch_* and video16f_* regimes and adjust implementation_label to infer device from regime suffix instead of hardcoding rgb_dataloader_gpu. Teach generate_paper_data to pull image9ch and video16f result patterns from GCS and to emit winner-count sections for all regimes defined in REGIME_LABELS. Sync all *_pivot.csv plus *_cpu.csv and *_gpu.csv artifacts from GENERATED into docs/paper_data during _sync_public_data().	`scripts/paper/common.py` `scripts/paper/generate_paper_data.py` `scripts/paper/generate_figures_and_insights.py`
Add configs and tests for 9-channel and video paper runs, including a high-memory Kornia-only 9ch CPU DataLoader config and flexible data fields in tests.	Remove kornia from prod_c4_9ch_dataloader_cpu.yaml (standard memory) and add prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml targeting c4-highmem-16. Introduce four video configs (CPU/GPU micro and DataLoader) pointing to UCF101, with clip_length and media=video set appropriately. Relax test_config_models expectations to allow per-config gcs_uri, num_channels, and clip_length, and assert new 9ch/video configs exist with expected sizing parameters.	`configs/paper/prod_c4_9ch_dataloader_cpu.yaml` `configs/paper/prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml` `configs/paper/prod_c4_video_micro_cpu.yaml` `configs/paper/prod_c4_video_dataloader_cpu.yaml` `configs/paper/prod_g2_video_micro_gpu.yaml` `configs/paper/prod_g2_video_dataloader_gpu.yaml` `tests/test_config_models.py`
Ensure detached GCP runs sync results to GCS before writing the DONE/FAILED markers and add coverage for this behavior.	Move gcs_rsync_retry of the results directory before marker upload in _BOOTSTRAP_SH and tighten error messaging to reflect terminal artifact upload, removing the post-marker rsync-with-warning path. Add a test to assert that result syncing occurs before marker writes and that the old stale warning string is gone.	`benchmark/cloud/gcp.py` `tests/test_jobs_orchestrator.py`
Add helper script to launch GPU jobs in the first GCP zone with available capacity.	Create run_gcp_first_available_gpu_zone.sh that loops over a prioritized zone list, invokes benchmark.cli run with --gcp-zone per zone, inspects logs for capacity-related failure strings, and falls back to the next zone or exits on non-capacity errors. Parameterize script via CONFIG, ZONES, LOG_DIR and propagate extra CLI args to benchmark.cli.	`scripts/run_gcp_first_available_gpu_zone.sh`
Publish curated 9-channel CPU/GPU micro and DataLoader manifests and raw result JSONs, plus regime-specific CSV and pivot tables.	Add results/published manifests for 9ch micro CPU/GPU and 9ch DataLoader CPU/GPU (standard and highmem) with benchmark params, system info, and pointers to per-library result JSONs. Check in the associated raw _results.json files for albumentationsx, torchvision, and kornia and generate per-regime CSV and _pivot.csv files for RGB, image9ch, and video16f in docs/paper_data/. Regenerate all_results.csv, summary.json, figure_winner_counts.csv, figure_winner_rows.csv, and unsupported_and_early_stopped.csv to incorporate the new regimes.	`results/published/paper-9ch-micro-c4-standard-16-2026-05-10/manifest.json` `results/published/paper-9ch-micro-gpu-g2-standard-16-2026-05-10/manifest.json` `results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/manifest.json` `results/published/paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11/manifest.json` `results/published/paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11/manifest.json` `results/published/paper-9ch-//_results.json` `docs/paper_data/all_results.csv` `docs/paper_data/summary.json` `docs/paper_data/figure_winner_counts.csv` `docs/paper_data/figure_winner_rows.csv` `docs/paper_data/image9ch_.csv` `docs/paper_data/image9ch__pivot.csv` `docs/paper_data/rgb__cpu.csv` `docs/paper_data/rgb__gpu.csv` `docs/paper_data/video16f_.csv` `docs/paper_data/video16f__pivot.csv`
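
The zone-fallback behaviour described for `run_gcp_first_available_gpu_zone.sh` in the table above can be sketched in Python as follows. This is an illustration only: the actual helper is a shell script whose contents are not shown on this page, and the `launch` callable, the capacity marker strings, and the zone names used here are assumptions rather than values taken from the PR.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumed capacity-related substrings; the strings the real script greps for are not shown here.
CAPACITY_MARKERS = ("ZONE_RESOURCE_POOL_EXHAUSTED", "does not have enough resources")


def is_capacity_failure(log_text: str) -> bool:
    """Return True when the launch log looks like a GPU capacity shortage."""
    return any(marker in log_text for marker in CAPACITY_MARKERS)


def launch_in_first_available_zone(
    zones: Iterable[str],
    launch: Callable[[str], Tuple[int, str]],
) -> Optional[str]:
    """Try each zone in priority order and return the first zone that launches.

    `launch` is a hypothetical callable returning (exit_code, log_text) for a zone.
    Capacity failures fall through to the next zone; any other failure aborts.
    """
    for zone in zones:
        exit_code, log_text = launch(zone)
        if exit_code == 0:
            return zone
        if not is_capacity_failure(log_text):
            raise RuntimeError(f"non-capacity failure in zone {zone}")
    return None  # every zone was out of capacity
```

Passing the launcher in as a callable keeps the loop testable without touching GCP; a real wrapper would presumably shell out to `benchmark.cli run` with `--gcp-zone`.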

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

In `implementation_label`, using `regime.endswith("_dataloader_gpu")` will incorrectly label micro GPU regimes like `*_micro_gpu` as CPU; consider checking for `"_gpu"` in the regime string or maintaining an explicit GPU-regime set so both micro and DataLoader GPU cases are handled correctly.
The new `run_gcp_first_available_gpu_zone.sh` hardcodes a long zone list; if you expect to reuse this script, consider centralizing the default zone list (or reading it from an env/config file) to avoid having to update multiple places when GCP regions/zones change.

## Individual Comments

### Comment 1
<location path="scripts/paper/common.py" line_range="76-77" />
<code_context>


 def implementation_label(regime: str, library: str) -> str:
-    device = "GPU" if regime == "rgb_dataloader_gpu" else "CPU"
+    device = "GPU" if regime.endswith("_dataloader_gpu") else "CPU"
     return f"{LIBRARY_DISPLAY.get(library, library)} {device}"

</code_context>
<issue_to_address>
**issue (bug_risk):** GPU micro regimes are now mislabeled as CPU in implementation_label.

The new regimes like `rgb_micro_gpu`, `image9ch_micro_gpu`, and `video16f_micro_gpu` don’t match `regime.endswith("_dataloader_gpu")`, so they’ll be labeled as CPU despite running on GPU. If `implementation_label` is surfaced in user-facing tables, GPU micro benchmarks will appear as “Library CPU”. Consider a broader condition (e.g. `regime.endswith("_gpu")`) or an explicit mapping from regime to device.
</issue_to_address>
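
One way to apply the broader condition suggested in this comment is sketched below. Only the function signature and the `_gpu` suffix convention come from the diff; the `LIBRARY_DISPLAY` contents are placeholders assumed for illustration, since the real mapping in `scripts/paper/common.py` is not shown here.

```python
# Placeholder mapping (assumption); the real LIBRARY_DISPLAY is not shown on this page.
LIBRARY_DISPLAY = {
    "albumentationsx": "AlbumentationsX",
    "torchvision": "torchvision",
    "kornia": "Kornia",
}


def implementation_label(regime: str, library: str) -> str:
    # Any regime ending in "_gpu" counts as GPU, covering both
    # *_micro_gpu and *_dataloader_gpu regimes.
    device = "GPU" if regime.endswith("_gpu") else "CPU"
    return f"{LIBRARY_DISPLAY.get(library, library)} {device}"


for regime in ("rgb_micro_gpu", "image9ch_dataloader_gpu", "video16f_micro_cpu"):
    print(regime, "->", implementation_label(regime, "kornia"))
```

An explicit set of GPU regimes, the other option raised in the comment, would be stricter but needs updating whenever a regime is added.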

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14df97d7be

chatgpt-codex-connector · 2026-05-11T14:37:16Z

+  if [[ -d "$WORKDIR/results" ]]; then
+    gcs_rsync_retry "results" 300 "$WORKDIR/results" "${run_prefix}/results/" || terminal_ok=0


Withhold DONE when result sync fails

When the final results rsync exhausts its retries, this line only flips terminal_ok=0; the function still proceeds to upload and confirm the DONE marker a few lines later. In a successful benchmark where GCS result syncing fails, downstream pollers that treat DONE as terminal success (for example scripts/run_gcp_rgb_micro_cpu_proxies.sh checks $prefix/DONE) can stop and consume an incomplete results/ tree even though the VM is kept for triage. Return before writing the marker, or only create/upload DONE after all prior terminal artifacts succeeded.
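
The ordering requested here can be modelled as below. This is a Python sketch of the control flow only: the real logic lives in the `_BOOTSTRAP_SH` shell script, and `sync_results` and `upload_marker` are hypothetical callables standing in for `gcs_rsync_retry` and the marker upload.

```python
from typing import Callable


def finalize_run(
    benchmark_ok: bool,
    sync_results: Callable[[], bool],
    upload_marker: Callable[[str], None],
) -> str:
    """Sync results first, then write a terminal marker only if the sync succeeded.

    Returns the marker name that was written, or "" when the marker is withheld.
    """
    if not sync_results():
        # The results/ tree in GCS may be incomplete, so no terminal marker is
        # written and pollers never see DONE or FAILED over partial results.
        return ""
    marker = "DONE" if benchmark_ok else "FAILED"
    upload_marker(marker)
    return marker
```

Returning early on a failed sync is the simplest form of the "return before writing the marker" option; whether the merged fix treats `FAILED` the same way as `DONE` is not visible on this page.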

Copilot

Pull request overview

This PR publishes curated 9‑channel benchmark snapshots and updates the paper-data generation pipeline so the repository can render 9‑channel (and additional regime) summaries in the README and docs/paper_data/ outputs. It also tightens detached GCP run reliability by syncing results/ to GCS before writing the terminal DONE/FAILED marker.

Changes:

Add published 9‑channel result snapshots (CPU micro, GPU micro, CPU DataLoader, GPU DataLoader) and regenerate paper-data CSV/MD/JSON artifacts from them.
Extend paper-data/figure generation scripts to include 9‑channel regimes (and additional regime scaffolding) and patch a scenario summary table block into the README.
Update detached GCP bootstrap to rsync results before writing terminal markers, and add tests guarding the ordering.

Reviewed changes

Copilot reviewed 59 out of 65 changed files in this pull request and generated 3 comments.

File	Description
tests/test_jobs_orchestrator.py	Adds a regression test ensuring detached GCP bootstrap syncs results before writing `DONE/FAILED`.
tests/test_config_models.py	Updates expected paper production config specs; adds new video + Kornia highmem config expectations.
scripts/run_gcp_first_available_gpu_zone.sh	Adds a helper script to attempt launches across multiple GPU zones and stop on non-capacity failures.
scripts/paper/generate_paper_data.py	Adds new regimes to published snapshot selection and emits winner-count sections across all regimes.
scripts/paper/generate_figures_and_insights.py	Generates README “scenario benchmark tables” from `all_results.csv` and expands public data sync to include more regime CSVs.
scripts/paper/common.py	Adds regime labels/order entries for 9‑channel and video; tweaks `implementation_label` device naming logic.
benchmark/cloud/gcp.py	Adds `gcs_rsync_retry` and ensures terminal artifact upload syncs `results/` before writing marker files.
README.md	Injects generated “Scenario benchmark tables” (RGB + 9‑channel + video placeholder) into the paper figures section.
docs/paper_data/video16f_micro_gpu.csv	Adds a (currently empty) generated CSV header for the video GPU micro regime.
docs/paper_data/video16f_micro_gpu_pivot.csv	Adds an (empty) generated pivot header for the video GPU micro regime.
docs/paper_data/video16f_micro_cpu.csv	Adds a (currently empty) generated CSV header for the video CPU micro regime.
docs/paper_data/video16f_micro_cpu_pivot.csv	Adds an (empty) generated pivot header for the video CPU micro regime.
docs/paper_data/video16f_dataloader_gpu.csv	Adds a (currently empty) generated CSV header for the video GPU DataLoader regime.
docs/paper_data/video16f_dataloader_gpu_pivot.csv	Adds an (empty) generated pivot header for the video GPU DataLoader regime.
docs/paper_data/video16f_dataloader_cpu.csv	Adds a (currently empty) generated CSV header for the video CPU DataLoader regime.
docs/paper_data/video16f_dataloader_cpu_pivot.csv	Adds an (empty) generated pivot header for the video CPU DataLoader regime.
docs/paper_data/unsupported_and_early_stopped.md	Updates the supplement table with 9‑channel unsupported/early-stopped rows.
docs/paper_data/unsupported_and_early_stopped.csv	Updates the CSV backing the unsupported/early-stopped supplement with 9‑channel rows.
docs/paper_data/summary.md	Regenerates the paper summary markdown to include 9‑channel regimes (and additional empty sections).
docs/paper_data/summary.json	Regenerates the summary JSON to include 9‑channel regimes and placeholder video regimes.
docs/paper_data/rgb_micro_gpu.csv	Regenerates RGB micro GPU CSV data (now committed as generated output).
docs/paper_data/rgb_micro_gpu_pivot.csv	Regenerates RGB micro GPU pivot table.
docs/paper_data/rgb_micro_cpu_pivot.csv	Regenerates RGB micro CPU pivot table.
docs/paper_data/image9ch_micro_gpu.csv	Adds generated 9‑channel GPU micro CSV from the new published snapshots.
docs/paper_data/image9ch_micro_gpu_pivot.csv	Adds generated 9‑channel GPU micro pivot table.
docs/paper_data/image9ch_micro_cpu.csv	Adds generated 9‑channel CPU micro CSV from the new published snapshots.
docs/paper_data/image9ch_micro_cpu_pivot.csv	Adds generated 9‑channel CPU micro pivot table.
docs/paper_data/image9ch_dataloader_gpu.csv	Adds generated 9‑channel GPU DataLoader CSV from the new published snapshots.
docs/paper_data/image9ch_dataloader_gpu_pivot.csv	Adds generated 9‑channel GPU DataLoader pivot table.
docs/paper_data/image9ch_dataloader_cpu_pivot.csv	Adds generated 9‑channel CPU DataLoader pivot table.
docs/paper_data/figure_winner_rows.csv	Regenerates figure backing data to include 9‑channel winner rows.
docs/paper_data/figure_winner_counts.csv	Regenerates figure backing data to include 9‑channel winner counts.
docs/benchmark_scope.md	Adds a “Remaining Supplement Plan” section detailing 9‑channel/video follow-up runs and next steps.
configs/paper/prod_g2_video_micro_gpu.yaml	Adds a production video GPU micro config.
configs/paper/prod_g2_video_dataloader_gpu.yaml	Adds a production video GPU DataLoader config.
configs/paper/prod_c4_video_micro_cpu.yaml	Adds a production video CPU micro config.
configs/paper/prod_c4_video_dataloader_cpu.yaml	Adds a production video CPU DataLoader config.
configs/paper/prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml	Adds a dedicated highmem Kornia-only 9‑channel CPU DataLoader config.
configs/paper/prod_c4_9ch_dataloader_cpu.yaml	Removes Kornia from the standard 9‑channel CPU DataLoader production config.
results/published/paper-9ch-micro-gpu-g2-standard-16-2026-05-10/manifest.json	Adds published manifest metadata for 9‑channel GPU micro snapshot.
results/published/paper-9ch-micro-c4-standard-16-2026-05-10/manifest.json	Adds published manifest metadata for 9‑channel CPU micro snapshot.
results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/torchvision_memory_dataloader_augment_n10000_r1_w8_b128_results.json	Adds a published TorchVision 9‑channel CPU DataLoader result JSON snapshot.
results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/manifest.json	Adds published manifest metadata for 9‑channel CPU DataLoader (partial) snapshot.
results/published/paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11/manifest.json	Adds published manifest metadata for the Kornia-only highmem rerun slice.
results/published/paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11/manifest.json	Adds published manifest metadata for 9‑channel GPU DataLoader snapshot.
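
The README injection listed in the table above relies on replacing the text between two markers. A minimal sketch of that pattern follows; the function name and marker strings are hypothetical, and this is not the repository's `_write_figure_markdown` implementation, which is not shown on this page.

```python
def patch_between_markers(text: str, start: str, end: str, body: str) -> str:
    """Replace whatever sits between `start` and `end` with `body`, keeping both markers."""
    head, found_start, rest = text.partition(start)
    if not found_start:
        raise ValueError(f"start marker not found: {start!r}")
    _, found_end, tail = rest.partition(end)
    if not found_end:
        raise ValueError(f"end marker not found: {end!r}")
    return f"{head}{start}\n{body}\n{end}{tail}"


# Hypothetical marker names, used only for this example.
readme = "intro\n<!-- BEGIN TABLES -->\nold table\n<!-- END TABLES -->\noutro\n"
patched = patch_between_markers(
    readme, "<!-- BEGIN TABLES -->", "<!-- END TABLES -->", "new table"
)
print(patched)
```

Failing loudly when a marker is missing avoids silently appending generated tables to the wrong place in the file.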

        _open_dataloader_leaderboard_table(rows),
        "",
        "## Winner Counts",
        "",
-        _best_library_table(rows, "rgb_micro_cpu"),
-        "",
-        _best_library_table(rows, "rgb_dataloader_cpu"),
-        "",
-        _best_library_table(rows, "rgb_micro_gpu"),
-        "",
-        _best_library_table(rows, "rgb_dataloader_gpu"),
+        *(section for regime in REGIME_LABELS for section in (_best_library_table(rows, regime), "")),
        "",
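
The unpacked generator in the added line interleaves each regime's table with an empty string, so every regime in `REGIME_LABELS` gets its own section followed by a blank line. The sketch below shows that flattening in isolation; the `REGIME_LABELS` entries and the stand-in `_best_library_table` are placeholders, not the repository's definitions.

```python
# Placeholder regimes (assumption); the real REGIME_LABELS has more entries.
REGIME_LABELS = {
    "rgb_micro_cpu": "RGB micro CPU",
    "image9ch_micro_cpu": "9-channel micro CPU",
}


def _best_library_table(rows, regime):
    # Stand-in for the real table renderer, which is not shown on this page.
    return f"<table for {regime}>"


sections = [
    "## Winner Counts",
    "",
    # For each regime, emit its table and then one blank line.
    *(section for regime in REGIME_LABELS for section in (_best_library_table([], regime), "")),
]
print(sections)
```

Because dictionaries preserve insertion order, the sections appear in the order the regimes are declared in `REGIME_LABELS`.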


+    "image9ch_micro_cpu": [("paper-9ch-micro-c4-*", "latest")],
+    "image9ch_micro_gpu": [("paper-9ch-micro-gpu-g2-*", "latest")],
+    "image9ch_dataloader_cpu": [("paper-9ch-dataloader-memory-c4-*", "all")],
+    "image9ch_dataloader_gpu": [("paper-9ch-dataloader-gpu-decode-g2-*", "latest")],
+    "video16f_micro_cpu": [("paper-video-micro-c4-*", "latest")],
+    "video16f_micro_gpu": [("paper-video-micro-gpu-g2-*", "latest")],
+    "video16f_dataloader_cpu": [("paper-video-dataloader-memory-c4-*", "latest")],
+    "video16f_dataloader_gpu": [("paper-video-dataloader-gpu-g2-*", "latest")],


ternaus added 3 commits May 11, 2026 17:20

docs(paper): publish 9-channel benchmark results

306fca0

Add curated 9-channel result snapshots, include 9-channel regimes in generated paper data, and render a 9-channel summary table in the README.

chore(cloud): add production video run configs

61fca2c

Add remaining production video and high-memory 9-channel cloud configs, document the supplement run plan, and make detached GCP runs sync results before writing terminal markers.

docs(readme): use transform-level scenario tables

14df97d

Replace the 9-channel coverage summary with generated transform-by-regime tables for RGB and 9-channel data, and add video regime wiring for future published snapshots.

Copilot AI review requested due to automatic review settings May 11, 2026 14:33

Copilot started reviewing on behalf of ternaus May 11, 2026 14:34

sourcery-ai Bot reviewed May 11, 2026

Comment thread scripts/paper/common.py Outdated

chatgpt-codex-connector Bot reviewed May 11, 2026

Copilot AI reviewed May 11, 2026

fix(cloud): withhold terminal markers on sync failure

59d8c90

ternaus merged commit 0f33b3d into main May 11, 2026
2 checks passed

ternaus deleted the codex/publish-9ch-results-readme branch May 11, 2026 15:07

ternaus merged 4 commits into main from codex/publish-9ch-results-readme
