docs(paper): publish 9-channel benchmark results #65

Merged: ternaus merged 4 commits into main from codex/publish-9ch-results-readme on May 11, 2026

@ternaus ternaus commented May 11, 2026

Add curated 9-channel result snapshots, include 9-channel regimes in generated paper data, and render a 9-channel summary table in the README.

Summary by Sourcery

Publish 9-channel (image-9ch) benchmark snapshots and integrate them into the paper pipeline, README tables, and summary insights, while preparing configs and scaffolding for upcoming video benchmarks and improving GCP result syncing.

New Features:

  • Add curated 9-channel CPU/GPU micro and DataLoader benchmark snapshots and wire them into generated paper data and summary tables.
  • Render scenario benchmark tables in the README for RGB, 9-channel, and video regimes based on aggregated result CSVs.
  • Introduce production configs and test coverage for video (16-frame) CPU/GPU micro and DataLoader scenarios, plus a helper script to launch GCP jobs in the first available GPU zone.
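
A minimal sketch of how a per-transform, per-regime winner table could be derived from an aggregated CSV, in the spirit of the README scenario tables listed above. The column names (`regime`, `transform`, `library`, `throughput`), the library and transform names, and both helper functions are assumptions made for illustration. The real `all_results.csv` schema and generator are not shown on this page, and the sample numbers are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with hypothetical column names; not real benchmark results.
SAMPLE_CSV = """regime,transform,library,throughput
rgb_micro_cpu,FlipX,lib_a,1200.0
rgb_micro_cpu,FlipX,lib_b,800.0
image9ch_micro_cpu,FlipX,lib_a,150.0
image9ch_micro_cpu,FlipX,lib_b,400.0
"""


def best_per_transform(csv_text: str) -> dict[str, dict[str, tuple[str, float]]]:
    """Map transform -> regime -> (library, throughput) of the fastest row."""
    best: dict[str, dict[str, tuple[str, float]]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        throughput = float(row["throughput"])
        current = best[row["transform"]].get(row["regime"])
        if current is None or throughput > current[1]:
            best[row["transform"]][row["regime"]] = (row["library"], throughput)
    return dict(best)


def to_markdown(best: dict[str, dict[str, tuple[str, float]]], regimes: list[str]) -> str:
    """Render one row per transform and one column per regime."""
    lines = [
        "| Transform | " + " | ".join(regimes) + " |",
        "|---|" + "---|" * len(regimes),
    ]
    for transform in sorted(best):
        cells = []
        for regime in regimes:
            winner = best[transform].get(regime)
            # Regimes without data (for example unpublished video runs) get a placeholder.
            cells.append(f"{winner[0]} ({winner[1]:.0f})" if winner else "n/a")
        lines.append(f"| {transform} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    winners = best_per_transform(SAMPLE_CSV)
    print(to_markdown(winners, ["rgb_micro_cpu", "image9ch_micro_cpu", "video16f_micro_cpu"]))
```

Rendering a placeholder cell for regimes with no rows keeps the table shape stable while the video snapshots are still unpublished.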

Enhancements:

  • Extend paper data generation to handle all regimes via generic winner-count tables and new regime labels for 9-channel and video benchmarks.
  • Summarize unsupported and early-stopped 9-channel rows in the supplement and add a remaining-supplement checklist for 9-channel and video runs.
  • Broaden public paper-data artifacts to include per-regime CSV and pivot tables and update insights to reference wildcard pivot paths.
  • Improve GCP detached bootstrap to rsync results before writing terminal markers and adjust logging expectations in tests.
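
The generic winner-count sections mentioned above are spliced into the summary with a nested generator expression, as the review thread further down this page shows. The sketch below reproduces that pattern in isolation. `REGIME_LABELS` here is a hypothetical two-entry subset and `best_library_table` is a stand-in that returns only a heading, so this is an illustration of the flattening, not the script's actual output.

```python
# Hypothetical subset of the regime labels; the real mapping lives in scripts/paper/common.py.
REGIME_LABELS = {
    "rgb_micro_cpu": "RGB micro CPU",
    "image9ch_micro_cpu": "9-channel micro CPU",
}


def best_library_table(regime: str) -> str:
    """Stand-in for the real table builder: returns a heading only."""
    return f"### {REGIME_LABELS[regime]}"


sections = [
    "## Winner Counts",
    "",
    # For each regime, emit the table followed by a blank separator line.
    *(section for regime in REGIME_LABELS for section in (best_library_table(regime), "")),
    "",
]

if __name__ == "__main__":
    print(sections)
```

Because the generator already emits a trailing `""` after the last table, the literal `""` that follows it leaves two consecutive blank entries at the end of the list. Whether that matters depends on how the lines are joined and rendered.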

Tests:

  • Update configuration model tests for new 9-channel and video configs, including gcs_uri and clip_length handling, and add a regression test ensuring GCP bootstrap sync ordering.

ternaus added 3 commits May 11, 2026 17:20
  • Add curated 9-channel result snapshots, include 9-channel regimes in generated paper data, and render a 9-channel summary table in the README.
  • Add remaining production video and high-memory 9-channel cloud configs, document the supplement run plan, and make detached GCP runs sync results before writing terminal markers.
  • Replace the 9-channel coverage summary with generated transform-by-regime tables for RGB and 9-channel data, and add video regime wiring for future published snapshots.
Copilot AI review requested due to automatic review settings May 11, 2026 14:33

sourcery-ai Bot commented May 11, 2026

Reviewer's Guide

Adds curated 9-channel (image-9ch) benchmark snapshots to the paper dataset, wires them into the figure/table generation pipeline (including new README scenario tables), extends regime handling to 9-channel and video scenarios, and improves GCP job orchestration and result syncing for paper runs.
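
Patching generated tables into the README between markers can be sketched as follows. The marker strings and the function name are assumptions for illustration; the actual markers and the `_write_figure_markdown` helper are not shown on this page.

```python
import re

# Hypothetical marker strings; the repository's real markers may differ.
BEGIN = "<!-- scenario-tables:begin -->"
END = "<!-- scenario-tables:end -->"


def patch_between_markers(text: str, generated: str) -> str:
    """Replace everything between BEGIN and END, keeping both markers."""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), flags=re.DOTALL)
    if not pattern.search(text):
        raise ValueError("markers not found; refusing to modify the text")
    replacement = f"{BEGIN}\n{generated.strip()}\n{END}"
    # A function replacement avoids treating backslashes in `generated` as escapes.
    return pattern.sub(lambda _match: replacement, text, count=1)


if __name__ == "__main__":
    readme = f"intro\n{BEGIN}\nold table\n{END}\noutro\n"
    print(patch_between_markers(readme, "| Transform | Winner |"))
```

Keeping the markers in place makes the patch idempotent, so regenerating the tables with unchanged data leaves the file byte-identical.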

File-Level Changes

Change Details Files
Publish 9-channel benchmark results into summary and insights docs, including winner counts and coverage tables.
  • Extend docs/paper_data/summary.md with 9-channel winner-count sections for micro and DataLoader regimes plus empty stubs for video regimes.
  • Add 9-channel rows to the Coverage Summary and CPU-vs-GPU comparison tables.
  • Update unsupported_and_early_stopped docs to include 9-channel kornia/torchvision failure and early-stop reasons.
docs/paper_data/summary.md
docs/paper_data/unsupported_and_early_stopped.md
Generate README scenario benchmark tables from aggregated CSVs for RGB, 9-channel, and video regimes.
  • Introduce README_SCENARIO_TABLES and _readme_transform_tables_markdown() to derive per-transform best implementation per regime from all_results.csv.
  • Patch README.md between markers using _write_figure_markdown to insert generated scenario tables (RGB and 9-channel filled, video placeholder when no data).
README.md
scripts/paper/generate_figures_and_insights.py
Broaden regime support (9ch + video) across paper helpers and generation pipeline, and include new per-regime pivot/summary CSVs in the public data sync.
  • Extend REGIME_LABELS and REGIME_ORDER to include image9ch_* and video16f_* regimes and adjust implementation_label to infer device from regime suffix instead of hardcoding rgb_dataloader_gpu.
  • Teach generate_paper_data to pull image9ch and video16f result patterns from GCS and to emit winner-count sections for all regimes defined in REGIME_LABELS.
  • Sync all *_pivot.csv plus *_cpu.csv and *_gpu.csv artifacts from GENERATED into docs/paper_data during _sync_public_data().
scripts/paper/common.py
scripts/paper/generate_paper_data.py
scripts/paper/generate_figures_and_insights.py
Add configs and tests for 9-channel and video paper runs, including a high-memory Kornia-only 9ch CPU DataLoader config and flexible data fields in tests.
  • Remove kornia from prod_c4_9ch_dataloader_cpu.yaml (standard memory) and add prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml targeting c4-highmem-16.
  • Introduce four video configs (CPU/GPU micro and DataLoader) pointing to UCF101, with clip_length and media=video set appropriately.
  • Relax test_config_models expectations to allow per-config gcs_uri, num_channels, and clip_length, and assert new 9ch/video configs exist with expected sizing parameters.
configs/paper/prod_c4_9ch_dataloader_cpu.yaml
configs/paper/prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml
configs/paper/prod_c4_video_micro_cpu.yaml
configs/paper/prod_c4_video_dataloader_cpu.yaml
configs/paper/prod_g2_video_micro_gpu.yaml
configs/paper/prod_g2_video_dataloader_gpu.yaml
tests/test_config_models.py
Ensure detached GCP runs sync results to GCS before writing the DONE/FAILED markers and add coverage for this behavior.
  • Move gcs_rsync_retry of the results directory before marker upload in _BOOTSTRAP_SH and tighten error messaging to reflect terminal artifact upload, removing the post-marker rsync-with-warning path.
  • Add a test to assert that result syncing occurs before marker writes and that the old stale warning string is gone.
benchmark/cloud/gcp.py
tests/test_jobs_orchestrator.py
Add helper script to launch GPU jobs in the first GCP zone with available capacity.
  • Create run_gcp_first_available_gpu_zone.sh that loops over a prioritized zone list, invokes benchmark.cli run with --gcp-zone per zone, inspects logs for capacity-related failure strings, and falls back to the next zone or exits on non-capacity errors.
  • Parameterize script via CONFIG, ZONES, LOG_DIR and propagate extra CLI args to benchmark.cli.
scripts/run_gcp_first_available_gpu_zone.sh
Publish curated 9-channel CPU/GPU micro and DataLoader manifests and raw result JSONs, plus regime-specific CSV and pivot tables.
  • Add results/published manifests for 9ch micro CPU/GPU and 9ch DataLoader CPU/GPU (standard and highmem) with benchmark params, system info, and pointers to per-library result JSONs.
  • Check in the associated raw *_results.json files for albumentationsx, torchvision, and kornia and generate per-regime CSV and *_pivot.csv files for RGB, image9ch, and video16f in docs/paper_data/.
  • Regenerate all_results.csv, summary.json, figure_winner_counts.csv, figure_winner_rows.csv, and unsupported_and_early_stopped.csv to incorporate the new regimes.
results/published/paper-9ch-micro-c4-standard-16-2026-05-10/manifest.json
results/published/paper-9ch-micro-gpu-g2-standard-16-2026-05-10/manifest.json
results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/manifest.json
results/published/paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11/manifest.json
results/published/paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11/manifest.json
results/published/paper-9ch-*/**/*_results.json
docs/paper_data/all_results.csv
docs/paper_data/summary.json
docs/paper_data/figure_winner_counts.csv
docs/paper_data/figure_winner_rows.csv
docs/paper_data/image9ch_*.csv
docs/paper_data/image9ch_*_pivot.csv
docs/paper_data/rgb_*_cpu.csv
docs/paper_data/rgb_*_gpu.csv
docs/paper_data/video16f_*.csv
docs/paper_data/video16f_*_pivot.csv
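
The published snapshot directories listed above are selected per regime by glob pattern plus a `"latest"` or `"all"` mode, as the diff snippet at the end of this page shows. A minimal sketch of that selection, assuming the trailing `YYYY-MM-DD` in each directory name is the snapshot date; `select_snapshots` and `snapshot_date` are hypothetical names, and the real selection logic in `generate_paper_data.py` may differ.

```python
import re
from fnmatch import fnmatch

# Directory names taken from the published manifests listed in this PR.
PUBLISHED = [
    "paper-9ch-micro-c4-standard-16-2026-05-10",
    "paper-9ch-micro-gpu-g2-standard-16-2026-05-10",
    "paper-9ch-dataloader-memory-c4-standard-16-2026-05-10",
    "paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11",
    "paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11",
]

DATE_SUFFIX = re.compile(r"(\d{4}-\d{2}-\d{2})$")


def snapshot_date(name: str) -> str:
    """Return the trailing ISO date; ISO dates sort chronologically as strings."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no trailing date in snapshot name: {name}")
    return match.group(1)


def select_snapshots(names: list[str], pattern: str, mode: str) -> list[str]:
    """Pick matching snapshots: every match for "all", the newest for "latest"."""
    matches = sorted((n for n in names if fnmatch(n, pattern)), key=snapshot_date)
    if mode == "all":
        return matches
    if mode == "latest":
        return matches[-1:]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(select_snapshots(PUBLISHED, "paper-9ch-dataloader-memory-c4-*", "all"))
    print(select_snapshots(PUBLISHED, "paper-9ch-micro-c4-*", "latest"))
```

Sorting by the parsed date instead of the full name matters here: a plain lexicographic sort would rank `highmem` before `standard` regardless of date. The `"all"` mode for the 9-channel CPU DataLoader regime presumably exists so that the standard-memory snapshot and the Kornia-only highmem rerun are merged into one regime.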

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high-level feedback:

  • In implementation_label, using regime.endswith("_dataloader_gpu") will incorrectly label micro GPU regimes like *_micro_gpu as CPU; consider checking for "_gpu" in the regime string or maintaining an explicit GPU-regime set so both micro and DataLoader GPU cases are handled correctly.
  • The new run_gcp_first_available_gpu_zone.sh hardcodes a long zone list; if you expect to reuse this script, consider centralizing the default zone list (or reading it from an env/config file) to avoid having to update multiple places when GCP regions/zones change.
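
The zone-fallback suggestion can be sketched in Python, even though the actual helper is a shell script. Everything named here is an assumption for illustration: the default zones, the capacity-failure substrings, and the `launch` callable, which stands in for invoking `benchmark.cli run` with `--gcp-zone` and returning an exit code plus log text. Only the `ZONES` variable name comes from the PR description.

```python
from __future__ import annotations

import os
from collections.abc import Callable, Mapping, Sequence

# Hypothetical defaults; the script's real prioritized zone list is not shown here.
DEFAULT_ZONES = ("us-central1-a", "us-east1-b")
# Substrings assumed to indicate a capacity failure; adjust to the real log text.
CAPACITY_MARKERS = ("ZONE_RESOURCE_POOL_EXHAUSTED", "does not have enough resources")


def zones_from_env(env: Mapping[str, str] | None = None) -> list[str]:
    """Read a space- or comma-separated zone list from ZONES, else use defaults."""
    env = os.environ if env is None else env
    zones = env.get("ZONES", "").replace(",", " ").split()
    return zones or list(DEFAULT_ZONES)


def launch_in_first_available_zone(
    zones: Sequence[str],
    launch: Callable[[str], tuple[int, str]],
) -> str:
    """Try each zone in order; fall through only on capacity failures."""
    for zone in zones:
        exit_code, log_text = launch(zone)
        if exit_code == 0:
            return zone
        if not any(marker in log_text for marker in CAPACITY_MARKERS):
            # A non-capacity error will not be fixed by switching zones.
            raise RuntimeError(f"non-capacity failure in {zone}")
    raise RuntimeError("no capacity in any configured zone")


if __name__ == "__main__":
    def fake_launch(zone: str) -> tuple[int, str]:
        return (0, "ok") if zone == "us-east1-b" else (1, "ZONE_RESOURCE_POOL_EXHAUSTED")

    print(launch_in_first_available_zone(zones_from_env({}), fake_launch))
```

Reading the list from one place (an environment variable with a single default) is what keeps zone changes from having to be repeated across scripts.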
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="scripts/paper/common.py" line_range="76-77" />
<code_context>


 def implementation_label(regime: str, library: str) -> str:
-    device = "GPU" if regime == "rgb_dataloader_gpu" else "CPU"
+    device = "GPU" if regime.endswith("_dataloader_gpu") else "CPU"
     return f"{LIBRARY_DISPLAY.get(library, library)} {device}"

</code_context>
<issue_to_address>
**issue (bug_risk):** GPU micro regimes are now mislabeled as CPU in implementation_label.

The new regimes like `rgb_micro_gpu`, `image9ch_micro_gpu`, and `video16f_micro_gpu` don’t match `regime.endswith("_dataloader_gpu")`, so they’ll be labeled as CPU despite running on GPU. If `implementation_label` is surfaced in user-facing tables, GPU micro benchmarks will appear as “Library CPU”. Consider a broader condition (e.g. `regime.endswith("_gpu")`) or an explicit mapping from regime to device.
</issue_to_address>
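
A minimal sketch of the broader suffix check suggested in this comment. `LIBRARY_DISPLAY` below is a hypothetical stand-in for the mapping in `scripts/paper/common.py`, and this is not necessarily the fix that was merged.

```python
# Hypothetical stand-in for the real LIBRARY_DISPLAY mapping.
LIBRARY_DISPLAY = {
    "albumentationsx": "AlbumentationsX",
    "torchvision": "TorchVision",
    "kornia": "Kornia",
}


def implementation_label(regime: str, library: str) -> str:
    """Label a library with its device, inferred from the regime suffix."""
    # Matching "_gpu" covers both *_micro_gpu and *_dataloader_gpu regimes.
    device = "GPU" if regime.endswith("_gpu") else "CPU"
    return f"{LIBRARY_DISPLAY.get(library, library)} {device}"


if __name__ == "__main__":
    for regime in ("rgb_micro_gpu", "image9ch_dataloader_gpu", "video16f_micro_cpu"):
        print(regime, "->", implementation_label(regime, "kornia"))
```

An explicit regime-to-device mapping would be stricter, since an unknown regime could raise instead of silently defaulting to CPU. The suffix check is the smaller change.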

Comment thread scripts/paper/common.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14df97d7be

Comment thread benchmark/cloud/gcp.py
Comment on lines +261 to +262
+if [[ -d "$WORKDIR/results" ]]; then
+  gcs_rsync_retry "results" 300 "$WORKDIR/results" "${run_prefix}/results/" || terminal_ok=0

P1: Withhold DONE when result sync fails

When the final results rsync exhausts its retries, this line only flips terminal_ok=0; the function still proceeds to upload and confirm the DONE marker a few lines later. In a successful benchmark where GCS result syncing fails, downstream pollers that treat DONE as terminal success (for example scripts/run_gcp_rgb_micro_cpu_proxies.sh checks $prefix/DONE) can stop and consume an incomplete results/ tree even though the VM is kept for triage. Return before writing the marker, or only create/upload DONE after all prior terminal artifacts succeeded.
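
The ordering this comment asks for can be sketched in Python, although the real logic lives in a bash bootstrap embedded in `benchmark/cloud/gcp.py`. `finalize_run`, `sync_results`, and `write_marker` are hypothetical names; the callables stand in for the rsync-with-retries step and the marker upload.

```python
from __future__ import annotations

from collections.abc import Callable


def finalize_run(
    benchmark_ok: bool,
    sync_results: Callable[[], bool],
    write_marker: Callable[[str], None],
) -> str | None:
    """Sync results first; write a terminal marker only if the sync succeeded."""
    if not sync_results():
        # No marker is written, so pollers that wait for DONE never see an
        # incomplete results/ tree presented as a finished run.
        return None
    marker = "DONE" if benchmark_ok else "FAILED"
    write_marker(marker)
    return marker


if __name__ == "__main__":
    written: list[str] = []
    print(finalize_run(True, lambda: False, written.append), written)
    print(finalize_run(True, lambda: True, written.append), written)
```

Returning early is one of the two options the comment names. The other, writing DONE only after every terminal artifact uploaded, amounts to the same gate applied to more steps.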

Copilot AI left a comment

Pull request overview

This PR publishes curated 9‑channel benchmark snapshots and updates the paper-data generation pipeline so the repository can render 9‑channel (and additional regime) summaries in the README and docs/paper_data/ outputs. It also tightens detached GCP run reliability by syncing results/ to GCS before writing the terminal DONE/FAILED marker.

Changes:

  • Add published 9‑channel result snapshots (CPU micro, GPU micro, CPU DataLoader, GPU DataLoader) and regenerate paper-data CSV/MD/JSON artifacts from them.
  • Extend paper-data/figure generation scripts to include 9‑channel regimes (and additional regime scaffolding) and patch a scenario summary table block into the README.
  • Update detached GCP bootstrap to rsync results before writing terminal markers, and add tests guarding the ordering.

Reviewed changes

Copilot reviewed 59 out of 65 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| tests/test_jobs_orchestrator.py | Adds a regression test ensuring detached GCP bootstrap syncs results before writing DONE/FAILED. |
| tests/test_config_models.py | Updates expected paper production config specs; adds new video + Kornia highmem config expectations. |
| scripts/run_gcp_first_available_gpu_zone.sh | Adds a helper script to attempt launches across multiple GPU zones and stop on non-capacity failures. |
| scripts/paper/generate_paper_data.py | Adds new regimes to published snapshot selection and emits winner-count sections across all regimes. |
| scripts/paper/generate_figures_and_insights.py | Generates README “scenario benchmark tables” from all_results.csv and expands public data sync to include more regime CSVs. |
| scripts/paper/common.py | Adds regime labels/order entries for 9‑channel and video; tweaks implementation_label device naming logic. |
| benchmark/cloud/gcp.py | Adds gcs_rsync_retry and ensures terminal artifact upload syncs results/ before writing marker files. |
| README.md | Injects generated “Scenario benchmark tables” (RGB + 9‑channel + video placeholder) into the paper figures section. |
| docs/paper_data/video16f_micro_gpu.csv | Adds a (currently empty) generated CSV header for the video GPU micro regime. |
| docs/paper_data/video16f_micro_gpu_pivot.csv | Adds an (empty) generated pivot header for the video GPU micro regime. |
| docs/paper_data/video16f_micro_cpu.csv | Adds a (currently empty) generated CSV header for the video CPU micro regime. |
| docs/paper_data/video16f_micro_cpu_pivot.csv | Adds an (empty) generated pivot header for the video CPU micro regime. |
| docs/paper_data/video16f_dataloader_gpu.csv | Adds a (currently empty) generated CSV header for the video GPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_gpu_pivot.csv | Adds an (empty) generated pivot header for the video GPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_cpu.csv | Adds a (currently empty) generated CSV header for the video CPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_cpu_pivot.csv | Adds an (empty) generated pivot header for the video CPU DataLoader regime. |
| docs/paper_data/unsupported_and_early_stopped.md | Updates the supplement table with 9‑channel unsupported/early-stopped rows. |
| docs/paper_data/unsupported_and_early_stopped.csv | Updates the CSV backing the unsupported/early-stopped supplement with 9‑channel rows. |
| docs/paper_data/summary.md | Regenerates the paper summary markdown to include 9‑channel regimes (and additional empty sections). |
| docs/paper_data/summary.json | Regenerates the summary JSON to include 9‑channel regimes and placeholder video regimes. |
| docs/paper_data/rgb_micro_gpu.csv | Regenerates RGB micro GPU CSV data (now committed as generated output). |
| docs/paper_data/rgb_micro_gpu_pivot.csv | Regenerates RGB micro GPU pivot table. |
| docs/paper_data/rgb_micro_cpu_pivot.csv | Regenerates RGB micro CPU pivot table. |
| docs/paper_data/image9ch_micro_gpu.csv | Adds generated 9‑channel GPU micro CSV from the new published snapshots. |
| docs/paper_data/image9ch_micro_gpu_pivot.csv | Adds generated 9‑channel GPU micro pivot table. |
| docs/paper_data/image9ch_micro_cpu.csv | Adds generated 9‑channel CPU micro CSV from the new published snapshots. |
| docs/paper_data/image9ch_micro_cpu_pivot.csv | Adds generated 9‑channel CPU micro pivot table. |
| docs/paper_data/image9ch_dataloader_gpu.csv | Adds generated 9‑channel GPU DataLoader CSV from the new published snapshots. |
| docs/paper_data/image9ch_dataloader_gpu_pivot.csv | Adds generated 9‑channel GPU DataLoader pivot table. |
| docs/paper_data/image9ch_dataloader_cpu_pivot.csv | Adds generated 9‑channel CPU DataLoader pivot table. |
| docs/paper_data/figure_winner_rows.csv | Regenerates figure backing data to include 9‑channel winner rows. |
| docs/paper_data/figure_winner_counts.csv | Regenerates figure backing data to include 9‑channel winner counts. |
| docs/benchmark_scope.md | Adds a “Remaining Supplement Plan” section detailing 9‑channel/video follow-up runs and next steps. |
| configs/paper/prod_g2_video_micro_gpu.yaml | Adds a production video GPU micro config. |
| configs/paper/prod_g2_video_dataloader_gpu.yaml | Adds a production video GPU DataLoader config. |
| configs/paper/prod_c4_video_micro_cpu.yaml | Adds a production video CPU micro config. |
| configs/paper/prod_c4_video_dataloader_cpu.yaml | Adds a production video CPU DataLoader config. |
| configs/paper/prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml | Adds a dedicated highmem Kornia-only 9‑channel CPU DataLoader config. |
| configs/paper/prod_c4_9ch_dataloader_cpu.yaml | Removes Kornia from the standard 9‑channel CPU DataLoader production config. |
| results/published/paper-9ch-micro-gpu-g2-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel GPU micro snapshot. |
| results/published/paper-9ch-micro-c4-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel CPU micro snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/torchvision_memory_dataloader_augment_n10000_r1_w8_b128_results.json | Adds a published TorchVision 9‑channel CPU DataLoader result JSON snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel CPU DataLoader (partial) snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11/manifest.json | Adds published manifest metadata for the Kornia-only highmem rerun slice. |
| results/published/paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11/manifest.json | Adds published manifest metadata for 9‑channel GPU DataLoader snapshot. |

Comment thread scripts/paper/common.py
Comment on lines 76 to 78
 def implementation_label(regime: str, library: str) -> str:
-    device = "GPU" if regime == "rgb_dataloader_gpu" else "CPU"
+    device = "GPU" if regime.endswith("_dataloader_gpu") else "CPU"
     return f"{LIBRARY_DISPLAY.get(library, library)} {device}"
Comment on lines 847 to 852
     _open_dataloader_leaderboard_table(rows),
     "",
     "## Winner Counts",
     "",
-    _best_library_table(rows, "rgb_micro_cpu"),
-    "",
-    _best_library_table(rows, "rgb_dataloader_cpu"),
-    "",
-    _best_library_table(rows, "rgb_micro_gpu"),
-    "",
-    _best_library_table(rows, "rgb_dataloader_gpu"),
+    *(section for regime in REGIME_LABELS for section in (_best_library_table(rows, regime), "")),
     "",
Comment on lines +40 to +47
+    "image9ch_micro_cpu": [("paper-9ch-micro-c4-*", "latest")],
+    "image9ch_micro_gpu": [("paper-9ch-micro-gpu-g2-*", "latest")],
+    "image9ch_dataloader_cpu": [("paper-9ch-dataloader-memory-c4-*", "all")],
+    "image9ch_dataloader_gpu": [("paper-9ch-dataloader-gpu-decode-g2-*", "latest")],
+    "video16f_micro_cpu": [("paper-video-micro-c4-*", "latest")],
+    "video16f_micro_gpu": [("paper-video-micro-gpu-g2-*", "latest")],
+    "video16f_dataloader_cpu": [("paper-video-dataloader-memory-c4-*", "latest")],
+    "video16f_dataloader_gpu": [("paper-video-dataloader-gpu-g2-*", "latest")],
@ternaus ternaus merged commit 0f33b3d into main May 11, 2026
2 checks passed
@ternaus ternaus deleted the codex/publish-9ch-results-readme branch May 11, 2026 15:07