docs(paper): publish 9-channel benchmark results #65
Add curated 9-channel result snapshots, include 9-channel regimes in generated paper data, and render a 9-channel summary table in the README.
Add remaining production video and high-memory 9-channel cloud configs, document the supplement run plan, and make detached GCP runs sync results before writing terminal markers.
Replace the 9-channel coverage summary with generated transform-by-regime tables for RGB and 9-channel data, and add video regime wiring for future published snapshots.
Reviewer's Guide
Adds curated 9-channel (image-9ch) benchmark snapshots to the paper dataset, wires them into the figure/table generation pipeline (including new README scenario tables), extends regime handling to 9-channel and video scenarios, and improves GCP job orchestration and result syncing for paper runs.
Hey - I've found 1 issue, and left some high level feedback:
- In `implementation_label`, using `regime.endswith("_dataloader_gpu")` will incorrectly label micro GPU regimes like `*_micro_gpu` as CPU; consider checking for `"_gpu"` in the regime string or maintaining an explicit GPU-regime set so both micro and DataLoader GPU cases are handled correctly.
- The new `run_gcp_first_available_gpu_zone.sh` hardcodes a long zone list; if you expect to reuse this script, consider centralizing the default zone list (or reading it from an env/config file) to avoid having to update multiple places when GCP regions/zones change.
## Individual Comments
### Comment 1
<location path="scripts/paper/common.py" line_range="76-77" />
<code_context>
 def implementation_label(regime: str, library: str) -> str:
-    device = "GPU" if regime == "rgb_dataloader_gpu" else "CPU"
+    device = "GPU" if regime.endswith("_dataloader_gpu") else "CPU"
     return f"{LIBRARY_DISPLAY.get(library, library)} {device}"
</code_context>
<issue_to_address>
**issue (bug_risk):** GPU micro regimes are now mislabeled as CPU in implementation_label.
The new regimes like `rgb_micro_gpu`, `image9ch_micro_gpu`, and `video16f_micro_gpu` don’t match `regime.endswith("_dataloader_gpu")`, so they’ll be labeled as CPU despite running on GPU. If `implementation_label` is surfaced in user-facing tables, GPU micro benchmarks will appear as “Library CPU”. Consider a broader condition (e.g. `regime.endswith("_gpu")`) or an explicit mapping from regime to device.
</issue_to_address>
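A minimal sketch of the broader condition suggested in the comment above, assuming every GPU regime name ends in `_gpu` (as `rgb_micro_gpu`, `image9ch_micro_gpu`, and `video16f_micro_gpu` do). The `LIBRARY_DISPLAY` contents below are placeholders for illustration, not the repository's actual mapping.

```python
# Placeholder mapping (assumption); the real LIBRARY_DISPLAY lives in
# scripts/paper/common.py and may contain different entries.
LIBRARY_DISPLAY = {"torchvision": "TorchVision", "kornia": "Kornia"}


def implementation_label(regime: str, library: str) -> str:
    # Match any regime whose name ends in "_gpu", so both the micro and
    # DataLoader GPU regimes are labeled as GPU.
    device = "GPU" if regime.endswith("_gpu") else "CPU"
    return f"{LIBRARY_DISPLAY.get(library, library)} {device}"
```

An explicit regime-to-device mapping would be stricter, at the cost of needing an update whenever a regime is added.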
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14df97d7be
    if [[ -d "$WORKDIR/results" ]]; then
      gcs_rsync_retry "results" 300 "$WORKDIR/results" "${run_prefix}/results/" || terminal_ok=0
Withhold DONE when result sync fails
When the final results rsync exhausts its retries, this line only flips `terminal_ok=0`; the function still proceeds to upload and confirm the `DONE` marker a few lines later. In a successful benchmark where GCS result syncing fails, downstream pollers that treat `DONE` as terminal success (for example, `scripts/run_gcp_rgb_micro_cpu_proxies.sh` checks `$prefix/DONE`) can stop and consume an incomplete `results/` tree even though the VM is kept for triage. Return before writing the marker, or only create/upload `DONE` after all prior terminal artifacts succeeded.
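The second option in the comment above (write the marker only after every prior terminal artifact succeeded) can be sketched in Python as follows. This is an illustration of the control flow only; `finalize_run`, `sync_steps`, and `write_marker` are hypothetical names, and the actual bootstrap is a shell script generated by `benchmark/cloud/gcp.py`.

```python
from typing import Callable, Iterable


def finalize_run(
    sync_steps: Iterable[Callable[[], bool]],
    write_marker: Callable[[str], None],
    marker: str = "DONE",
) -> bool:
    """Run every terminal sync step, then write the marker only if all passed."""
    # Run all steps without short-circuiting so later artifacts still
    # upload and remain available for triage.
    results = [step() for step in sync_steps]
    if not all(results):
        # Withhold the marker: pollers that wait on it keep waiting
        # instead of consuming an incomplete results tree.
        return False
    write_marker(marker)
    return True
```

With this shape, a failed results sync leaves no marker behind, so a poller cannot mistake a partial upload for a finished run.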
Pull request overview
This PR publishes curated 9‑channel benchmark snapshots and updates the paper-data generation pipeline so the repository can render 9‑channel (and additional regime) summaries in the README and docs/paper_data/ outputs. It also tightens detached GCP run reliability by syncing results/ to GCS before writing the terminal DONE/FAILED marker.
Changes:
- Add published 9‑channel result snapshots (CPU micro, GPU micro, CPU DataLoader, GPU DataLoader) and regenerate paper-data CSV/MD/JSON artifacts from them.
- Extend paper-data/figure generation scripts to include 9‑channel regimes (and additional regime scaffolding) and patch a scenario summary table block into the README.
- Update detached GCP bootstrap to rsync results before writing terminal markers, and add tests guarding the ordering.
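An ordering guard of the kind described in the last bullet could look roughly like the sketch below. It assumes the bootstrap is available as a string and that the sync call and marker writes can be located by substring; `sync_precedes_markers` and the `write_marker` command in the sample are hypothetical, not taken from the repository's test suite.

```python
from typing import Sequence


def sync_precedes_markers(
    script: str, sync_token: str, marker_tokens: Sequence[str]
) -> bool:
    """True if the sync call's first occurrence comes before every marker's."""
    sync_at = script.find(sync_token)
    if sync_at < 0:
        return False  # no sync call at all
    # A missing marker token yields -1, which also fails the comparison.
    return all(script.find(token) > sync_at for token in marker_tokens)


# Hypothetical bootstrap excerpt used only to exercise the check.
SAMPLE_SCRIPT = """
gcs_rsync_retry "results" 300 "$WORKDIR/results" "${run_prefix}/results/"
write_marker DONE
write_marker FAILED
"""

ordering_ok = sync_precedes_markers(
    SAMPLE_SCRIPT, 'gcs_rsync_retry "results"', ["DONE", "FAILED"]
)
```

A first-occurrence check is deliberately simple; it would not catch a marker written on a separate code path before a second sync call.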
Reviewed changes
Copilot reviewed 59 out of 65 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_jobs_orchestrator.py | Adds a regression test ensuring detached GCP bootstrap syncs results before writing DONE/FAILED. |
| tests/test_config_models.py | Updates expected paper production config specs; adds new video + Kornia highmem config expectations. |
| scripts/run_gcp_first_available_gpu_zone.sh | Adds a helper script to attempt launches across multiple GPU zones and stop on non-capacity failures. |
| scripts/paper/generate_paper_data.py | Adds new regimes to published snapshot selection and emits winner-count sections across all regimes. |
| scripts/paper/generate_figures_and_insights.py | Generates README “scenario benchmark tables” from all_results.csv and expands public data sync to include more regime CSVs. |
| scripts/paper/common.py | Adds regime labels/order entries for 9‑channel and video; tweaks implementation_label device naming logic. |
| benchmark/cloud/gcp.py | Adds gcs_rsync_retry and ensures terminal artifact upload syncs results/ before writing marker files. |
| README.md | Injects generated “Scenario benchmark tables” (RGB + 9‑channel + video placeholder) into the paper figures section. |
| docs/paper_data/video16f_micro_gpu.csv | Adds a (currently empty) generated CSV header for the video GPU micro regime. |
| docs/paper_data/video16f_micro_gpu_pivot.csv | Adds an (empty) generated pivot header for the video GPU micro regime. |
| docs/paper_data/video16f_micro_cpu.csv | Adds a (currently empty) generated CSV header for the video CPU micro regime. |
| docs/paper_data/video16f_micro_cpu_pivot.csv | Adds an (empty) generated pivot header for the video CPU micro regime. |
| docs/paper_data/video16f_dataloader_gpu.csv | Adds a (currently empty) generated CSV header for the video GPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_gpu_pivot.csv | Adds an (empty) generated pivot header for the video GPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_cpu.csv | Adds a (currently empty) generated CSV header for the video CPU DataLoader regime. |
| docs/paper_data/video16f_dataloader_cpu_pivot.csv | Adds an (empty) generated pivot header for the video CPU DataLoader regime. |
| docs/paper_data/unsupported_and_early_stopped.md | Updates the supplement table with 9‑channel unsupported/early-stopped rows. |
| docs/paper_data/unsupported_and_early_stopped.csv | Updates the CSV backing the unsupported/early-stopped supplement with 9‑channel rows. |
| docs/paper_data/summary.md | Regenerates the paper summary markdown to include 9‑channel regimes (and additional empty sections). |
| docs/paper_data/summary.json | Regenerates the summary JSON to include 9‑channel regimes and placeholder video regimes. |
| docs/paper_data/rgb_micro_gpu.csv | Regenerates RGB micro GPU CSV data (now committed as generated output). |
| docs/paper_data/rgb_micro_gpu_pivot.csv | Regenerates RGB micro GPU pivot table. |
| docs/paper_data/rgb_micro_cpu_pivot.csv | Regenerates RGB micro CPU pivot table. |
| docs/paper_data/image9ch_micro_gpu.csv | Adds generated 9‑channel GPU micro CSV from the new published snapshots. |
| docs/paper_data/image9ch_micro_gpu_pivot.csv | Adds generated 9‑channel GPU micro pivot table. |
| docs/paper_data/image9ch_micro_cpu.csv | Adds generated 9‑channel CPU micro CSV from the new published snapshots. |
| docs/paper_data/image9ch_micro_cpu_pivot.csv | Adds generated 9‑channel CPU micro pivot table. |
| docs/paper_data/image9ch_dataloader_gpu.csv | Adds generated 9‑channel GPU DataLoader CSV from the new published snapshots. |
| docs/paper_data/image9ch_dataloader_gpu_pivot.csv | Adds generated 9‑channel GPU DataLoader pivot table. |
| docs/paper_data/image9ch_dataloader_cpu_pivot.csv | Adds generated 9‑channel CPU DataLoader pivot table. |
| docs/paper_data/figure_winner_rows.csv | Regenerates figure backing data to include 9‑channel winner rows. |
| docs/paper_data/figure_winner_counts.csv | Regenerates figure backing data to include 9‑channel winner counts. |
| docs/benchmark_scope.md | Adds a “Remaining Supplement Plan” section detailing 9‑channel/video follow-up runs and next steps. |
| configs/paper/prod_g2_video_micro_gpu.yaml | Adds a production video GPU micro config. |
| configs/paper/prod_g2_video_dataloader_gpu.yaml | Adds a production video GPU DataLoader config. |
| configs/paper/prod_c4_video_micro_cpu.yaml | Adds a production video CPU micro config. |
| configs/paper/prod_c4_video_dataloader_cpu.yaml | Adds a production video CPU DataLoader config. |
| configs/paper/prod_c4_highmem_9ch_dataloader_cpu_kornia.yaml | Adds a dedicated highmem Kornia-only 9‑channel CPU DataLoader config. |
| configs/paper/prod_c4_9ch_dataloader_cpu.yaml | Removes Kornia from the standard 9‑channel CPU DataLoader production config. |
| results/published/paper-9ch-micro-gpu-g2-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel GPU micro snapshot. |
| results/published/paper-9ch-micro-c4-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel CPU micro snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/torchvision_memory_dataloader_augment_n10000_r1_w8_b128_results.json | Adds a published TorchVision 9‑channel CPU DataLoader result JSON snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-standard-16-2026-05-10/manifest.json | Adds published manifest metadata for 9‑channel CPU DataLoader (partial) snapshot. |
| results/published/paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11/manifest.json | Adds published manifest metadata for the Kornia-only highmem rerun slice. |
| results/published/paper-9ch-dataloader-gpu-decode-g2-standard-16-2026-05-11/manifest.json | Adds published manifest metadata for 9‑channel GPU DataLoader snapshot. |
     def implementation_label(regime: str, library: str) -> str:
    -    device = "GPU" if regime == "rgb_dataloader_gpu" else "CPU"
    +    device = "GPU" if regime.endswith("_dataloader_gpu") else "CPU"
         return f"{LIBRARY_DISPLAY.get(library, library)} {device}"
             _open_dataloader_leaderboard_table(rows),
             "",
             "## Winner Counts",
             "",
    -        _best_library_table(rows, "rgb_micro_cpu"),
    -        "",
    -        _best_library_table(rows, "rgb_dataloader_cpu"),
    -        "",
    -        _best_library_table(rows, "rgb_micro_gpu"),
    -        "",
    -        _best_library_table(rows, "rgb_dataloader_gpu"),
    +        *(section for regime in REGIME_LABELS for section in (_best_library_table(rows, regime), "")),
             "",
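The generator expression in the snippet above interleaves one table per regime with a blank separator line. A minimal, self-contained sketch of that unpacking pattern follows; the `REGIME_LABELS` subset and the `_best_library_table` body are placeholders, not the repository's definitions.

```python
# Placeholder subset (assumption); the real REGIME_LABELS is defined in
# scripts/paper/common.py.
REGIME_LABELS = {
    "rgb_micro_cpu": "RGB micro CPU",
    "image9ch_micro_cpu": "9-channel micro CPU",
}


def _best_library_table(rows, regime):
    # Placeholder renderer: the real function builds a markdown table.
    return f"<table for {regime}>"


rows = []
lines = [
    "## Winner Counts",
    "",
    # For each regime (dict keys iterate in insertion order), emit the
    # table followed by an empty string as a separator.
    *(
        section
        for regime in REGIME_LABELS
        for section in (_best_library_table(rows, regime), "")
    ),
]
```

Because the loop is driven by `REGIME_LABELS`, a regime added there gets a winner-count section without another edit to this list.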
    "image9ch_micro_cpu": [("paper-9ch-micro-c4-*", "latest")],
    "image9ch_micro_gpu": [("paper-9ch-micro-gpu-g2-*", "latest")],
    "image9ch_dataloader_cpu": [("paper-9ch-dataloader-memory-c4-*", "all")],
    "image9ch_dataloader_gpu": [("paper-9ch-dataloader-gpu-decode-g2-*", "latest")],
    "video16f_micro_cpu": [("paper-video-micro-c4-*", "latest")],
    "video16f_micro_gpu": [("paper-video-micro-gpu-g2-*", "latest")],
    "video16f_dataloader_cpu": [("paper-video-dataloader-memory-c4-*", "latest")],
    "video16f_dataloader_gpu": [("paper-video-dataloader-gpu-g2-*", "latest")],
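Each entry above pairs a glob pattern with a selection mode. One plausible reading, sketched below, is that `"latest"` keeps only the newest matching snapshot directory while `"all"` merges every match, which would explain why `image9ch_dataloader_cpu` uses `"all"` to combine the standard slice with the Kornia-only highmem rerun. The `select_snapshots` helper is hypothetical, and sorting by the trailing `YYYY-MM-DD` date is an assumption about how "latest" is decided.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def select_snapshots(names: Iterable[str], pattern: str, mode: str) -> List[str]:
    """Pick snapshot directory names matching a glob, by selection mode."""
    # Assumption: every snapshot name ends in an ISO date (YYYY-MM-DD),
    # so the last 10 characters sort chronologically.
    matches = sorted(
        (name for name in names if fnmatch(name, pattern)),
        key=lambda name: name[-10:],
    )
    if mode == "all":
        return matches
    if mode == "latest":
        return matches[-1:]  # empty list when nothing matches
    raise ValueError(f"unknown selection mode: {mode!r}")


published = [
    "paper-9ch-dataloader-memory-c4-standard-16-2026-05-10",
    "paper-9ch-dataloader-memory-c4-highmem-16-2026-05-11",
    "paper-9ch-micro-c4-standard-16-2026-05-10",
]
cpu_dataloader = select_snapshots(published, "paper-9ch-dataloader-memory-c4-*", "all")
video_micro = select_snapshots(published, "paper-video-micro-c4-*", "latest")
```

Under this reading, a pattern with no matching snapshot yields an empty selection, consistent with the header-only video CSVs listed in the file summary.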
Summary by Sourcery
Publish 9-channel (image-9ch) benchmark snapshots and integrate them into the paper pipeline, README tables, and summary insights, while preparing configs and scaffolding for upcoming video benchmarks and improving GCP result syncing.