
Add tiling artifact diagnostics eval#537

Open
Hgherzog wants to merge 4 commits into main from henryh/tiling-artifact-diagnostics

Conversation

Collaborator

@Hgherzog Hgherzog commented Apr 15, 2026

Summary

  • Add TILING_DIAGNOSTICS eval mode that detects spatial tiling/striping artifacts via row/col variance anisotropy and 2D FFT axis energy, plus PCA RGB visualization logged to W&B
  • Add TaskType.DIAGNOSTIC to cleanly separate diagnostic datasets from real segmentation tasks, preserving spatial dims without requiring probe_lr/ft_lr
  • Add per-resolution pretrain subset configs (pretrain_subset_64, pretrain_subset_128) with backward-compat pretrain_subset alias
  • Wire tiling diagnostics into in-loop evals (script.py), standalone evals (all_evals.py), and checkpoint sweeps (checkpoint_sweep_evals.py via TILING_DIAGNOSTICS_ONLY)
  • Add evaluation docs for both embedding diagnostics and tiling diagnostics (metrics, launch commands, interpretation)

Test plan

  • Unit tests for compute_tiling_artifact_metrics (stripe detection, FFT period, isotropic baseline, edge cases)
  • Unit tests for pca_rgb_image (shape, dtype, value range)
  • Existing embedding diagnostics + config tests still pass
  • Smoke test in-loop eval on a real checkpoint to verify W&B logging

Made with Cursor


Note

Medium Risk
Adds a new evaluation mode and dataset/task-type handling in the core eval pipeline, plus new FFT/PCA computations and W&B image logging, which could affect eval stability/perf and downstream task selection logic.

Overview
Adds a new tiling artifact diagnostics evaluation path (EvalMode.TILING_DIAGNOSTICS) that computes striping/tiling signals from spatial embeddings (row/col variance ratio, FFT axis energy, dominant period) and optionally logs a PCA-RGB visualization to W&B.
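The three signals named above could be reimplemented roughly as follows. This is a minimal NumPy sketch based only on the metric names and thresholds in this PR description; the actual `compute_tiling_artifact_metrics` in the repo may use a different signature, channel reduction, or normalization.

```python
import numpy as np

def compute_tiling_artifact_metrics(emb: np.ndarray) -> dict:
    """Sketch of tiling-artifact metrics for a (H, W, C) spatial embedding.

    Hypothetical reimplementation; names follow the PR description,
    not the repo's actual code.
    """
    # Reduce channels to a single zero-centered feature map.
    m = emb.mean(axis=-1)
    m = m - m.mean()
    eps = 1e-12

    # 1) Directional variance anisotropy: variance of row means vs. column
    #    means. An isotropic (healthy) embedding gives a ratio near 1.0.
    row_var = m.mean(axis=1).var()
    col_var = m.mean(axis=0).var()
    row_col_var_ratio = float((row_var + eps) / (col_var + eps))

    # 2) Fraction of spectral energy on the FFT grid axes (u == 0 or v == 0),
    #    excluding the DC term. Striping concentrates energy on these axes.
    spec = np.abs(np.fft.fft2(m)) ** 2
    spec[0, 0] = 0.0
    axis_energy = spec[0, :].sum() + spec[:, 0].sum()
    fft_axis_energy_frac = float(axis_energy / (spec.sum() + eps))

    # 3) Dominant artifact period in pixels, from the strongest axis peak
    #    among the positive frequencies.
    h, w = m.shape
    row_peak = int(np.argmax(spec[1 : h // 2, 0])) + 1
    col_peak = int(np.argmax(spec[0, 1 : w // 2])) + 1
    if spec[row_peak, 0] >= spec[0, col_peak]:
        fft_dominant_period_px = h / row_peak
    else:
        fft_dominant_period_px = w / col_peak

    return {
        "row_col_var_ratio": row_col_var_ratio,
        "fft_axis_energy_frac": fft_axis_energy_frac,
        "fft_dominant_period_px": float(fft_dominant_period_px),
    }
```

On a synthetic embedding with horizontal stripes of period 16 px, this sketch reports a large variance ratio, an axis-energy fraction near 1.0, and a dominant period of 16; on isotropic noise it stays near the healthy baseline.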

Introduces TaskType.DIAGNOSTIC and per-resolution pretrain_subset_64/pretrain_subset_128 configs (with pretrain_subset as an alias) so diagnostic datasets preserve spatial dimensions and can be run without probe/finetune learning-rate requirements; updates the pretrain-subset loader to derive hw_p from config and to return per-pixel dummy labels.
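The per-resolution configs and the backward-compat alias could be sketched like this. Names and shapes are illustrative only; the real configs live in olmoearth_pretrain/evals/datasets/configs.py and likely use typed config objects rather than dicts, and the 16px patch size used here to derive `hw_p` is an assumption, not stated in the PR.

```python
# Illustrative sketch: per-resolution diagnostic dataset configs with a
# backward-compat alias. PATCH_SIZE = 16 is an assumed value.
PATCH_SIZE = 16

PRETRAIN_SUBSET_CONFIGS = {
    "pretrain_subset_64": {"input_hw": 64, "task_type": "diagnostic"},
    "pretrain_subset_128": {"input_hw": 128, "task_type": "diagnostic"},
}

# Bare "pretrain_subset" keeps working and resolves to the 128px config.
ALIASES = {"pretrain_subset": "pretrain_subset_128"}

def resolve_dataset_config(name: str) -> dict:
    canonical = ALIASES.get(name, name)
    cfg = dict(PRETRAIN_SUBSET_CONFIGS[canonical])
    # Spatial size in patches, derived from the config rather than hardcoded,
    # so diagnostic datasets preserve spatial dims at either resolution.
    cfg["hw_p"] = cfg["input_hw"] // PATCH_SIZE
    return cfg
```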

Wires these diagnostics into in-loop training evals (scripts/official/script.py), standalone task registries (internal/all_evals.py), and checkpoint sweep evaluation selection/logging (checkpoint_sweep_evals.py via TILING_DIAGNOSTICS_ONLY), and expands docs/tests accordingly.
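The TILING_DIAGNOSTICS_ONLY switch for checkpoint sweeps could look roughly like the sketch below. The function name and accepted truthy values are assumptions; only the environment-variable name comes from the PR.

```python
import os

def select_eval_modes(requested_modes: list[str]) -> list[str]:
    """Restrict a checkpoint sweep to tiling diagnostics when the
    TILING_DIAGNOSTICS_ONLY env var is set (illustrative sketch)."""
    if os.environ.get("TILING_DIAGNOSTICS_ONLY", "").lower() in ("1", "true", "yes"):
        return ["tiling_diagnostics"]
    return requested_modes
```

Running all checkpoints under a single switch like this is what lets the sweep log tiling metrics across training history into one W&B run.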

Reviewed by Cursor Bugbot for commit df61340. Bugbot is set up for automated code reviews on this repo.

root and others added 2 commits April 15, 2026 02:32
Adds EvalMode.TILING_DIAGNOSTICS to systematically measure patch boundary
artifacts in embeddings (GitHub issue #499). Three metrics are computed:
- row_col_var_ratio: directional variance anisotropy (1.0 = healthy)
- fft_axis_energy_frac: spectral energy on grid axes (~0.06 healthy, >0.25 artifacts)
- fft_dominant_period_px: period of strongest artifact in pixels

Also generates PCA RGB visualizations logged to WandB for visual inspection.

Includes checkpoint sweep support via TILING_DIAGNOSTICS_ONLY env var
for evaluating tiling across training history in a single WandB run.

Made-with: Cursor
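The PCA RGB visualization mentioned in the commit message could be sketched as follows. This is a hypothetical stand-in for `pca_rgb_image` based only on the described behavior (project onto top-3 components, rescale to an RGB image); the repo's version may center, whiten, or clip differently.

```python
import numpy as np

def pca_rgb_image(emb: np.ndarray) -> np.ndarray:
    """Project a (H, W, C) embedding onto its top-3 PCA components and
    rescale each channel to uint8 RGB. Illustrative sketch only."""
    h, w, c = emb.shape
    flat = emb.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0)
    # Top-3 principal directions via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T  # (H*W, 3)
    # Per-channel min-max rescale to [0, 1], guarding degenerate channels.
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.where(hi - lo > 0, hi - lo, 1.0)
    return (proj.reshape(h, w, 3) * 255).astype(np.uint8)
```

Stripe artifacts that dominate the embedding's variance show up directly in such an image, which is why logging it to W&B complements the scalar metrics.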
- Add TaskType.DIAGNOSTIC so pretrain_subset configs don't abuse SEGMENTATION
- Update eval_wrapper to preserve spatial dims for DIAGNOSTIC task type
- Add backward-compat "pretrain_subset" alias (defaults to 128px)
- Remove fragile eval-mode enumeration in probe_lr validation
- Fix stray print() → logger.info() in _get_embeddings
- Add embedding diagnostics and tiling diagnostics sections to Evaluation.md

Made-with: Cursor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


Comment thread olmoearth_pretrain/evals/datasets/configs.py
Comment thread olmoearth_pretrain/train/callbacks/evaluator_callback.py Outdated
- get_eval_mode: return "embedding_diagnostics" for TaskType.DIAGNOSTIC
  so diagnostic datasets don't fall back to linear_probe and fail the
  probe_lr requirement.
- evaluator_callback: re-exempt EMBEDDING_DIAGNOSTICS and TILING_DIAGNOSTICS
  eval modes from the probe_lr/ft_lr requirement on segmentation datasets.

Made-with: Cursor

cursor Bot commented Apr 16, 2026

You have used all of your free Bugbot PR reviews.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

