
Add tiling artifact diagnostics eval#537

Open
Hgherzog wants to merge 4 commits into main from henryh/tiling-artifact-diagnostics

Conversation

Collaborator

@Hgherzog Hgherzog commented Apr 15, 2026

Summary

  • Add TILING_DIAGNOSTICS eval mode that detects spatial tiling/striping artifacts via row/col variance anisotropy and 2D FFT axis energy, plus PCA RGB visualization logged to W&B
  • Add TaskType.DIAGNOSTIC to cleanly separate diagnostic datasets from real segmentation tasks, preserving spatial dims without requiring probe_lr/ft_lr
  • Add per-resolution pretrain subset configs (pretrain_subset_64, pretrain_subset_128) with backward-compat pretrain_subset alias
  • Wire tiling diagnostics into in-loop evals (script.py), standalone evals (all_evals.py), and checkpoint sweeps (checkpoint_sweep_evals.py via TILING_DIAGNOSTICS_ONLY)
  • Add evaluation docs for both embedding diagnostics and tiling diagnostics (metrics, launch commands, interpretation)

Test plan

  • Unit tests for compute_tiling_artifact_metrics (stripe detection, FFT period, isotropic baseline, edge cases)
  • Unit tests for pca_rgb_image (shape, dtype, value range)
  • Existing embedding diagnostics + config tests still pass
  • Smoke test in-loop eval on a real checkpoint to verify W&B logging

Made with Cursor


Note

Medium Risk
Adds a new evaluation mode and dataset/task-type handling in the core eval pipeline, plus new FFT/PCA computations and W&B image logging, which could affect eval stability/perf and downstream task selection logic.

Overview
Adds a new tiling artifact diagnostics evaluation path (EvalMode.TILING_DIAGNOSTICS) that computes striping/tiling signals from spatial embeddings (row/col variance ratio, FFT axis energy, dominant period) and optionally logs a PCA-RGB visualization to W&B.
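The three signals named above could be reimplemented roughly as follows. This is a minimal NumPy sketch based only on the metric names and thresholds in this PR description; the actual `compute_tiling_artifact_metrics` in the repo may use a different signature, channel reduction, or normalization.

```python
import numpy as np

def compute_tiling_artifact_metrics(emb: np.ndarray) -> dict:
    """Sketch of tiling-artifact metrics for a (H, W, C) spatial embedding.

    Hypothetical reimplementation; names follow the PR description,
    not the repo's actual code.
    """
    # Reduce channels to a single zero-centered feature map.
    m = emb.mean(axis=-1)
    m = m - m.mean()
    eps = 1e-12

    # 1) Directional variance anisotropy: variance of row means vs. column
    #    means. An isotropic (healthy) embedding gives a ratio near 1.0.
    row_var = m.mean(axis=1).var()
    col_var = m.mean(axis=0).var()
    row_col_var_ratio = float((row_var + eps) / (col_var + eps))

    # 2) Fraction of spectral energy on the FFT grid axes (u == 0 or v == 0),
    #    excluding the DC term. Striping concentrates energy on these axes.
    spec = np.abs(np.fft.fft2(m)) ** 2
    spec[0, 0] = 0.0
    axis_energy = spec[0, :].sum() + spec[:, 0].sum()
    fft_axis_energy_frac = float(axis_energy / (spec.sum() + eps))

    # 3) Dominant artifact period in pixels, from the strongest axis peak
    #    among the positive frequencies.
    h, w = m.shape
    row_peak = int(np.argmax(spec[1 : h // 2, 0])) + 1
    col_peak = int(np.argmax(spec[0, 1 : w // 2])) + 1
    if spec[row_peak, 0] >= spec[0, col_peak]:
        fft_dominant_period_px = h / row_peak
    else:
        fft_dominant_period_px = w / col_peak

    return {
        "row_col_var_ratio": row_col_var_ratio,
        "fft_axis_energy_frac": fft_axis_energy_frac,
        "fft_dominant_period_px": float(fft_dominant_period_px),
    }
```

On a synthetic embedding with horizontal stripes of period 16 px, this sketch reports a large variance ratio, an axis-energy fraction near 1.0, and a dominant period of 16; on isotropic noise it stays near the healthy baseline.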

Introduces TaskType.DIAGNOSTIC and per-resolution pretrain_subset_64/pretrain_subset_128 configs (with pretrain_subset as an alias) so diagnostic datasets preserve spatial dimensions and can be run without probe/finetune learning-rate requirements; updates the pretrain-subset loader to derive hw_p from config and to return per-pixel dummy labels.
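The per-resolution configs and the backward-compat alias could be sketched like this. Names and shapes are illustrative only; the real configs live in olmoearth_pretrain/evals/datasets/configs.py and likely use typed config objects rather than dicts, and the 16px patch size used here to derive `hw_p` is an assumption, not stated in the PR.

```python
# Illustrative sketch: per-resolution diagnostic dataset configs with a
# backward-compat alias. PATCH_SIZE = 16 is an assumed value.
PATCH_SIZE = 16

PRETRAIN_SUBSET_CONFIGS = {
    "pretrain_subset_64": {"input_hw": 64, "task_type": "diagnostic"},
    "pretrain_subset_128": {"input_hw": 128, "task_type": "diagnostic"},
}

# Bare "pretrain_subset" keeps working and resolves to the 128px config.
ALIASES = {"pretrain_subset": "pretrain_subset_128"}

def resolve_dataset_config(name: str) -> dict:
    canonical = ALIASES.get(name, name)
    cfg = dict(PRETRAIN_SUBSET_CONFIGS[canonical])
    # Spatial size in patches, derived from the config rather than hardcoded,
    # so diagnostic datasets preserve spatial dims at either resolution.
    cfg["hw_p"] = cfg["input_hw"] // PATCH_SIZE
    return cfg
```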

Wires these diagnostics into in-loop training evals (scripts/official/script.py), standalone task registries (internal/all_evals.py), and checkpoint sweep evaluation selection/logging (checkpoint_sweep_evals.py via TILING_DIAGNOSTICS_ONLY), and expands docs/tests accordingly.
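The TILING_DIAGNOSTICS_ONLY switch for checkpoint sweeps could look roughly like the sketch below. The function name and accepted truthy values are assumptions; only the environment-variable name comes from the PR.

```python
import os

def select_eval_modes(requested_modes: list[str]) -> list[str]:
    """Restrict a checkpoint sweep to tiling diagnostics when the
    TILING_DIAGNOSTICS_ONLY env var is set (illustrative sketch)."""
    if os.environ.get("TILING_DIAGNOSTICS_ONLY", "").lower() in ("1", "true", "yes"):
        return ["tiling_diagnostics"]
    return requested_modes
```

Running all checkpoints under a single switch like this is what lets the sweep log tiling metrics across training history into one W&B run.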

Reviewed by Cursor Bugbot for commit df61340. Bugbot is set up for automated code reviews on this repo.

root and others added 2 commits April 15, 2026 02:32
Adds EvalMode.TILING_DIAGNOSTICS to systematically measure patch boundary
artifacts in embeddings (GitHub issue #499). Three metrics are computed:
- row_col_var_ratio: directional variance anisotropy (1.0 = healthy)
- fft_axis_energy_frac: spectral energy on grid axes (~0.06 healthy, >0.25 artifacts)
- fft_dominant_period_px: period of strongest artifact in pixels

Also generates PCA RGB visualizations logged to WandB for visual inspection.

Includes checkpoint sweep support via TILING_DIAGNOSTICS_ONLY env var
for evaluating tiling across training history in a single WandB run.

Made-with: Cursor
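The PCA RGB visualization mentioned in the commit message could be sketched as follows. This is a hypothetical stand-in for `pca_rgb_image` based only on the described behavior (project onto top-3 components, rescale to an RGB image); the repo's version may center, whiten, or clip differently.

```python
import numpy as np

def pca_rgb_image(emb: np.ndarray) -> np.ndarray:
    """Project a (H, W, C) embedding onto its top-3 PCA components and
    rescale each channel to uint8 RGB. Illustrative sketch only."""
    h, w, c = emb.shape
    flat = emb.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0)
    # Top-3 principal directions via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T  # (H*W, 3)
    # Per-channel min-max rescale to [0, 1], guarding degenerate channels.
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / np.where(hi - lo > 0, hi - lo, 1.0)
    return (proj.reshape(h, w, 3) * 255).astype(np.uint8)
```

Stripe artifacts that dominate the embedding's variance show up directly in such an image, which is why logging it to W&B complements the scalar metrics.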
- Add TaskType.DIAGNOSTIC so pretrain_subset configs don't abuse SEGMENTATION
- Update eval_wrapper to preserve spatial dims for DIAGNOSTIC task type
- Add backward-compat "pretrain_subset" alias (defaults to 128px)
- Remove fragile eval-mode enumeration in probe_lr validation
- Fix stray print() → logger.info() in _get_embeddings
- Add embedding diagnostics and tiling diagnostics sections to Evaluation.md

Made-with: Cursor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


Comment thread olmoearth_pretrain/evals/datasets/configs.py
Comment thread olmoearth_pretrain/train/callbacks/evaluator_callback.py Outdated
- get_eval_mode: return "embedding_diagnostics" for TaskType.DIAGNOSTIC
  so diagnostic datasets don't fall back to linear_probe and fail the
  probe_lr requirement.
- evaluator_callback: re-exempt EMBEDDING_DIAGNOSTICS and TILING_DIAGNOSTICS
  eval modes from the probe_lr/ft_lr requirement on segmentation datasets.

Made-with: Cursor

cursor Bot commented Apr 16, 2026

You have used all of your free Bugbot PR reviews.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

