Adds EvalMode.TILING_DIAGNOSTICS to systematically measure patch boundary artifacts in embeddings (GitHub issue #499). Three metrics are computed:

- row_col_var_ratio: directional variance anisotropy (1.0 = healthy)
- fft_axis_energy_frac: spectral energy on grid axes (~0.06 healthy, >0.25 artifacts)
- fft_dominant_period_px: period of strongest artifact in pixels

Also generates PCA RGB visualizations logged to WandB for visual inspection. Includes checkpoint sweep support via the TILING_DIAGNOSTICS_ONLY env var for evaluating tiling across training history in a single WandB run.

Made-with: Cursor
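For readers who want a feel for what the three metrics measure, here is a minimal NumPy sketch. It is not the PR's `compute_tiling_artifact_metrics`; the function name `tiling_metrics_sketch`, the channel-averaging step, the `px_per_cell` parameter, and the exact definitions of each metric are assumptions chosen for illustration. Under this particular definition, white noise on a 32×32 map places roughly 62/1023 ≈ 0.06 of its energy on the axis bins, which happens to line up with the "healthy" value quoted above.

```python
import numpy as np


def tiling_metrics_sketch(emb: np.ndarray, px_per_cell: float = 1.0) -> dict:
    """Illustrative tiling metrics for an (H, W, C) embedding map.

    Assumption: channels are averaged into one scalar map first; the real
    implementation may aggregate differently.
    """
    h, w, _ = emb.shape
    eps = 1e-12

    # Collapse channels and remove the global mean so the DC term is ~0.
    scalar = emb.mean(axis=-1)
    scalar = scalar - scalar.mean()

    # Anisotropy: variance of per-row means vs. variance of per-column means.
    row_var = scalar.mean(axis=1).var()
    col_var = scalar.mean(axis=0).var()
    ratio = (row_var + eps) / (col_var + eps)

    # 2D power spectrum with the DC bin zeroed out.
    power = np.abs(np.fft.fft2(scalar)) ** 2
    power[0, 0] = 0.0

    # Fraction of energy on the two frequency axes (ky == 0 or kx == 0).
    axis_energy = power[0, :].sum() + power[:, 0].sum()
    frac = axis_energy / (power.sum() + eps)

    # Strongest on-axis peak, positive frequencies only.
    x_axis = power[0, 1 : w // 2 + 1]  # variation along x (vertical stripes)
    y_axis = power[1 : h // 2 + 1, 0]  # variation along y (horizontal stripes)
    kx = int(np.argmax(x_axis)) + 1
    ky = int(np.argmax(y_axis)) + 1
    if x_axis[kx - 1] >= y_axis[ky - 1]:
        period_cells = w / kx
    else:
        period_cells = h / ky

    return {
        "row_col_var_ratio": float(ratio),
        "fft_axis_energy_frac": float(frac),
        "fft_dominant_period_px": float(period_cells * px_per_cell),
    }


# Synthetic check: vertical stripes with a period of 8 cells.
x = np.arange(32)
stripes = np.broadcast_to(np.sin(2 * np.pi * x / 8)[None, :, None], (32, 32, 4))
stripe_metrics = tiling_metrics_sketch(stripes)

# Isotropic baseline: white noise.
noise = np.random.default_rng(0).normal(size=(32, 32, 4))
noise_metrics = tiling_metrics_sketch(noise)
```

On the striped input, nearly all energy sits on one frequency axis and the variance ratio collapses toward zero, while the noise input stays well below the 0.25 artifact threshold.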
- Add TaskType.DIAGNOSTIC so pretrain_subset configs don't abuse SEGMENTATION
- Update eval_wrapper to preserve spatial dims for DIAGNOSTIC task type
- Add backward-compat "pretrain_subset" alias (defaults to 128px)
- Remove fragile eval-mode enumeration in probe_lr validation
- Fix stray print() → logger.info() in _get_embeddings
- Add embedding diagnostics and tiling diagnostics sections to Evaluation.md

Made-with: Cursor
- get_eval_mode: return "embedding_diagnostics" for TaskType.DIAGNOSTIC so diagnostic datasets don't fall back to linear_probe and fail the probe_lr requirement.
- evaluator_callback: re-exempt EMBEDDING_DIAGNOSTICS and TILING_DIAGNOSTICS eval modes from the probe_lr/ft_lr requirement on segmentation datasets.

Made-with: Cursor
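The two fixes in this commit can be pictured with a small sketch. The enums, their string values, the `LR_EXEMPT_MODES` set, and the `check_probe_lr` helper below are hypothetical stand-ins, not the repository's actual definitions; only the `"embedding_diagnostics"` return value and the `linear_probe` fallback come from the commit message.

```python
from enum import Enum
from typing import Optional


class TaskType(Enum):
    # Hypothetical subset of task types, for illustration only.
    CLASSIFICATION = "classification"
    SEGMENTATION = "segmentation"
    DIAGNOSTIC = "diagnostic"


class EvalMode(Enum):
    # Hypothetical string values.
    LINEAR_PROBE = "linear_probe"
    EMBEDDING_DIAGNOSTICS = "embedding_diagnostics"
    TILING_DIAGNOSTICS = "tiling_diagnostics"


# Modes that never train a probe, so they need no learning rate (assumed set).
LR_EXEMPT_MODES = {EvalMode.EMBEDDING_DIAGNOSTICS, EvalMode.TILING_DIAGNOSTICS}


def get_eval_mode(task_type: TaskType) -> str:
    """Default eval mode for a task type; diagnostics skip the probe path."""
    if task_type is TaskType.DIAGNOSTIC:
        return EvalMode.EMBEDDING_DIAGNOSTICS.value
    return EvalMode.LINEAR_PROBE.value


def check_probe_lr(eval_mode: EvalMode, probe_lr: Optional[float]) -> None:
    """Raise if a mode that trains a probe is missing its learning rate."""
    if eval_mode in LR_EXEMPT_MODES:
        return
    if probe_lr is None:
        raise ValueError(f"probe_lr is required for eval mode {eval_mode.value}")
```

Keeping the exemption as an explicit set of modes, rather than inferring it from the task type, means a tiling diagnostic run on a segmentation dataset is still exempt.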

Summary
- `TILING_DIAGNOSTICS` eval mode that detects spatial tiling/striping artifacts via row/col variance anisotropy and 2D FFT axis energy, plus PCA RGB visualization logged to W&B
- `TaskType.DIAGNOSTIC` to cleanly separate diagnostic datasets from real segmentation tasks, preserving spatial dims without requiring `probe_lr`/`ft_lr`
- Per-resolution configs (`pretrain_subset_64`, `pretrain_subset_128`) with backward-compat `pretrain_subset` alias
- Wired into in-loop training evals (`script.py`), standalone evals (`all_evals.py`), and checkpoint sweeps (`checkpoint_sweep_evals.py` via `TILING_DIAGNOSTICS_ONLY`)

Test plan

- `compute_tiling_artifact_metrics` (stripe detection, FFT period, isotropic baseline, edge cases)
- `pca_rgb_image` (shape, dtype, value range)

Made with Cursor
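As a rough picture of what a PCA RGB visualization involves and what the shape/dtype/value-range checks would cover, here is a minimal NumPy sketch. It is not the PR's `pca_rgb_image`; the name `pca_rgb_sketch`, the per-component min-max scaling, and the `uint8` output in [0, 255] are assumptions, and the sketch requires at least three channels.

```python
import numpy as np


def pca_rgb_sketch(emb: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) embedding map onto its top 3 principal
    components and rescale each to 0..255 for display (assumed convention)."""
    h, w, c = emb.shape
    if c < 3:
        raise ValueError("need at least 3 channels for an RGB projection")

    # Flatten to (H*W, C) and centre each channel.
    flat = emb.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)

    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T  # (H*W, 3)

    # Min-max scale each component independently to [0, 1].
    lo = proj.min(axis=0)
    hi = proj.max(axis=0)
    scaled = (proj - lo) / np.maximum(hi - lo, 1e-12)

    return (scaled.reshape(h, w, 3) * 255.0).round().astype(np.uint8)


demo_emb = np.random.default_rng(0).normal(size=(16, 16, 8))
demo_img = pca_rgb_sketch(demo_emb)
```

Scaling each component separately keeps all three colour channels visible even when the first component dominates the variance, at the cost of hiding their relative magnitudes.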
Note
Medium Risk
Adds a new evaluation mode and dataset/task-type handling in the core eval pipeline, plus new FFT/PCA computations and W&B image logging, which could affect eval stability/perf and downstream task selection logic.
Overview
Adds a new tiling artifact diagnostics evaluation path (`EvalMode.TILING_DIAGNOSTICS`) that computes striping/tiling signals from spatial embeddings (row/col variance ratio, FFT axis energy, dominant period) and optionally logs a PCA-RGB visualization to W&B.

Introduces `TaskType.DIAGNOSTIC` and per-resolution `pretrain_subset_64`/`pretrain_subset_128` configs (with `pretrain_subset` as an alias) so diagnostic datasets preserve spatial dimensions and can be run without probe/finetune learning-rate requirements; updates the pretrain-subset loader to derive `hw_p` from config and to return per-pixel dummy labels.

Wires these diagnostics into in-loop training evals (`scripts/official/script.py`), standalone task registries (`internal/all_evals.py`), and checkpoint sweep evaluation selection/logging (`checkpoint_sweep_evals.py` via `TILING_DIAGNOSTICS_ONLY`), and expands docs/tests accordingly.

Reviewed by Cursor Bugbot for commit df61340.
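The `TILING_DIAGNOSTICS_ONLY` selection in the checkpoint sweep could look something like the sketch below. How the real script parses the variable is not shown in this PR description, so the accepted truthy values, the `select_sweep_tasks` helper, and the task registry shape (task name mapped to eval-mode string) are all assumptions.

```python
import os
from typing import Dict, Mapping


def select_sweep_tasks(
    tasks: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Keep only tiling-diagnostics tasks when TILING_DIAGNOSTICS_ONLY is set.

    `tasks` maps a task name to its eval-mode string (hypothetical layout).
    """
    flag = env.get("TILING_DIAGNOSTICS_ONLY", "").strip().lower()
    if flag not in {"1", "true", "yes"}:  # assumed truthy spellings
        return dict(tasks)
    return {name: mode for name, mode in tasks.items() if mode == "tiling_diagnostics"}


# Hypothetical registry for illustration.
registry = {
    "some_segmentation_task": "linear_probe",
    "pretrain_subset_128": "tiling_diagnostics",
}
only_tiling = select_sweep_tasks(registry, env={"TILING_DIAGNOSTICS_ONLY": "1"})
everything = select_sweep_tasks(registry, env={})
```

Passing the environment as a parameter keeps the selection logic testable without mutating `os.environ`.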