feat(data): add CutPaste-based synthetic anomaly generator with scar and union variants#3463
geeky33 wants to merge 7 commits into
…ehavior Signed-off-by: geeky33 <aaryap1204@gmail.com>
…and union variants Signed-off-by: geeky33 <aaryap1204@gmail.com> Made-with: Cursor
Pull request overview
This PR adds a CutPaste-based synthetic anomaly generator to anomalib’s synthetic data pipeline (alongside the existing Perlin-based approach), and wires the new synthetic configuration options through datamodules, configs, and tests.
Changes:
- Introduces a new `CutPasteGenerator` (normal/scar/union modes) and a visualization utility script.
- Extends `SyntheticAnomalyDataset`/`make_synthetic_dataset` to support multiple generator backends via `generator_type`, plus configurable `blend_factor`, `probability`, and `mask_threshold`.
- Refactors Perlin threshold-rescale logic into a shared helper and adds/updates unit tests and documentation/config snippets.
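The `mask_threshold` knob and the "change-mask computation" listed for `synthetic.py` suggest the ground-truth mask is derived from where the augmented image differs from the original. A minimal sketch of that idea, assuming (C, H, W) float tensors and a max-over-channels absolute difference; `compute_change_mask` is a hypothetical helper, not the PR's code.

```python
import torch


def compute_change_mask(
    original: torch.Tensor, augmented: torch.Tensor, mask_threshold: float = 0.0
) -> torch.Tensor:
    """Return a binary (H, W) mask of pixels that the augmentation changed.

    Hypothetical helper: a pixel counts as changed when the largest absolute
    per-channel difference exceeds ``mask_threshold``.
    """
    if original.shape != augmented.shape:
        raise ValueError("original and augmented must have the same shape")
    # Reduce the channel dimension of a (C, H, W) difference to (H, W).
    diff = (augmented - original).abs().amax(dim=0)
    return (diff > mask_threshold).float()
```

A threshold above zero lets tiny, visually irrelevant differences (for example from blending) drop out of the mask.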
Reviewed changes
Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/cutpaste_visualization.py | Adds a CLI utility to visualize CutPaste outputs and masks. |
| tests/unit/data/utils/test_synthetic.py | Adds tests for new synthetic dataset knobs and mask threshold behavior. |
| tests/unit/data/utils/test_perlin.py | Adds tests for shared Perlin utilities and GLASS compatibility helpers. |
| tests/unit/data/utils/test_cutpaste.py | Adds tests for the new CutPaste generator behaviors and modes. |
| tests/unit/data/datamodule/image/test_folder.py | Verifies datamodule forwards synthetic config into synthetic split creation. |
| src/anomalib/models/image/supersimplenet/anomaly_generator.py | Switches to shared Perlin threshold-rescale helper. |
| src/anomalib/models/image/dsr/anomaly_generator.py | Switches to shared Perlin threshold-rescale helper. |
| src/anomalib/data/utils/synthetic.py | Adds generator backend selection + config validation and change-mask computation. |
| src/anomalib/data/utils/generators/perlin.py | Adds shared Perlin rescale helper, GLASS defaults, and new configuration knobs. |
| src/anomalib/data/utils/generators/cutpaste.py | Implements the new CutPaste synthetic anomaly generator. |
| `src/anomalib/data/utils/generators/__init__.py` | Exposes CutPaste and new Perlin helper/constants via package exports. |
| `src/anomalib/data/utils/__init__.py` | Re-exports new generator utilities at anomalib.data.utils level. |
| src/anomalib/data/datamodules/image/visa.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/vad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/tabular.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/realiad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtecad2.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtecad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtec_loco.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mpdd.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/kolektor.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/kaputt.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/folder.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/datumaro.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/btech.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/bmad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/base/image.py | Plumbs synthetic config into synthetic val/test split creation. |
| examples/configs/data/folder.yaml | Documents new synthetic configuration keys for the Folder datamodule example. |
| docs/source/snippets/config/data/image/folder/segmentation/cli/normal_and_synthetic.yaml | Updates CLI snippet to include new synthetic keys. |
| docs/source/snippets/config/data/image/folder/classification/cli/normal_and_synthetic.yaml | Updates CLI snippet to include new synthetic keys. |

    return perlin_noise

    denominator = perlin_noise.max() - perlin_noise.min()
    if denominator == 0:
apply_perlin_threshold_rescale uses if denominator == 0: where denominator is a scalar tensor (perlin_noise.max() - perlin_noise.min()). This will raise RuntimeError: Boolean value of Tensor with more than one value is ambiguous (and will break the new unit test that passes uniform noise). Convert to a Python scalar (e.g., denominator.item()) or use a tensor-safe check like torch.isclose(denominator, torch.tensor(0., device=...)) before branching.
Suggested change: `if denominator == 0:` becomes `if denominator.item() == 0:`
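A minimal sketch of a threshold-rescale helper with the zero-range guard, assuming it min-max rescales the noise to [0, 1] and then binarizes it; `rescale_and_threshold` is a hypothetical name, since the body of `apply_perlin_threshold_rescale` is not shown in the thread, and returning an all-zero mask for uniform noise is an assumption. For reference, PyTorch evaluates a single-element tensor such as `perlin_noise.max() - perlin_noise.min()` in an `if` without error (the "ambiguous" error applies only to tensors with more than one element), so `.item()` mainly makes the scalar comparison explicit.

```python
import torch


def rescale_and_threshold(perlin_noise: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Min-max rescale noise to [0, 1], then binarize it at ``threshold``.

    Hypothetical stand-in for the PR's helper.
    """
    denominator = perlin_noise.max() - perlin_noise.min()
    # .item() turns the 0-dim tensor into a Python float, so the branch is an
    # ordinary scalar comparison.
    if denominator.item() == 0:
        # Uniform noise has no structure to threshold (assumed: empty mask).
        return torch.zeros_like(perlin_noise)
    rescaled = (perlin_noise - perlin_noise.min()) / denominator
    return (rescaled > threshold).float()
```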

    generator = CutPasteGenerator(
        mode="normal",
        enable_hflip=False,
        enable_vflip=False,
        enable_color_jitter=False,
        rotation_range=(0.0, 0.0),
        brightness_shift_range=(1.2, 1.2),
    )
    transformed_patch, _ = generator._transform_patch(patch)
    assert not torch.allclose(transformed_patch, patch)
CutPasteGenerator._transform_patch now requires a selected_mode argument, but this test calls it with only the patch (generator._transform_patch(patch)). This will fail with a TypeError. Pass an explicit mode (e.g., 'normal') or update the helper signature to provide a default.
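The reviewer offers two fixes: pass an explicit mode in the test, or give the helper a default. A minimal sketch of the second option, using a hypothetical free function `transform_patch` (not the PR's method) that only applies a brightness shift and returns a `(patch, mode)` pair so the test's `transformed_patch, _ = ...` unpacking still works; what the real second return value holds is not shown in the thread.

```python
from __future__ import annotations

import torch


def transform_patch(
    patch: torch.Tensor,
    selected_mode: str = "normal",
    brightness_shift_range: tuple[float, float] = (1.0, 1.0),
) -> tuple[torch.Tensor, str]:
    """Scale patch brightness by a factor drawn uniformly from ``brightness_shift_range``.

    Hypothetical stand-in for ``CutPasteGenerator._transform_patch``. The default
    for ``selected_mode`` keeps one-argument calls such as
    ``transform_patch(patch)`` valid.
    """
    # Assumed: "union" is resolved to a concrete mode before this call.
    if selected_mode not in {"normal", "scar"}:
        raise ValueError(f"Unsupported mode: {selected_mode!r}")
    low, high = brightness_shift_range
    factor = low + (high - low) * torch.rand(1).item()
    # Keep pixel values in the [0, 1] range expected for float images.
    return (patch * factor).clamp(0.0, 1.0), selected_mode
```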
Signed-off-by: geeky33 <aaryap1204@gmail.com>
@geeky33 thanks for the effort. However, before integrating the changes directly, it may be better to start with a design document. I am concerned about changes to so many files. Can we come up with a design that makes the least amount of changes to the existing files? Additionally, a point to consider is how new changes would be integrated when we introduce more anomaly generation methods. Let's start with a document detailing stubs, examples, APIs, etc.
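One common pattern for the extensibility question is a name-to-factory registry: each backend registers itself once, and datamodules forward a single `(name, kwargs)` pair instead of one argument per knob. A minimal sketch under that assumption; `AnomalyGenerator`, `register_generator`, `build_generator`, and the toy `IdentityGenerator` are all hypothetical and not part of anomalib's current API.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol

import torch


class AnomalyGenerator(Protocol):
    """Hypothetical interface every synthetic anomaly backend would implement."""

    def __call__(self, image: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """Return (augmented_image, mask) for a (C, H, W) image."""
        ...


_GENERATORS: dict[str, Callable[..., AnomalyGenerator]] = {}


def register_generator(name: str):
    """Class/function decorator that stores a generator factory under ``name``."""

    def decorator(factory):
        if name in _GENERATORS:
            raise ValueError(f"Generator {name!r} is already registered")
        _GENERATORS[name] = factory
        return factory

    return decorator


def build_generator(name: str, **kwargs) -> AnomalyGenerator:
    """Instantiate a registered backend; callers never import backends directly."""
    try:
        factory = _GENERATORS[name]
    except KeyError:
        msg = f"Unknown generator {name!r}; available: {sorted(_GENERATORS)}"
        raise ValueError(msg) from None
    return factory(**kwargs)


@register_generator("identity")
class IdentityGenerator:
    """Toy backend that only demonstrates registration; it adds no anomaly."""

    def __call__(self, image: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        return image, torch.zeros(image.shape[-2:])
```

With this shape, adding a new method touches only its own module (plus one registration line), while the datamodules stay unchanged.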
Pull request overview
Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/anomalib/data/utils/synthetic.py:119
- The `make_synthetic_dataset` docstring still states that masks are generated by applying Perlin-noise perturbations, but the function now supports multiple backends via `generator_type` (Perlin and CutPaste). This description is misleading; update the narrative paragraph to describe the generic synthetic augmentation flow (and mention Perlin/CutPaste as options).
"""Convert normal samples into a mixed set with synthetic anomalies.
The function generates synthetic anomalous images and their corresponding
masks by applying Perlin noise-based perturbations to normal images.

     if scale is None:
    -    min_scale, max_scale = 0, 6
    +    min_scale, max_scale = scale_exponent_range
     scalex = 2 ** torch.randint(min_scale, max_scale, (1,), device=device).item()
     scaley = 2 ** torch.randint(min_scale, max_scale, (1,), device=device).item()
generate_perlin_noise now accepts scale_exponent_range, but there is no validation that max_scale > min_scale. If a user passes an invalid range (e.g., equal bounds), torch.randint(min_scale, max_scale, ...) will raise a low-level error. Add an explicit check with a clear ValueError, and consider clarifying in the docstring that the upper bound is exclusive (since torch.randint samples from [min, max)).
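A minimal sketch of the requested validation, mirroring the two `torch.randint` lines above; `sample_perlin_scales` is a hypothetical helper, and rejecting a negative lower bound (which would produce fractional scales) is an extra assumption beyond the review comment.

```python
from __future__ import annotations

import torch


def sample_perlin_scales(
    scale_exponent_range: tuple[int, int] = (0, 6), device: str = "cpu"
) -> tuple[int, int]:
    """Sample Perlin grid scales (scale_x, scale_y) as powers of two.

    Exponents come from torch.randint, which draws from [min, max): the upper
    bound is exclusive, so (0, 6) yields scales in {1, 2, 4, 8, 16, 32}.
    """
    min_scale, max_scale = scale_exponent_range
    if min_scale < 0:
        raise ValueError(f"scale_exponent_range lower bound must be >= 0, got {min_scale}")
    if max_scale <= min_scale:
        msg = (
            "scale_exponent_range must satisfy max > min (upper bound is exclusive), "
            f"got ({min_scale}, {max_scale})"
        )
        raise ValueError(msg)
    scalex = 2 ** int(torch.randint(min_scale, max_scale, (1,), device=device).item())
    scaley = 2 ** int(torch.randint(min_scale, max_scale, (1,), device=device).item())
    return scalex, scaley
```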
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: ayraa.ai <141430616+geeky33@users.noreply.github.com>
Pull request overview
Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.

    device = image.device
    _, height, width = image.shape
    if torch.rand(1, device=device).item() > self.probability:
CutPasteGenerator.generate applies augmentation when rand <= probability by returning early only on rand > probability. This makes probability=0.0 very rarely still apply (when rand is exactly 0), which can make the probability=0 unit test flaky. Use an apply-condition like rand < probability (or early-return on rand >= probability) to make boundary cases deterministic for 0.0 and 1.0.
Suggested change: `if torch.rand(1, device=device).item() > self.probability:` becomes `if torch.rand(1, device=device).item() >= self.probability:`
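Because `torch.rand` samples from [0, 1), skipping on `rand >= probability` (equivalently, applying on `rand < probability`) never applies for 0.0 and always applies for 1.0. A minimal sketch of that gate as a hypothetical free function `should_apply`, not the PR's `generate` method; the optional `torch.Generator` argument is added only to make tests reproducible.

```python
from __future__ import annotations

import torch


def should_apply(probability: float, generator: torch.Generator | None = None) -> bool:
    """Return True with the given probability.

    torch.rand draws from [0, 1), so ``rand < probability`` is always False for
    probability == 0.0 and always True for probability == 1.0.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return torch.rand(1, generator=generator).item() < probability
```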

    raise ValueError(msg)
    if generator_type == "cutpaste" and not isinstance(blend_factor, float):
        msg = "For generator_type='cutpaste', blend_factor must be a float."
        raise ValueError(msg)
_validate_synthetic_config requires blend_factor to be an instance of float for generator_type='cutpaste', which rejects valid numeric values like 0/1 (ints) that CutPasteGenerator can handle. Consider accepting any real scalar (e.g., numbers.Real) and adjusting the error message to clarify that CutPaste expects a single scalar (not a range tuple).
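A minimal sketch of the relaxed check the reviewer proposes, using `numbers.Real` so ints such as 0 and 1 pass while a range tuple is rejected. `validate_cutpaste_blend_factor` is a hypothetical name; rejecting `bool` (a subclass of `int`) and requiring the value to lie in [0, 1] are additional assumptions, not stated in the review.

```python
import numbers


def validate_cutpaste_blend_factor(blend_factor: object) -> float:
    """Accept a single real scalar in [0, 1] and return it as a float.

    Hypothetical validator: ints and floats pass, tuples (ranges) and bools fail.
    """
    if isinstance(blend_factor, bool) or not isinstance(blend_factor, numbers.Real):
        msg = (
            "For generator_type='cutpaste', blend_factor must be a single real "
            f"number, not a range tuple; got {blend_factor!r}."
        )
        raise ValueError(msg)
    value = float(blend_factor)
    # Assumed valid range for an alpha-style blend weight.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"blend_factor must be in [0, 1], got {value}")
    return value
```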
Description
Adds a CutPaste-based synthetic anomaly generator to the anomalib pipeline, extending the existing Perlin-based approach with patch-based anomaly synthesis.
Supports multiple variants (normal, scar, union) for improved diversity and realism of synthetic defects. The implementation is lightweight, CPU-friendly, and fully integrated into the existing dataset/datamodule pipeline with backward compatibility preserved.
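To make the three variants concrete, here is a minimal, self-contained sketch on a (C, H, W) tensor. `cutpaste` and its helper are hypothetical, not the PR's `CutPasteGenerator`; the sampling ranges (patch area 2 to 15% of the image, aspect ratio 0.3 to 3.3, scar length 10 to 25 px and width 2 to 16 px) are illustrative assumptions, rotation, flips, and color jitter are omitted, and `union` is assumed to pick normal or scar at random per call.

```python
from __future__ import annotations

import random

import torch


def _random_corner(h: int, w: int, ph: int, pw: int, rng: random.Random) -> tuple[int, int]:
    """Return a random top-left corner for a (ph, pw) box inside an (h, w) image."""
    return rng.randint(0, h - ph), rng.randint(0, w - pw)


def cutpaste(
    image: torch.Tensor, mode: str = "normal", rng: random.Random | None = None
) -> tuple[torch.Tensor, torch.Tensor]:
    """Cut a patch from ``image`` and paste it elsewhere; return (augmented, mask).

    Hypothetical sketch: the (H, W) mask is 1.0 where the patch was pasted.
    """
    if rng is None:
        rng = random.Random()
    if mode == "union":
        # Assumed semantics: choose one concrete variant per call.
        mode = rng.choice(["normal", "scar"])
    _, h, w = image.shape
    if mode == "normal":
        # Large rectangle: sample its area and aspect ratio (height / width).
        area = rng.uniform(0.02, 0.15) * h * w
        aspect = rng.uniform(0.3, 3.3)
        ph, pw = round((area * aspect) ** 0.5), round((area / aspect) ** 0.5)
    elif mode == "scar":
        # Long, thin rectangle (left unrotated in this sketch).
        ph, pw = rng.randint(10, 25), rng.randint(2, 16)
    else:
        raise ValueError(f"Unknown mode {mode!r}; expected 'normal', 'scar' or 'union'")
    # Keep the patch inside the image, even for very small inputs.
    ph, pw = max(1, min(ph, h)), max(1, min(pw, w))

    src_top, src_left = _random_corner(h, w, ph, pw, rng)
    patch = image[:, src_top:src_top + ph, src_left:src_left + pw].clone()
    dst_top, dst_left = _random_corner(h, w, ph, pw, rng)

    augmented = image.clone()
    augmented[:, dst_top:dst_top + ph, dst_left:dst_left + pw] = patch
    mask = torch.zeros(h, w, dtype=image.dtype)
    mask[dst_top:dst_top + ph, dst_left:dst_left + pw] = 1.0
    return augmented, mask
```

Only pure tensor slicing and Python's `random` module are used, which keeps the sketch CPU-friendly and easy to seed.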
🛠️ Fixes #3462
Screenshots supporting the PR:

✨ Changes
✅ Checklist