feat(data): add CutPaste-based synthetic anomaly generator with scar and union variants #3463

Open
geeky33 wants to merge 7 commits into open-edge-platform:main from geeky33:cutpaste

Conversation


@geeky33 geeky33 commented Mar 24, 2026

Description

Adds a CutPaste-based synthetic anomaly generator to the anomalib pipeline, extending the existing Perlin-based approach with feature-based anomaly synthesis.

Supports multiple variants (normal, scar, union) for improved diversity and realism of synthetic defects. The implementation is lightweight, CPU-friendly, and fully integrated into the existing dataset/datamodule pipeline with backward compatibility preserved.
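To make the basic idea concrete, here is a minimal sketch of the plain cut-and-paste step: copy a rectangular patch from one location of a normal image to another and record where it landed. The function name `cutpaste_normal`, its parameters, and the fixed patch size are assumptions made for illustration; this is not the PR's `CutPasteGenerator`, and it omits the scar and union variants, blending, and patch transforms.

```python
from typing import Optional

import torch


def cutpaste_normal(
    image: torch.Tensor,
    patch_hw: tuple[int, int] = (16, 16),
    generator: Optional[torch.Generator] = None,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Cut a rectangular patch from ``image`` and paste it elsewhere.

    Hypothetical helper for illustration only. ``image`` has shape
    (C, H, W). Returns the augmented image and a (1, H, W) mask that is
    1 over the pasted rectangle and 0 elsewhere.
    """
    _, height, width = image.shape
    patch_h, patch_w = patch_hw
    if patch_h > height or patch_w > width:
        msg = "Patch must fit inside the image."
        raise ValueError(msg)

    def rand_int(upper: int) -> int:
        # torch.randint samples from [0, upper), so upper must be >= 1.
        return int(torch.randint(0, upper, (1,), generator=generator).item())

    # Top-left corners of the source (cut) and destination (paste) boxes.
    src_y, src_x = rand_int(height - patch_h + 1), rand_int(width - patch_w + 1)
    dst_y, dst_x = rand_int(height - patch_h + 1), rand_int(width - patch_w + 1)

    # Read the patch from the untouched input, write into a copy.
    augmented = image.clone()
    patch = image[:, src_y : src_y + patch_h, src_x : src_x + patch_w]
    augmented[:, dst_y : dst_y + patch_h, dst_x : dst_x + patch_w] = patch

    mask = torch.zeros(1, height, width, dtype=image.dtype, device=image.device)
    mask[:, dst_y : dst_y + patch_h, dst_x : dst_x + patch_w] = 1.0
    return augmented, mask
```

Because this only slices and copies tensors, it runs comfortably on CPU. Pixels outside the pasted rectangle are left unchanged.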

🛠️ Fixes #3462

screenshots to support the PR:
image

✨ Changes

  • 🚀 New feature (non-breaking change which adds functionality)
  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔄 Refactor (non-breaking change which refactors the code base)
  • ⚡ Performance improvements
  • 🎨 Style changes (code style/formatting)
  • 🧪 Tests (adding/modifying tests)
  • 📚 Documentation update
  • 📦 Build system changes
  • 🚧 CI/CD configuration
  • 🔧 Chore (general maintenance)
  • 🔒 Security update
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)

✅ Checklist

  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).
  • 🏷️ My PR title follows conventional commit format.

Copilot AI review requested due to automatic review settings March 24, 2026 15:09
…and union variants

Signed-off-by: geeky33 <aaryap1204@gmail.com>
Made-with: Cursor
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a CutPaste-based synthetic anomaly generator to anomalib’s synthetic data pipeline (alongside the existing Perlin-based approach), and wires the new synthetic configuration options through datamodules, configs, and tests.

Changes:

  • Introduces a new CutPasteGenerator (normal/scar/union modes) and a visualization utility script.
  • Extends SyntheticAnomalyDataset/make_synthetic_dataset to support multiple generator backends via generator_type, plus configurable blend_factor, probability, and mask_threshold.
  • Refactors Perlin threshold-rescale logic into a shared helper and adds/updates unit tests and documentation/config snippets.
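The summary mentions a `mask_threshold` option and a change-mask computation without spelling out the rule. One plausible reading, sketched below purely as an assumption, is that a pixel is marked anomalous when the augmented image differs from the original by more than the threshold in any channel. The function name `change_mask` is hypothetical and the PR's actual logic may differ.

```python
import torch


def change_mask(
    original: torch.Tensor,
    augmented: torch.Tensor,
    mask_threshold: float = 0.0,
) -> torch.Tensor:
    """Mark pixels whose value changed by more than ``mask_threshold``.

    Hypothetical helper: both inputs have shape (C, H, W); the result has
    shape (1, H, W) with values in {0, 1}.
    """
    # Largest absolute change across channels for every pixel.
    diff = (augmented - original).abs().amax(dim=0, keepdim=True)
    return (diff > mask_threshold).to(original.dtype)
```

Under this reading, a larger threshold suppresses faint changes so that only clearly altered pixels count as anomalous.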

Reviewed changes

Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tools/cutpaste_visualization.py | Adds a CLI utility to visualize CutPaste outputs and masks. |
| tests/unit/data/utils/test_synthetic.py | Adds tests for new synthetic dataset knobs and mask threshold behavior. |
| tests/unit/data/utils/test_perlin.py | Adds tests for shared Perlin utilities and GLASS compatibility helpers. |
| tests/unit/data/utils/test_cutpaste.py | Adds tests for the new CutPaste generator behaviors and modes. |
| tests/unit/data/datamodule/image/test_folder.py | Verifies datamodule forwards synthetic config into synthetic split creation. |
| src/anomalib/models/image/supersimplenet/anomaly_generator.py | Switches to shared Perlin threshold-rescale helper. |
| src/anomalib/models/image/dsr/anomaly_generator.py | Switches to shared Perlin threshold-rescale helper. |
| src/anomalib/data/utils/synthetic.py | Adds generator backend selection + config validation and change-mask computation. |
| src/anomalib/data/utils/generators/perlin.py | Adds shared Perlin rescale helper, GLASS defaults, and new configuration knobs. |
| src/anomalib/data/utils/generators/cutpaste.py | Implements the new CutPaste synthetic anomaly generator. |
| src/anomalib/data/utils/generators/__init__.py | Exposes CutPaste and new Perlin helper/constants via package exports. |
| src/anomalib/data/utils/__init__.py | Re-exports new generator utilities at anomalib.data.utils level. |
| src/anomalib/data/datamodules/image/visa.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/vad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/tabular.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/realiad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtecad2.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtecad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mvtec_loco.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/mpdd.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/kolektor.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/kaputt.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/folder.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/datumaro.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/btech.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/image/bmad.py | Forwards synthetic configuration args to the base datamodule. |
| src/anomalib/data/datamodules/base/image.py | Plumbs synthetic config into synthetic val/test split creation. |
| examples/configs/data/folder.yaml | Documents new synthetic configuration keys for the Folder datamodule example. |
| docs/source/snippets/config/data/image/folder/segmentation/cli/normal_and_synthetic.yaml | Updates CLI snippet to include new synthetic keys. |
| docs/source/snippets/config/data/image/folder/classification/cli/normal_and_synthetic.yaml | Updates CLI snippet to include new synthetic keys. |

return perlin_noise

denominator = perlin_noise.max() - perlin_noise.min()
if denominator == 0:

Copilot AI Mar 24, 2026

apply_perlin_threshold_rescale uses if denominator == 0: where denominator is a scalar tensor (perlin_noise.max() - perlin_noise.min()). This will raise RuntimeError: Boolean value of Tensor with more than one value is ambiguous (and will break the new unit test that passes uniform noise). Convert to a Python scalar (e.g., denominator.item()) or use a tensor-safe check like torch.isclose(denominator, torch.tensor(0., device=...)) before branching.

Suggested change
- if denominator == 0:
+ if denominator.item() == 0:
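For reference, a minimal sketch of a min-max rescale with an explicit zero-range guard is shown below. The function name `rescale_to_unit_range` is hypothetical, and returning zeros for constant input is an assumption; the PR's `apply_perlin_threshold_rescale` may handle that case differently.

```python
import torch


def rescale_to_unit_range(noise: torch.Tensor) -> torch.Tensor:
    """Min-max rescale ``noise`` to [0, 1], guarding against a zero range.

    Hypothetical stand-in for illustration, not the PR's helper.
    """
    denominator = noise.max() - noise.min()
    # .item() converts the 0-dim tensor to a Python float, so the branch
    # below is an ordinary float comparison.
    if denominator.item() == 0:
        # Assumption: constant input maps to all zeros.
        return torch.zeros_like(noise)
    return (noise - noise.min()) / denominator
```

In eager PyTorch, `bool()` on a single-element tensor is also defined, and the "more than one value is ambiguous" error is raised only for tensors with more than one element. The explicit `.item()` mainly makes the scalar conversion visible at the branch.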

Copilot uses AI. Check for mistakes.
Comment on lines +86 to +95
generator = CutPasteGenerator(
    mode="normal",
    enable_hflip=False,
    enable_vflip=False,
    enable_color_jitter=False,
    rotation_range=(0.0, 0.0),
    brightness_shift_range=(1.2, 1.2),
)
transformed_patch, _ = generator._transform_patch(patch)
assert not torch.allclose(transformed_patch, patch)

Copilot AI Mar 24, 2026

CutPasteGenerator._transform_patch now requires a selected_mode argument, but this test calls it with only the patch (generator._transform_patch(patch)). This will fail with a TypeError. Pass an explicit mode (e.g., 'normal') or update the helper signature to provide a default.

Signed-off-by: geeky33 <aaryap1204@gmail.com>
@ashwinvaidya17
Contributor

@geeky33 thanks for the effort. However, before integrating the changes directly, it may be better to start with a design document. I am concerned about changes to so many files. Can we come up with a design that makes the least amount of changes to the existing files? Additionally, a point of consideration is how new changes would be integrated when we introduce more anomaly generation methods. Let's start with a document detailing stubs, examples, APIs, etc.
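As one possible starting point for such a document, the stub below sketches a registry-based design in which each generator registers itself under a name and callers look it up by `generator_type`. Every name here (`AnomalyGenerator`, `register_generator`, `build_generator`, `IdentityGenerator`) is hypothetical and does not correspond to an existing anomalib API; it only illustrates how new methods could be added without touching each datamodule.

```python
from typing import Any, Callable, Protocol


class AnomalyGenerator(Protocol):
    """Hypothetical interface that every synthetic generator would satisfy."""

    def generate(self, image: Any) -> tuple[Any, Any]:
        """Return (augmented_image, mask) for one normal image."""
        ...


# Maps a generator_type string to a factory (usually the class itself).
_GENERATORS: dict[str, Callable[..., Any]] = {}


def register_generator(name: str) -> Callable[[type], type]:
    """Class decorator that records a generator under ``name``."""

    def decorator(cls: type) -> type:
        if name in _GENERATORS:
            msg = f"Generator '{name}' is already registered."
            raise ValueError(msg)
        _GENERATORS[name] = cls
        return cls

    return decorator


def build_generator(generator_type: str, **kwargs: Any) -> AnomalyGenerator:
    """Instantiate a registered generator, forwarding its own kwargs."""
    if generator_type not in _GENERATORS:
        known = ", ".join(sorted(_GENERATORS))
        msg = f"Unknown generator_type '{generator_type}'. Known: {known}."
        raise ValueError(msg)
    return _GENERATORS[generator_type](**kwargs)


@register_generator("identity")
class IdentityGenerator:
    """Placeholder generator used only to exercise the registry."""

    def __init__(self, probability: float = 1.0) -> None:
        self.probability = probability

    def generate(self, image: Any) -> tuple[Any, Any]:
        # No augmentation: return the input unchanged with no mask.
        return image, None
```

With a layout like this, a datamodule would only need to forward a `generator_type` string and a single dictionary of generator options, so adding a new method would mean adding one file rather than editing every datamodule.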

Copilot AI review requested due to automatic review settings March 26, 2026 12:30
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/anomalib/data/utils/synthetic.py:119

  • The make_synthetic_dataset docstring still states that masks are generated by applying Perlin-noise perturbations, but the function now supports multiple backends via generator_type (Perlin and CutPaste). This description is misleading; update the narrative paragraph to describe the generic synthetic augmentation flow (and mention Perlin/CutPaste as options).
    """Convert normal samples into a mixed set with synthetic anomalies.

    The function generates synthetic anomalous images and their corresponding
    masks by applying Perlin noise-based perturbations to normal images.


Comment thread src/anomalib/data/utils/synthetic.py Outdated
Comment on lines 120 to 123
  if scale is None:
-     min_scale, max_scale = 0, 6
+     min_scale, max_scale = scale_exponent_range
      scalex = 2 ** torch.randint(min_scale, max_scale, (1,), device=device).item()
      scaley = 2 ** torch.randint(min_scale, max_scale, (1,), device=device).item()

Copilot AI Mar 26, 2026

generate_perlin_noise now accepts scale_exponent_range, but there is no validation that max_scale > min_scale. If a user passes an invalid range (e.g., equal bounds), torch.randint(min_scale, max_scale, ...) will raise a low-level error. Add an explicit check with a clear ValueError, and consider clarifying in the docstring that the upper bound is exclusive (since torch.randint samples from [min, max)).
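A sketch of the suggested check follows. It uses the standard library's `random.Random.randrange` as a stand-in for `torch.randint`, since both sample from a half-open interval with an exclusive upper bound. The function name `sample_scale_exponents` is hypothetical and is not part of the PR.

```python
import random
from typing import Optional


def sample_scale_exponents(
    scale_exponent_range: tuple[int, int],
    rng: Optional[random.Random] = None,
) -> tuple[int, int]:
    """Validate the exponent range, then sample power-of-two scales.

    Hypothetical helper. The upper bound is exclusive, so the range
    (0, 6) yields exponents 0..5 and scales 1..32.
    """
    min_scale, max_scale = scale_exponent_range
    if max_scale <= min_scale:
        msg = (
            "scale_exponent_range must satisfy min < max "
            f"(the upper bound is exclusive), got {scale_exponent_range}."
        )
        raise ValueError(msg)

    rng = rng or random.Random()
    # randrange(a, b) samples from [a, b), mirroring torch.randint(a, b, ...).
    scalex = 2 ** rng.randrange(min_scale, max_scale)
    scaley = 2 ** rng.randrange(min_scale, max_scale)
    return scalex, scaley
```

Failing early with a `ValueError` that names the parameter gives users a clearer message than the error raised from inside the sampling call.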

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: ayraa.ai <141430616+geeky33@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 12:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 34 changed files in this pull request and generated 2 comments.


device = image.device
_, height, width = image.shape
if torch.rand(1, device=device).item() > self.probability:

Copilot AI Mar 26, 2026

CutPasteGenerator.generate applies augmentation when rand <= probability by returning early only on rand > probability. This makes probability=0.0 very rarely still apply (when rand is exactly 0), which can make the probability=0 unit test flaky. Use an apply-condition like rand < probability (or early-return on rand >= probability) to make boundary cases deterministic for 0.0 and 1.0.

Suggested change
- if torch.rand(1, device=device).item() > self.probability:
+ if torch.rand(1, device=device).item() >= self.probability:
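The boundary behaviour can be checked in isolation. The two hypothetical helpers below take the sampled value as an argument so the comparison is deterministic; `rand_value` stands for the result of `torch.rand(1).item()`, which lies in [0, 1).

```python
def applies_with_gt(rand_value: float, probability: float) -> bool:
    """Original check: skip augmentation only when rand_value > probability."""
    return not rand_value > probability


def applies_with_ge(rand_value: float, probability: float) -> bool:
    """Suggested check: skip augmentation when rand_value >= probability."""
    return not rand_value >= probability


# A sample of exactly 0.0 still triggers augmentation under the original
# check when probability is 0.0, but not under the suggested one.
print(applies_with_gt(0.0, 0.0))  # True
print(applies_with_ge(0.0, 0.0))  # False
```

Since the sampled value is always strictly below 1.0, the `>=` form also applies the augmentation on every call when `probability=1.0`.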

Comment on lines +99 to +102
    raise ValueError(msg)
if generator_type == "cutpaste" and not isinstance(blend_factor, float):
    msg = "For generator_type='cutpaste', blend_factor must be a float."
    raise ValueError(msg)

Copilot AI Mar 26, 2026

_validate_synthetic_config requires blend_factor to be an instance of float for generator_type='cutpaste', which rejects valid numeric values like 0/1 (ints) that CutPasteGenerator can handle. Consider accepting any real scalar (e.g., numbers.Real) and adjusting the error message to clarify that CutPaste expects a single scalar (not a range tuple).
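A minimal sketch of the relaxed check is shown below, using `numbers.Real` so that ints such as 0 and 1 are accepted while range tuples are rejected. The function name `validate_cutpaste_blend_factor` is hypothetical, and excluding `bool` is an extra assumption that the review comment does not ask for.

```python
import numbers


def validate_cutpaste_blend_factor(blend_factor: object) -> float:
    """Accept any real scalar for CutPaste and return it as a float.

    Hypothetical helper, not the PR's _validate_synthetic_config.
    """
    # bool is a subclass of int, so it is excluded explicitly (assumption).
    is_real = isinstance(blend_factor, numbers.Real)
    if isinstance(blend_factor, bool) or not is_real:
        msg = (
            "For generator_type='cutpaste', blend_factor must be a single "
            "real number, not a range tuple."
        )
        raise ValueError(msg)
    return float(blend_factor)
```

The sketch keeps `ValueError` to match the snippet under review; a `TypeError` would arguably describe a wrong-type argument more precisely.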

Development

Successfully merging this pull request may close these issues.

✨ Add CutPaste-based synthetic anomaly generation to anomalib pipeline

3 participants