Add mask-aware dataset grids for instance segmentation label validation #1013

@pirnerjonas

Search before asking

  • I have searched the RF-DETR issues and found no similar feature requests.

Description

RF-DETR currently has save_dataset_grids for visualizing augmented training and validation samples with bounding boxes. It would be useful to extend this workflow so instance segmentation datasets also render masks in the dataset grids.

This would give users a quick, Ultralytics/YOLO-style label sanity check for segmentation tasks: inspect a small grid of samples and immediately see whether images, boxes, class labels, and instance masks stay aligned after loading, resizing, cropping, padding, and augmentation.

Use case

When training segmentation models, bad or misaligned masks are difficult to catch from metrics alone. A mask-aware dataset grid would help users and maintainers validate:

  • COCO polygon/RLE mask decoding
  • class label and mask ordering
  • mask alignment after Albumentations transforms
  • mask alignment after batch padding
  • obvious dataset export issues before a long training run
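Some of these checks can also be approximated numerically alongside the visual grid. The sketch below is only an illustration, not part of RF-DETR: it assumes boxes are already in absolute pixel xyxy format and masks are binary arrays of shape (N, H, W), and the helper names (mask_to_xyxy, box_iou, misaligned_instances) are hypothetical. It flags instances whose mask's tight bounding box barely overlaps the labeled box, which is a typical symptom of swapped mask ordering or a transform applied to boxes but not masks.

```python
import numpy as np


def mask_to_xyxy(mask):
    """Tight pixel box (x1, y1, x2, y2) around a binary mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1], dtype=np.float32)


def box_iou(a, b):
    """Intersection over union of two xyxy boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return float(inter / union) if union > 0 else 0.0


def misaligned_instances(boxes_xyxy, masks, min_iou=0.5):
    """Indices whose mask-derived box overlaps the labeled box by less than min_iou."""
    bad = []
    for i, (box, mask) in enumerate(zip(boxes_xyxy, masks)):
        tight = mask_to_xyxy(mask)
        if tight is None or box_iou(box, tight) < min_iou:
            bad.append(i)
    return bad
```

The threshold is an arbitrary starting point; crops that clip a mask or loosely drawn boxes may need a lower value.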

The easy initial path is to add optional mask rendering to the existing DatasetGridSaver: if target["masks"] is present, pass it into supervision.Detections and draw it with sv.MaskAnnotator before boxes and labels. I can open a small PR for this scoped version.
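A minimal sketch of that rendering step follows. It is not the actual DatasetGridSaver code: render_sample and prepare_masks are hypothetical helpers, boxes are assumed to be converted to absolute pixel xyxy already, the image is assumed to be a uint8 (H, W, 3) array, and a recent supervision release is assumed, where sv.BoxAnnotator draws boxes only.

```python
import numpy as np


def prepare_masks(masks, num_boxes, image_hw):
    """Return masks as a boolean (N, H, W) array, or None when absent."""
    if masks is None:
        return None
    masks = np.asarray(masks)
    if masks.ndim == 4 and masks.shape[1] == 1:  # tolerate (N, 1, H, W)
        masks = masks[:, 0]
    expected = (num_boxes, *image_hw)
    if masks.shape != expected:
        raise ValueError(f"expected masks of shape {expected}, got {masks.shape}")
    return masks.astype(bool)


def render_sample(image, boxes_xyxy, class_ids, masks=None):
    """Draw masks (if any) first, then boxes and class-id labels on top."""
    import supervision as sv  # imported lazily; only needed for drawing

    boxes_xyxy = np.asarray(boxes_xyxy, dtype=np.float32).reshape(-1, 4)
    class_ids = np.asarray(class_ids, dtype=int)
    detections = sv.Detections(
        xyxy=boxes_xyxy,
        class_id=class_ids,
        mask=prepare_masks(masks, len(boxes_xyxy), image.shape[:2]),
    )
    scene = image.copy()
    if detections.mask is not None:
        scene = sv.MaskAnnotator().annotate(scene=scene, detections=detections)
    scene = sv.BoxAnnotator().annotate(scene=scene, detections=detections)
    labels = [str(c) for c in class_ids]  # class names could be substituted here
    return sv.LabelAnnotator().annotate(scene=scene, detections=detections, labels=labels)
```

Because masks defaults to None, detection-only datasets would go through the same path and render exactly as before. The explicit shape check turns a mask/box count mismatch into an early error instead of a confusing picture.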

Additional

One concern is whether the existing DatasetGridSaver is still the best integration point after the PyTorch Lightning refactor. Today, save_dataset_grids is called before trainer.fit() and manually iterates the dataloaders. That works for the CPU dataset pipeline, but it can bypass Lightning lifecycle behavior such as RFDETRDataModule.on_after_batch_transfer(), where GPU/Kornia augmentation and normalization are applied.

A more complete integration may be a Lightning callback, wired through build_trainer(), that saves a limited number of train/val batches from the actual PTL batch lifecycle. That would make the grids represent what the model really sees, including GPU augmentations and multi-scale batch mutations.
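A rough sketch of such a callback is below. It relies on the understanding that Lightning passes the batch to on_train_batch_start and on_validation_batch_start after device transfer and on_after_batch_transfer() have run; that ordering should be confirmed against the Lightning version in use. BatchGridCallback and render_fn are hypothetical names, and how the callback would be registered in build_trainer() is left open.

```python
try:
    from pytorch_lightning.callbacks import Callback
except ImportError:
    try:
        from lightning.pytorch.callbacks import Callback
    except ImportError:
        Callback = object  # lets the sketch be exercised without Lightning installed


class BatchGridCallback(Callback):
    """Hand the first few train/val batches to a renderer, then stay silent."""

    def __init__(self, render_fn, max_batches=2):
        super().__init__()
        self.render_fn = render_fn  # hypothetical signature: (batch, split, index) -> None
        self.max_batches = max_batches
        self._seen = {"train": 0, "val": 0}

    def _maybe_render(self, trainer, batch, split):
        # Skip the sanity-check pass and non-zero ranks to avoid duplicate grids.
        if getattr(trainer, "sanity_checking", False):
            return
        if not getattr(trainer, "is_global_zero", True):
            return
        if self._seen[split] < self.max_batches:
            self.render_fn(batch, split, self._seen[split])
            self._seen[split] += 1

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        self._maybe_render(trainer, batch, "train")

    def on_validation_batch_start(self, trainer, pl_module, batch, batch_idx, dataloader_idx=0):
        self._maybe_render(trainer, batch, "val")
```

Keeping the drawing logic behind render_fn means the same batch renderer could serve both the existing dataloader utility and the callback. Since the batch is normalized by this point, the renderer would need to undo normalization before drawing.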

The first PR can still be the small mask-rendering change, but I would like maintainer input on the longer-term direction:

  • keep DatasetGridSaver as a dataloader utility, or
  • refactor it into a batch renderer and call it from a PTL callback.

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!

Labels: enhancement (New feature or request)
