
Training pipeline transformation order? #6106


Description

@jessicametzger

I'm training various model heads (HTC, Cascade Mask R-CNN, etc.) with the CBNetV2 backbone (implementation here) on a custom COCO-format dataset that has only bboxes. I'm using the following training and testing pipelines:

albu_train_transforms = [
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(
                type='RGBShift',
                r_shift_limit=10,
                g_shift_limit=10,
                b_shift_limit=10,
                p=1.0),
            dict(
                type='HueSaturationValue',
                hue_shift_limit=20,
                sat_shift_limit=30,
                val_shift_limit=20,
                p=1.0)
        ],
        p=0.1),
    dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
# img_norm_cfg was defined elsewhere in the config; the values match the
# explicit Normalize dicts in the second pipeline below
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(
        type='Resize',
        img_scale=[(1600, 400), (1600, 1400)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_labels'],
            min_visibility=0.0,
            filter_lost_elements=True),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes',
        },
        update_pad_shape=False,
        skip_img_without_anno=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1600, 1400),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

I also use variations of these (e.g. sometimes I remove the Albumentations sequence). However, training seems to be very sensitive to the order of operations. For example, with the pipelines below, the training loss decayed nicely, but the test bbox evaluation metrics were all zero and the generated detections looked essentially random:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='LoadAnnotations',
        with_bbox=True,
        with_mask=False,
        with_seg=False),
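    # (note: float img_scale values below, vs the int tuples used above)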
    dict(
        type='Resize',
        img_scale=[(800.0, 200.0), (800.0, 700.0)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
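    # (with_seg=False above, so no semantic seg maps are loaded and this
    # SegRescale should be a no-op here)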
    dict(type='SegRescale', scale_factor=0.125),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(800.0, 700.0),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

However, once I swapped 'RandomFlip' and 'Resize' in the training pipeline (their order seems to vary between different configs?) and restored the image scale to its original (1600, 400)-(1600, 1400) range (from e.g. this config), it worked.
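
For reference, here are the ordering and scale range from the first train_pipeline at the top (the version that worked), copied verbatim:

    dict(
        type='Resize',
        img_scale=[(1600, 400), (1600, 1400)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.),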

This looks like a data pipeline issue. I couldn't find any documentation on the allowed order of data transformations, and from the comments on the operations in the transforms.py script it sounds like they can be sequenced almost arbitrarily (apart from starting with loading and ending with collecting). But that doesn't explain why training keeps hitting these seemingly arbitrary failures.
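
In case it's useful for reproducing this, here is a minimal sketch (not from my actual configs; the image root, file name, and bbox/label values below are placeholders) of how a pipeline can be sanity-checked by running a single sample through mmdet's Compose and verifying that the surviving boxes still fit inside the transformed image:

import numpy as np
from mmdet.datasets.pipelines import Compose

pipeline = Compose(train_pipeline)

# Mimic what CustomDataset.pre_pipeline and annotation loading provide.
results = dict(
    img_prefix='data/train/',            # placeholder image root
    img_info=dict(filename='0001.jpg'),  # placeholder file name
    ann_info=dict(
        bboxes=np.array([[10., 20., 200., 180.]], dtype=np.float32),
        labels=np.array([0], dtype=np.int64)),
    bbox_fields=[],
    mask_fields=[],
    seg_fields=[])

out = pipeline(results)
img = out['img'].data           # CHW tensor after DefaultFormatBundle
bboxes = out['gt_bboxes'].data  # (N, 4) tensor of [x1, y1, x2, y2]
h, w = img.shape[1:]
# Surviving boxes should still lie inside the transformed image; coordinates
# far outside it usually mean a transform moved the image but not the boxes.
assert (bboxes[:, [0, 2]] <= w).all() and (bboxes[:, [1, 3]] <= h).all()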
