Issue: RuntimeError with SwinTransformer frozen_stages >= 0 in Distributed Training

When using the SwinTransformer backbone in MMDetection with `frozen_stages >= 0`, I encounter the following error during distributed training:
`RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.`
If I set `frozen_stages=-1` (no frozen stages), training proceeds without any issues. However, freezing any stage (e.g., `frozen_stages=1`) consistently triggers this error.
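
To narrow down which parameters DDP is complaining about, one option is to run a single forward/backward pass without DDP and list the trainable parameters that received no gradient. The sketch below is a hypothetical helper (`find_params_without_grad` is not part of MMDetection or PyTorch). It only assumes objects that expose `requires_grad` and `grad`, as `torch.nn.Parameter` does, and it uses stand-in objects with made-up names so that it runs without PyTorch installed.

```python
from types import SimpleNamespace


def find_params_without_grad(named_params):
    """Return names of trainable parameters whose gradient is still None.

    `named_params` is any iterable of (name, param) pairs where `param`
    exposes `requires_grad` and `grad`, e.g. `model.named_parameters()`.
    """
    return [
        name
        for name, param in named_params
        if param.requires_grad and param.grad is None
    ]


# Stand-in parameters with hypothetical names, used only for illustration.
demo_params = [
    ("frozen.weight", SimpleNamespace(requires_grad=False, grad=None)),  # frozen, ignored
    ("unused.weight", SimpleNamespace(requires_grad=True, grad=None)),   # trainable, no grad
    ("used.weight", SimpleNamespace(requires_grad=True, grad=0.1)),      # trainable, has grad
]

unused = find_params_without_grad(demo_params)
print(unused)
```

With a real model, this would be called as `find_params_without_grad(model.named_parameters())` right after `loss.backward()` and before the gradients are zeroed.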

**Reproduction Steps**
Use the following configuration file:
```python
model = dict(
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        out_indices=(1, 2, 3),
        frozen_stages=1,  # Freezing the first stage
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)
    ),
    neck=dict(in_channels=[192, 384, 768], start_level=0, num_outs=5)
)
```
Run the training command:
`CUDA_VISIBLE_DEVICES=2,3 bash ./tools/dist_train.sh ./configs/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py 2`
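
The full PyTorch message for this error suggests passing `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`. In MMDetection configs this is commonly exposed as a top-level key. The fragment below is a sketch that assumes the installed version reads that key; whether it resolves this particular case is untested here.

```python
# Assumption: the training entry point reads this top-level config key and
# forwards it to the DistributedDataParallel wrapper.
find_unused_parameters = True
```

This option adds some per-iteration overhead, so it would be a workaround and not a fix for the freezing logic itself.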
