open-mmlab
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/README_zh-CN.md b/README_zh-CN.md
Lines changed: 1 addition & 0 deletions
diff --git a/configs/h2rbox_v2/README.md b/configs/h2rbox_v2/README.md
Lines changed: 54 additions & 0 deletions
diff --git a/configs/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dota.py b/configs/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dota.py
Lines changed: 116 additions & 0 deletions
diff --git a/configs/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav15.py b/configs/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav15.py
Lines changed: 116 additions & 0 deletions
@@ -172,6 +172,7 @@ A summary can be found in the [Model Zoo](docs/en/model_zoo.md) page.
 - [x] [H2RBox](configs/h2rbox/README.md) (ICLR'2023)
 - [x] [PSC](configs/psc/README.md) (CVPR'2023)
 - [x] [RTMDet](configs/rotated_rtmdet/README.md) (arXiv)
+- [x] [H2RBox-v2](configs/h2rbox_v2/README.md) (arXiv)
 
 </details>
 
 
@@ -168,6 +168,7 @@ https://user-images.githubusercontent.com/10410257/154433305-416d129b-60c8-44c7-
 - [x] [H2RBox](configs/h2rbox/README.md) (ICLR'2023)
 - [x] [PSC](configs/psc/README.md) (CVPR'2023)
 - [x] [RTMDet](configs/rotated_rtmdet/README.md) (arXiv)
+- [x] [H2RBox-v2](configs/h2rbox_v2/README.md) (arXiv)
 
 </details>
 
 
@@ -0,0 +1,54 @@
+# H2RBox-v2
+
+> [H2RBox-v2: Boosting HBox-supervised Oriented Object Detection via Symmetric Learning](https://arxiv.org/pdf/2304.04403)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+<div align=center>
+<img src="https://raw.githubusercontent.com/zytx121/image-host/main/imgs/h2rbox_v2.png" width="800"/>
+</div>
+
+With the increasing demand for oriented object detection, e.g. in autonomous driving and remote sensing, oriented annotation has become labor-intensive work. To make full use of existing horizontally annotated datasets and reduce the annotation cost, a weakly-supervised detector, H2RBox, which learns the rotated box (RBox) from the horizontal box (HBox), has been proposed and has received great attention. This paper presents a new version, H2RBox-v2, to further bridge the gap between HBox-supervised and RBox-supervised oriented object detection. Our theoretical analysis shows that axisymmetry can be exploited via flipping and rotating consistencies; accordingly, H2RBox-v2 keeps a weakly-supervised branch similar to that of H2RBox and adds a novel self-supervised branch that learns orientations from the symmetry inherent in images of objects. Complemented by modules to cope with peripheral issues, e.g. angular periodicity, a stable and effective solution is achieved. To our knowledge, H2RBox-v2 is the first symmetry-supervised paradigm for oriented object detection. Compared to H2RBox, our method is less susceptible to low annotation quality and insufficient training data, and in such cases it is expected to perform much closer to fully-supervised oriented object detectors. Specifically, the performance comparison between H2RBox-v2 and Rotated FCOS on DOTA-v1.0/1.5/2.0 is 72.31%/64.76%/50.33% vs. 72.44%/64.53%/51.77%, 89.66% vs. 88.99% on HRSC, and 42.27% vs. 41.25% on FAIR1M.
+
+## Results and models
+
+DOTA1.0
+
+|         Backbone         | AP50  | lr schd | Mem (GB) | Inf Time (fps) |  Aug  | Batch Size |                                      Configs                                      |                                                                                                                                                        Download                                                                                                                                                        |
+| :----------------------: | :---: | :-----: | :------: | :------------: | :---: | :--------: | :-------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| ResNet50 (1024,1024,200) | 72.59 |   1x    |  10.10   |      29.1      |   -   |     2      |       [h2rbox_v2-le90_r50_fpn-1x_dota](./h2rbox_v2-le90_r50_fpn-1x_dota.py)       |            [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dota/h2rbox_v2-le90_r50_fpn-1x_dota-fa5ad1d2.pth)   \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dota/h2rbox_v2-le90_r50_fpn-1x_dota-20230313_103051.json)            |
+| ResNet50 (1024,1024,200) | 78.25 |   1x    |  10.33   |      29.1      | MS+RR |     2      | [h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota](./h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota/h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota-5e0e53e1.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota/h2rbox_v2-le90_r50_fpn_ms_rr-1x_dota-20230324_011934.json) |
+
+DOTA1.5
+
+|         Backbone         | AP50  | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size |                                   Configs                                   |                                                                                                                                                  Download                                                                                                                                                  |
+| :----------------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| ResNet50 (1024,1024,200) | 64.76 |   1x    |  10.95   |      29.1      |  -  |     2      | [h2rbox_v2-le90_r50_fpn-1x_dotav15](./h2rbox_v2-le90_r50_fpn-1x_dotav15.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav15/h2rbox_v2-le90_r50_fpn-1x_dotav15-3adc0309.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav15/h2rbox_v2-le90_r50_fpn-1x_dotav15-20230316_192940.json) |
+
+DOTA2.0
+
+|         Backbone         | AP50  | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size |                                  Configs                                  |                                                                                                                                                Download                                                                                                                                                |
+| :----------------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-----------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| ResNet50 (1024,1024,200) | 50.33 |   1x    |  11.02   |      29.1      |  -  |     2      | [h2rbox_v2-le90_r50_fpn-1x_dotav2](./h2rbox_v2-le90_r50_fpn-1x_dotav2.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav2/h2rbox_v2-le90_r50_fpn-1x_dotav2-b1ec4d3c.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-1x_dotav2/h2rbox_v2-le90_r50_fpn-1x_dotav2-20230316_200353.json) |
+
+HRSC
+
+|         Backbone         | AP50  | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size |                                   Configs                                   |                                                                                                                                                  Download                                                                                                                                                   |
+| :----------------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| ResNet50 (1024,1024,200) | 89.66 |   6x    |   5.50   |      45.9      |  -  |     2      |    [h2rbox_v2-le90_r50_fpn-6x_hrsc](./h2rbox_v2-le90_r50_fpn-6x_hrsc.py)    |       [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-6x_hrsc/h2rbox_v2-le90_r50_fpn-6x_hrsc-b3b5e06b.pth)  \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn-6x_hrsc/h2rbox_v2-le90_r50_fpn-6x_hrsc-20230312_073744.json)       |
+| ResNet50 (1024,1024,200) | 89.56 |   6x    |   5.50   |      45.9      | RR  |     2      | [h2rbox_v2-le90_r50_fpn_rr-6x_hrsc](./h2rbox_v2-le90_r50_fpn_rr-6x_hrsc.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn_rr-6x_hrsc/h2rbox_v2-le90_r50_fpn_rr-6x_hrsc-ee6e851a.pth)  \| [log](https://download.openmmlab.com/mmrotate/v1.0/h2rbox_v2/h2rbox_v2-le90_r50_fpn_rr-6x_hrsc/h2rbox_v2-le90_r50_fpn_rr-6x_hrsc-20230312_073800.json) |
+
+## Citation
+
+```
+@misc{yu2023h2rboxv2,
+title={H2RBox-v2: Boosting HBox-supervised Oriented Object Detection via Symmetric Learning},
+author={Yi Yu and Xue Yang and Qingyun Li and Yue Zhou and Gefan Zhang and Feipeng Da and Junchi Yan},
+year={2023},
+eprint={2304.04403},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+```
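The abstract in this README describes a self-supervised branch that learns orientation from flip and rotate consistency. The following is a minimal sketch of that idea in plain Python, not the `H2RBoxV2ConsistencyLoss` implementation. It assumes angles in radians with period π, that a vertical flip negates the angle, and that rotating the view by `rot` adds `rot` to the angle. The helper names (`wrap_angle`, `smooth_l1`, `consistency_losses`) are hypothetical. The default weights 1.0 and 0.05 and `beta=0.1` mirror the `loss_rot` and `loss_flp` settings in the configs of this PR.

```python
import math


def wrap_angle(theta):
    """Wrap an angle into [-pi/2, pi/2), since a box repeats every pi."""
    return (theta + math.pi / 2) % math.pi - math.pi / 2


def smooth_l1(x, beta=0.1):
    """Smooth L1 penalty on a scalar residual."""
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta


def consistency_losses(theta_ori, theta_rot, theta_flp, rot,
                       w_rot=1.0, w_flp=0.05):
    """Return (rotate loss, flip loss) for one object.

    Assumed conventions (illustrative only):
      rotated view: theta_rot should equal theta_ori + rot (mod pi)
      flipped view: theta_flp should equal -theta_ori (mod pi)
    """
    d_rot = wrap_angle(theta_rot - theta_ori - rot)
    d_flp = wrap_angle(theta_flp + theta_ori)
    return w_rot * smooth_l1(d_rot), w_flp * smooth_l1(d_flp)


# Predictions that respect both symmetries give (near) zero loss.
loss_rot, loss_flp = consistency_losses(0.3, 0.8, -0.3, rot=0.5)
print(loss_rot, loss_flp)
```

Wrapping the residual before penalizing it is what keeps the loss insensitive to the π-periodicity of box angles: a prediction that is off by exactly π is treated as correct.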
@@ -0,0 +1,116 @@
+_base_ = [
+    '../_base_/datasets/dota.py', '../_base_/schedules/schedule_1x.py',
+    '../_base_/default_runtime.py'
+]
+angle_version = 'le90'
+
+# model settings
+model = dict(
+    type='H2RBoxV2Detector',
+    crop_size=(1024, 1024),
+    view_range=(0.25, 0.75),
+    data_preprocessor=dict(
+        type='mmdet.DetDataPreprocessor',
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        pad_size_divisor=32,
+        boxtype2tensor=False),
+    backbone=dict(
+        type='mmdet.ResNet',
+        depth=50,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        frozen_stages=1,
+        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_eval=True,
+        style='pytorch',
+        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+    neck=dict(
+        type='mmdet.FPN',
+        in_channels=[256, 512, 1024, 2048],
+        out_channels=256,
+        start_level=1,
+        add_extra_convs='on_output',
+        num_outs=5,
+        relu_before_extra_convs=True),
+    bbox_head=dict(
+        type='H2RBoxV2Head',
+        num_classes=15,
+        in_channels=256,
+        angle_version=angle_version,
+        stacked_convs=4,
+        feat_channels=256,
+        strides=[8, 16, 32, 64, 128],
+        center_sampling=True,
+        center_sample_radius=1.5,
+        norm_on_bbox=True,
+        centerness_on_reg=True,
+        use_hbbox_loss=False,
+        scale_angle=False,
+        rotation_agnostic_classes=[1, 9, 11],
+        agnostic_resize_classes=[1],
+        use_circumiou_loss=True,
+        use_standalone_angle=True,
+        use_reweighted_loss_bbox=False,
+        angle_coder=dict(
+            type='PSCCoder',
+            angle_version=angle_version,
+            dual_freq=False,
+            num_step=3,
+            thr_mod=0),
+        bbox_coder=dict(
+            type='DistanceAnglePointCoder', angle_version=angle_version),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='mmdet.IoULoss', loss_weight=1.0),
+        loss_centerness=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
+        loss_symmetry_ss=dict(
+            type='H2RBoxV2ConsistencyLoss',
+            use_snap_loss=True,
+            loss_rot=dict(
+                type='mmdet.SmoothL1Loss', loss_weight=1.0, beta=0.1),
+            loss_flp=dict(
+                type='mmdet.SmoothL1Loss', loss_weight=0.05, beta=0.1))),
+    # training and testing settings
+    train_cfg=None,
+    test_cfg=dict(
+        nms_pre=2000,
+        min_bbox_size=0,
+        score_thr=0.05,
+        nms=dict(type='nms_rotated', iou_threshold=0.1),
+        max_per_img=2000))
+
+# load hbox annotations
+train_pipeline = [
+    dict(type='mmdet.LoadImageFromFile', backend_args={{_base_.backend_args}}),
+    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+    # Horizontal GTBox, (x1,y1,x2,y2)
+    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='hbox')),
+    # Horizontal GTBox, (x,y,w,h,theta)
+    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+    dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+    dict(
+        type='mmdet.RandomFlip',
+        prob=0.75,
+        direction=['horizontal', 'vertical', 'diagonal']),
+    dict(type='mmdet.PackDetInputs')
+]
+
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+
+# optimizer
+optim_wrapper = dict(
+    optimizer=dict(
+        _delete_=True,
+        type='AdamW',
+        lr=0.00005,
+        betas=(0.9, 0.999),
+        weight_decay=0.05))
+
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=6)
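The `angle_coder` in this config is a `PSCCoder` with `num_step=3`. As a rough illustration of phase-shifting coding, the sketch below encodes an angle as three phase-shifted cosines and decodes it with `atan2`. It assumes the phase is `2 * theta` for a box angle with period π and ignores `dual_freq` and `thr_mod`. The function names are hypothetical, and the actual `PSCCoder` in MMRotate may differ in details.

```python
import math


def psc_encode(theta, num_step=3):
    """Encode theta (radians, period pi) as num_step phase-shifted cosines."""
    phase = 2.0 * theta  # assumed mapping: period pi -> period 2*pi
    return [math.cos(phase + 2.0 * math.pi * n / num_step)
            for n in range(num_step)]


def psc_decode(codes):
    """Recover theta in (-pi/2, pi/2] from the phase-shifted codes."""
    num_step = len(codes)
    shifts = [2.0 * math.pi * n / num_step for n in range(num_step)]
    s = sum(x * math.sin(a) for x, a in zip(codes, shifts))
    c = sum(x * math.cos(a) for x, a in zip(codes, shifts))
    # For num_step >= 3: s = -(N/2) * sin(phase), c = (N/2) * cos(phase)
    return math.atan2(-s, c) / 2.0


theta = 0.4
codes = psc_encode(theta)
print(codes, psc_decode(codes))
```

Because each code value is continuous in `theta` across the ±π/2 boundary, regressing the codes avoids the jump that appears when the angle itself is regressed.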
@@ -0,0 +1,116 @@
+_base_ = [
+    '../_base_/datasets/dotav15.py', '../_base_/schedules/schedule_1x.py',
+    '../_base_/default_runtime.py'
+]
+angle_version = 'le90'
+
+# model settings
+model = dict(
+    type='H2RBoxV2Detector',
+    crop_size=(1024, 1024),
+    view_range=(0.25, 0.75),
+    data_preprocessor=dict(
+        type='mmdet.DetDataPreprocessor',
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        pad_size_divisor=32,
+        boxtype2tensor=False),
+    backbone=dict(
+        type='mmdet.ResNet',
+        depth=50,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        frozen_stages=1,
+        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_eval=True,
+        style='pytorch',
+        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+    neck=dict(
+        type='mmdet.FPN',
+        in_channels=[256, 512, 1024, 2048],
+        out_channels=256,
+        start_level=1,
+        add_extra_convs='on_output',
+        num_outs=5,
+        relu_before_extra_convs=True),
+    bbox_head=dict(
+        type='H2RBoxV2Head',
+        num_classes=16,
+        in_channels=256,
+        angle_version=angle_version,
+        stacked_convs=4,
+        feat_channels=256,
+        strides=[8, 16, 32, 64, 128],
+        center_sampling=True,
+        center_sample_radius=1.5,
+        norm_on_bbox=True,
+        centerness_on_reg=True,
+        use_hbbox_loss=False,
+        scale_angle=False,
+        rotation_agnostic_classes=[1, 9, 11],
+        agnostic_resize_classes=[1],
+        use_circumiou_loss=True,
+        use_standalone_angle=True,
+        use_reweighted_loss_bbox=False,
+        angle_coder=dict(
+            type='PSCCoder',
+            angle_version=angle_version,
+            dual_freq=False,
+            num_step=3,
+            thr_mod=0),
+        bbox_coder=dict(
+            type='DistanceAnglePointCoder', angle_version=angle_version),
+        loss_cls=dict(
+            type='mmdet.FocalLoss',
+            use_sigmoid=True,
+            gamma=2.0,
+            alpha=0.25,
+            loss_weight=1.0),
+        loss_bbox=dict(type='mmdet.IoULoss', loss_weight=1.0),
+        loss_centerness=dict(
+            type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
+        loss_symmetry_ss=dict(
+            type='H2RBoxV2ConsistencyLoss',
+            use_snap_loss=True,
+            loss_rot=dict(
+                type='mmdet.SmoothL1Loss', loss_weight=1.0, beta=0.1),
+            loss_flp=dict(
+                type='mmdet.SmoothL1Loss', loss_weight=0.05, beta=0.1))),
+    # training and testing settings
+    train_cfg=None,
+    test_cfg=dict(
+        nms_pre=2000,
+        min_bbox_size=0,
+        score_thr=0.05,
+        nms=dict(type='nms_rotated', iou_threshold=0.1),
+        max_per_img=2000))
+
+# load hbox annotations
+train_pipeline = [
+    dict(type='mmdet.LoadImageFromFile', backend_args={{_base_.backend_args}}),
+    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
+    # Horizontal GTBox, (x1,y1,x2,y2)
+    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='hbox')),
+    # Horizontal GTBox, (x,y,w,h,theta)
+    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
+    dict(type='mmdet.Resize', scale=(1024, 1024), keep_ratio=True),
+    dict(
+        type='mmdet.RandomFlip',
+        prob=0.75,
+        direction=['horizontal', 'vertical', 'diagonal']),
+    dict(type='mmdet.PackDetInputs')
+]
+
+train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
+
+# optimizer
+optim_wrapper = dict(
+    optimizer=dict(
+        _delete_=True,
+        type='AdamW',
+        lr=0.00005,
+        betas=(0.9, 0.999),
+        weight_decay=0.05))
+
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=6)
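Both configs in this PR load quadrilateral annotations and reduce them to horizontal boxes in `train_pipeline`, so training does not use ground-truth angles. The sketch below mimics the two `ConvertBoxType` steps in plain Python, assuming a qbox is eight numbers `(x1, y1, ..., x4, y4)`. `qbox_to_hbox` and `hbox_to_rbox` are hypothetical helpers, not MMRotate functions.

```python
def qbox_to_hbox(qbox):
    """Axis-aligned box (x1, y1, x2, y2) enclosing a quadrilateral."""
    xs, ys = qbox[0::2], qbox[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def hbox_to_rbox(hbox):
    """Express a horizontal box as (cx, cy, w, h, theta) with theta = 0."""
    x1, y1, x2, y2 = hbox
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, 0.0)


# A square rotated by 45 degrees around (10, 10).
qbox = (10, 5, 15, 10, 10, 15, 5, 10)
hbox = qbox_to_hbox(qbox)
rbox = hbox_to_rbox(hbox)
print(hbox, rbox)
```

The second step keeps the five-parameter layout the head expects while discarding the orientation, which is why the angle has to be learned from the symmetry losses instead.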