|
1 |
| -InternImage |
2 |
| -======== |
3 |
| - |
| 1 | +# InternImage |
4 | 2 |
|
5 | 3 | [](https://paperswithcode.com/sota/object-detection-on-coco?p=internimage-exploring-large-scale-vision)
|
6 | 4 | [](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=internimage-exploring-large-scale-vision)
|
7 | 5 | [](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=internimage-exploring-large-scale-vision)
|
| 6 | +[](https://paperswithcode.com/sota/object-detection-on-lvis-v1-0-minival?p=towards-all-in-one-pre-training-via) |
| 7 | +[](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes-camera-only?p=bevformer-v2-adapting-modern-image-backbones) |
8 | 8 | [](https://paperswithcode.com/sota/image-classification-on-imagenet?p=internimage-exploring-large-scale-vision)
|
9 | 9 |
|
10 | 10 | This repository is an official implementation of the [InternImage: Exploring Large-Scale Vision Foundation Models with
|
11 | 11 | Deformable Convolutions](https://arxiv.org/abs/2211.05778).
|
12 | 12 |
|
13 | 13 | By Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
|
14 | 14 |
|
15 |
| -Code will be available. |
| 15 | +## News |
| 16 | + |
| 17 | +- `Nov 18, 2022`: 🚀 InternImage-XL merged into [BEVFormer v2](https://arxiv.org/abs/2211.10439) achieves state-of-the-art performance of `63.4 NDS` on nuScenes Camera Only. |
| 18 | +- `Nov 10, 2022`: 🚀🚀 InternImage-H achieves a new record `65.4 mAP` on COCO detection test-dev and `62.9 mIoU` on |
| 19 | +ADE20K, outperforming previous models by a large margin. |
| 20 | + |
| 21 | +## Coming soon |
| 22 | +- [ ] Classification/detection/segmentation code of the InternImage series. |
| 23 | +- [ ] InternImage-T/S/B/L/XL ImageNet-1k pretrained model. |
| 24 | +- [ ] InternImage-L/XL ImageNet-22k pretrained model. |
| 25 | +- [ ] InternImage-T/S/B/L/XL detection and instance segmentation model. |
| 26 | +- [ ] InternImage-T/S/B/L/XL semantic segmentation model. |
| 27 | + |
| 28 | +## Introduction |
| 29 | + |
| 30 | +**InternImage**, initially described on [arXiv](https://arxiv.org/abs/2211.05778), is designed as a general backbone for computer vision. |
| 31 | +It takes deformable convolution as the core operator to obtain large effective receptive fields, and introduces adaptive spatial aggregation |
| 32 | +to reduce the strict inductive bias. Our model makes it possible to learn stronger and more robust models with large-scale parameters from massive data. |
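
As a rough illustration of the idea behind deformable sampling (not the operator used in this repository), the sketch below aggregates a 3x3 neighbourhood whose sampling points are shifted by per-point offsets and combined with per-point weights. The helper names `bilinear` and `deform_sample` are hypothetical, and the offsets and weights are fixed by hand here; in a real network they would be predicted from the input features.

```python
import math


def bilinear(img, y, x):
    """Sample a 2-D list `img` at fractional (y, x); out-of-bounds pixels read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dy) * (1 - dx) * px(y0, x0)
            + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0)
            + dy * dx * px(y0 + 1, x0 + 1))


def deform_sample(img, cy, cx, offsets, weights):
    """Weighted sum over a 3x3 grid centred at (cy, cx).

    offsets: nine (dy, dx) pairs added to the regular grid positions.
    weights: nine aggregation weights (hand-picked here; an assumption
             for illustration, normally produced by the network).
    """
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), (oy, ox), w in zip(grid, offsets, weights):
        out += w * bilinear(img, cy + gy + oy, cx + gx + ox)
    return out


# Toy 4x4 image whose value at (r, c) is 4*r + c.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
uniform = [1.0 / 9] * 9

# Zero offsets reduce to an ordinary 3x3 average around (1, 1).
regular = deform_sample(img, 1, 1, [(0.0, 0.0)] * 9, uniform)
# Shifting every sampling point by half a pixel moves the receptive field.
shifted = deform_sample(img, 1, 1, [(0.5, 0.5)] * 9, uniform)
```

With zero offsets the result is a plain 3x3 average; non-zero offsets let the same nine weights read from a different, input-dependent region, which is the sense in which the receptive field becomes adaptive.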
| 33 | + |
| 34 | +<div align=center> |
| 35 | +<img src='./figs/arch.png' width=400> |
| 36 | +</div> |
| 37 | + |
| 38 | +## Main Results on ImageNet with Pretrained Models |
| 39 | + |
| 40 | +**ImageNet-1K and ImageNet-22K Pretrained InternImage Models** |
| 41 | + |
| 42 | +| name | pretrain | resolution | acc@1 | #params | FLOPs | |
| 43 | +| :---: | :---: | :---: | :---: | :---: | :---: | |
| 44 | +| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G | |
| 45 | +| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G | |
| 46 | +| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G | |
| 47 | +| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G | |
| 48 | +| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G | |
| 49 | + |
| 50 | +## Main Results on Downstream Tasks |
| 51 | + |
| 52 | +**COCO Object Detection** |
| 53 | + |
| 54 | +| backbone | method | lr schedule | box mAP | mask mAP | #params | FLOPs | |
| 55 | +| :---: | :---: | :---: | :---: | :---: | :---: | :---: | |
| 56 | +| InternImage-T | Mask R-CNN | 1x | 47.2 | 42.5 | 49M | 270G | |
| 57 | +| InternImage-S | Mask R-CNN | 1x | 47.8 | 43.3 | 69M | 340G | |
| 58 | +| InternImage-B | Mask R-CNN | 1x | 48.8 | 44.0 | 115M | 501G | |
| 59 | +| InternImage-L | Cascade Mask R-CNN | 1x | 54.9 | 47.7 | 277M | 1399G | |
| 60 | +| InternImage-XL | Cascade Mask R-CNN | 1x | 55.3 | 48.0 | 387M | 1782G | |
| 61 | + |
| 62 | +**ADE20K Semantic Segmentation** |
16 | 63 |
|
| 64 | +| backbone | resolution | mIoU (single scale) | mIoU (multi scale) | #params | FLOPs | |
| 65 | +| :---: | :---: | :---: | :---: | :---: | :---: | |
| 66 | +| InternImage-T | 512x512 | 47.9 | 48.1 | 59M | 944G | |
| 67 | +| InternImage-S | 512x512 | 50.1 | 50.9 | 80M | 1017G | |
| 68 | +| InternImage-B | 512x512 | 50.8 | 51.3 | 128M | 1185G | |
| 69 | +| InternImage-L | 640x640 | 53.9 | 54.1 | 256M | 2526G | |
| 70 | +| InternImage-XL | 640x640 | 55.0 | 55.3 | 368M | 3142G | |
17 | 71 |
|
18 | 72 | ## Citation
|
19 | 73 |
|
|