Commit 9ab8454

ZhenhangHuang authored and czczup committed
Update README.md
1 parent 615e3e9 commit 9ab8454

3 files changed, +58 -4 lines changed

README.md

Lines changed: 58 additions & 4 deletions
@@ -1,19 +1,73 @@
-InternImage
-========
-
+# InternImage

 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/internimage-exploring-large-scale-vision/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=internimage-exploring-large-scale-vision)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/internimage-exploring-large-scale-vision/object-detection-on-coco-minival)](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=internimage-exploring-large-scale-vision)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/internimage-exploring-large-scale-vision/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=internimage-exploring-large-scale-vision)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-all-in-one-pre-training-via/object-detection-on-lvis-v1-0-minival)](https://paperswithcode.com/sota/object-detection-on-lvis-v1-0-minival?p=towards-all-in-one-pre-training-via)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-v2-adapting-modern-image-backbones/3d-object-detection-on-nuscenes-camera-only)](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes-camera-only?p=bevformer-v2-adapting-modern-image-backbones)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/internimage-exploring-large-scale-vision/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=internimage-exploring-large-scale-vision)

 This repository is an official implementation of the [InternImage: Exploring Large-Scale Vision Foundation Models with
 Deformable Convolutions](https://arxiv.org/abs/2211.05778).

 By Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

-Code will be available.
+## News
+
+- `Nov 18, 2022`: 🚀 InternImage-XL merged into [BEVFormer v2](https://arxiv.org/abs/2211.10439) achieves state-of-the-art performance of `63.4 NDS` on nuScenes Camera Only.
+- `Nov 10, 2022`: 🚀🚀 InternImage-H achieves a new record `65.4 mAP` on COCO detection test-dev and `62.9 mIoU` on
+ADE20K, outperforming previous models by a large margin.
+
+## Coming soon
+- [ ] Classification/detection/segmentation code of the InternImage series.
+- [ ] InternImage-T/S/B/L/XL ImageNet-1k pretrained model.
+- [ ] InternImage-L/XL ImageNet-22k pretrained model.
+- [ ] InternImage-T/S/B/L/XL detection and instance segmentation model.
+- [ ] InternImage-T/S/B/L/XL semantic segmentation model.
+
+## Introduction
+
+**InternImage**, initially described in [arXiv](https://arxiv.org/abs/2211.05778), can serve as a general backbone for computer vision.
+It takes deformable convolution as the core operator to obtain large effective receptive fields, and introduces adaptive spatial aggregation
+to reduce the strict inductive bias. Our model makes it possible to learn stronger and more robust models with large-scale parameters from massive data.
+
+<div align=center>
+<img src='./figs/arch.png' width=400>
+</div>
+
+## Main Results on ImageNet with Pretrained Models
+
+**ImageNet-1K and ImageNet-22K Pretrained InternImage Models**
+
+| name | pretrain | resolution | acc@1 | #params | FLOPs |
+| :---: | :---: | :---: | :---: | :---: | :---: |
+| InternImage-T | ImageNet-1K | 224x224 | 83.5 | 30M | 5G |
+| InternImage-S | ImageNet-1K | 224x224 | 84.2 | 50M | 8G |
+| InternImage-B | ImageNet-1K | 224x224 | 84.9 | 97M | 16G |
+| InternImage-L | ImageNet-22K | 384x384 | 87.7 | 223M | 108G |
+| InternImage-XL | ImageNet-22K | 384x384 | 88.0 | 335M | 163G |
+
+## Main Results on Downstream Tasks
+
+**COCO Object Detection**
+
+| backbone | method | lr schedule | box mAP | mask mAP | #params | FLOPs |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| InternImage-T | Mask R-CNN | 1x | 47.2 | 42.5 | 49M | 270G |
+| InternImage-S | Mask R-CNN | 1x | 47.8 | 43.3 | 69M | 340G |
+| InternImage-B | Mask R-CNN | 1x | 48.8 | 44.0 | 115M | 501G |
+| InternImage-L | Cascade Mask R-CNN | 1x | 54.9 | 47.7 | 277M | 1399G |
+| InternImage-XL | Cascade Mask R-CNN | 1x | 55.3 | 48.0 | 387M | 1782G |
+
+**ADE20K Semantic Segmentation**

+| backbone | resolution | single scale | multi scale | #params | FLOPs |
+| :---: | :---: | :---: | :---: | :---: | :---: |
+| InternImage-T | 512x512 | 47.9 | 48.1 | 59M | 944G |
+| InternImage-S | 512x512 | 50.1 | 50.9 | 80M | 1017G |
+| InternImage-B | 512x512 | 50.8 | 51.3 | 128M | 1185G |
+| InternImage-L | 640x640 | 53.9 | 54.1 | 256M | 2526G |
+| InternImage-XL | 640x640 | 55.0 | 55.3 | 368M | 3142G |

 ## Citation

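The Introduction added in this README diff names deformable convolution as the core operator and mentions adaptive spatial aggregation. As a rough illustration of that idea only, the sketch below computes a single output value of a generic 3x3 modulated deformable convolution in pure Python: each kernel tap samples the input at its regular grid position plus a fractional offset (via bilinear interpolation), and the sample is scaled by a per-tap modulation scalar before the weighted sum. This is not the repository's operator. The function names `bilinear_sample` and `deformable_conv_at`, the 3x3 grid, the zero padding outside the image, and the single-channel input are all assumptions made for the example, and how InternImage predicts or normalizes offsets and modulation scalars is not covered here.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D list `img` at fractional (y, x).

    Positions outside the image contribute zero (assumed zero padding).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Visit the four integer neighbours surrounding (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * img[yi][xi]
    return value


def deformable_conv_at(img, center, weights, offsets, modulation):
    """Output of a 3x3 modulated deformable convolution at one location.

    weights:    9 kernel weights, row-major over the 3x3 grid
    offsets:    9 (dy, dx) fractional offsets, one per kernel tap
    modulation: 9 scalars that rescale each sampled value
    """
    cy, cx = center
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        oy, ox = offsets[k]
        sample = bilinear_sample(img, cy + dy + oy, cx + dx + ox)
        out += weights[k] * modulation[k] * sample
    return out


if __name__ == "__main__":
    # Hypothetical 5x5 single-channel input used only for demonstration.
    image = [[float(5 * y + x) for x in range(5)] for y in range(5)]
    kernel = [1.0 / 9] * 9
    no_offsets = [(0.0, 0.0)] * 9
    unit_modulation = [1.0] * 9
    print(deformable_conv_at(image, (2, 2), kernel, no_offsets, unit_modulation))
```

With all offsets set to zero and all modulation scalars set to one, the function reduces to an ordinary 3x3 convolution at that location. Non-zero offsets move the sampling points away from the fixed grid, which is what lets the sampling pattern depend on the input.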
figs/arch.png

652 KB

figs/logo.png

112 KB
