UniNeXt: Exploring A Unified Architecture for Vision Recognition

| Model | Attention | acc@1 | #params | #FLOPs |
| --- | --- | --- | --- | --- |
| UniNeXt-T | local window | 83.6 | 24M | 4.3G |
| UniNeXt-S | local window | 84.1 | 51M | 9.5G |
| UniNeXt-B | local window | 84.4 | 91M | 17.1G |
| UniNeXt-T | cross-shaped window | 83.5 | 24M | 4.3G |
| UniNeXt-S | cross-shaped window | 84.3 | 51M | 9.6G |
| UniNeXt-B | cross-shaped window | 84.7 | 91M | 17.2G |

Main Results on Downstream Tasks
COCO Object Detection

| Backbone | Attention | Method | Pretrain | Lr Schd | box mAP | mask mAP |
| --- | --- | --- | --- | --- | --- | --- |
| UniNeXt-T | local window | Mask R-CNN | ImageNet-1K | 1x | 48.6 | 43.4 |
| UniNeXt-S | local window | Mask R-CNN | ImageNet-1K | 1x | 49.0 | 43.7 |
| UniNeXt-B | local window | Mask R-CNN | ImageNet-1K | 1x | 49.3 | 43.9 |
| UniNeXt-T | cross-shaped window | Mask R-CNN | ImageNet-1K | 1x | 48.7 | 43.6 |
| UniNeXt-S | cross-shaped window | Mask R-CNN | ImageNet-1K | 1x | 49.2 | 43.8 |
| UniNeXt-B | cross-shaped window | Mask R-CNN | ImageNet-1K | 1x | - | - |

ADE20K Semantic Segmentation (val)

| Backbone | Attention | Method | Pretrain | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniNeXt-T | local window | UPerNet | ImageNet-1K | 512x512 | 160K | 49.7 | 50.6 |
| UniNeXt-S | local window | UPerNet | ImageNet-1K | 512x512 | 160K | 51.0 | 51.8 |
| UniNeXt-B | local window | UPerNet | ImageNet-1K | 512x512 | 160K | 51.4 | 52.2 |
| UniNeXt-T | cross-shaped window | UPerNet | ImageNet-1K | 512x512 | 160K | 49.9 | - |
| UniNeXt-S | cross-shaped window | UPerNet | ImageNet-1K | 512x512 | 160K | 51.5 | - |
| UniNeXt-B | cross-shaped window | UPerNet | ImageNet-1K | 512x512 | 160K | 51.6 | - |