Add HRViT #1730

Open
@lorinczszabolcs

Describe the feature

Add the model described in "Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation", a new vision transformer backbone design for semantic segmentation. It has a multi-branch high-resolution (HR) architecture with enhanced multi-scale representability and, according to the paper, surpasses the state-of-the-art MiT and CSWin backbones on ADE20K and Cityscapes with an average improvement of +1.78 mIoU, 28% fewer parameters, and 21% fewer FLOPs.

Motivation

HRViT is a recent model that combines the features of HRNet and ViT, achieving good performance while reducing parameters and FLOPs.

Related resources

Official code can be found here.

Additional context
Their implementation already uses mmseg and mmcv, so it should be quite straightforward to add support for it.
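For anyone unfamiliar with why this makes the port easy, here is a rough sketch of the registry pattern that config-driven model zoos rely on. It is a simplified stand-in written in plain Python, not mmseg's or mmcv's actual code: the `Registry` class, the `BACKBONES` instance, the placeholder `HRViT` class, and the `num_branches` parameter are all hypothetical names chosen for illustration.

```python
# Simplified stand-in for a registry-based model zoo.
# NOTE: this is an illustrative assumption, not mmseg's or mmcv's real implementation.
from typing import Callable, Dict


class Registry:
    """Maps a class name to the class so models can be built from config dicts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        """Return a decorator that records the decorated class under its own name."""
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: dict):
        """Instantiate the class named by cfg['type'] with the remaining keys as kwargs."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = Registry("backbone")  # hypothetical registry instance


@BACKBONES.register_module()
class HRViT:
    """Placeholder only; the real backbone would come from the official code."""

    def __init__(self, num_branches: int = 4) -> None:  # hypothetical parameter
        self.num_branches = num_branches


# A config file would then refer to the backbone by name.
backbone = BACKBONES.build({"type": "HRViT", "num_branches": 4})
```

If the official implementation already registers its backbone in this style, adding support would mostly mean moving the module into the repository and providing configs, though that still needs to be checked against the actual code.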
