Add HRViT

# Describe the feature

Add HRViT, the model described in ["Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation"](https://arxiv.org/abs/2111.01236). HRViT is a new vision transformer backbone design for semantic segmentation with a multi-branch high-resolution (HR) architecture and enhanced multi-scale representability. According to the paper, it surpasses the state-of-the-art MiT and CSWin backbones with an average improvement of +1.78 mIoU, 28% fewer parameters, and 21% fewer FLOPs on ADE20K and Cityscapes.

**Motivation**

HRViT is a recent model that combines features of HRNet and ViT, achieving good performance while reducing parameters and FLOPs.

**Related resources**

Official code can be found [here](https://github.com/facebookresearch/HRViT).

**Additional context**
The official implementation already uses mmseg and mmcv, so it should be quite straightforward to add support for it.
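
To make the request concrete, here is a rough sketch of what a config could look like once the backbone is registered. It is only an illustration: the registry name `HRViT`, the channel widths, and the choice of decode head are my assumptions, not taken from the official repository. `EncoderDecoder`, `SegformerHead`, and `CrossEntropyLoss` are existing mmseg components, used here purely as an example pairing.

```python
# Hypothetical mmseg-style config fragment.
# 'HRViT' as a registered backbone name and all numeric values below are
# placeholders for illustration, not the official HRViT settings.
norm_cfg = dict(type='SyncBN', requires_grad=True)

model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='HRViT',  # assumed registry name; not yet available in mmseg
    ),
    decode_head=dict(
        type='SegformerHead',            # existing mmseg head, example choice
        in_channels=[32, 64, 128, 256],  # placeholder branch widths
        in_index=[0, 1, 2, 3],           # one entry per backbone output
        channels=256,
        num_classes=150,                 # ADE20K has 150 classes
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
    ),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'),
)
```

The actual backbone arguments would need to follow whatever the official implementation exposes.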


Add HRViT #1730
