🚀 The feature
Note: To track the progress of the project check out this board.
This is the 3rd phase of TorchVision's modernization project (see phase 1 and 2). We aim to keep TorchVision relevant by ensuring it provides, off the shelf, all the primitives, model architectures and recipe utilities needed to produce SOTA results for the supported Computer Vision tasks.
1. New Primitives
To enable our users to reproduce the latest state-of-the-art research we will enhance TorchVision with the following data augmentations, layers, losses and other operators:
Data Augmentations
- AutoAugment for Detection [1, 2] - Implement AutoAugment for Detection #6224 Implement AutoAugment for Detection #6609
- Mosaic [1, 2] - Mosaic Transform #6534
- Mixup for Detection [1, 2] - New Feature: Mixup Transform for Object Detection #6720 NEW Feature: Mixup transform for Object Detection #6721
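To give a feel for what the detection variant of Mixup involves (this is an illustrative sketch, not the TorchVision transform API): two images are blended pixel-wise with a mixing ratio `lam` (usually sampled from a Beta distribution), and the box targets of both images are simply concatenated. A minimal pure-Python sketch, with images modeled as 2D lists of floats:

```python
def mixup_detection(img1, boxes1, img2, boxes2, lam):
    """Illustrative Mixup for detection (hypothetical helper, not the
    torchvision API): blend the two images pixel-wise with ratio `lam`
    and keep the union of both sets of target boxes.

    img1, img2: images as 2D lists of pixel values (same shape).
    boxes1, boxes2: lists of (x1, y1, x2, y2) tuples.
    lam: mixing ratio in [0, 1]; in practice sampled from Beta(alpha, alpha).
    """
    mixed = [
        [lam * p1 + (1 - lam) * p2 for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]
    # Detection targets are not "mixed": both box sets remain valid
    # annotations for the blended image, so they are concatenated.
    return mixed, boxes1 + boxes2
```

Some recipes additionally attach `lam` (or `1 - lam`) as a per-box score or loss weight; the linked issues discuss which convention to adopt.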
Losses
- Dice Loss [1, 2] - New Feature: Dice Loss #6435 Add Dice Loss #6960
- Poly Loss [1, 2] - Add support for PolyLoss in torchvision #6439 feat: Added support of Poly Loss #6457
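For reference, the soft Dice loss being proposed above reduces to `1 - 2·|P∩T| / (|P| + |T|)` over per-pixel foreground probabilities. A minimal pure-Python sketch of that formula (an illustration of the math, not the implementation from the linked PR):

```python
def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over a flattened binary segmentation mask.

    probs: predicted foreground probabilities in [0, 1].
    targets: ground-truth mask values (0.0 or 1.0), same length as probs.
    eps: smoothing term to avoid division by zero on empty masks.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    dice = (2.0 * intersection + eps) / (denom + eps)
    # Dice is a similarity in [0, 1]; the loss is its complement.
    return 1.0 - dice
```

A perfect prediction yields a loss of (approximately) 0, while a prediction with no overlap approaches 1.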
Operators added in PyTorch Core
- LARS Optimizer [1, 2] - WIP: feat: LARS optimizer pytorch#88106
- LAMB Optimizer [1, 2] - Implementation of LAMB optimizer #6868
- Polynomial LR Scheduler [1, 2] - code - feat: add PolynomialLR scheduler pytorch#82769
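The polynomial LR schedule referenced above decays the learning rate from its base value toward an end value as `(1 - t/T)^power`. A small pure-Python sketch of that decay rule (mirroring the behavior of `torch.optim.lr_scheduler.PolynomialLR` in spirit, though the actual class operates on an optimizer rather than returning values):

```python
def polynomial_lr(base_lr, step, total_steps, power=1.0, end_lr=0.0):
    """Polynomial decay from base_lr to end_lr over total_steps.

    With power=1.0 this is plain linear decay; higher powers decay
    faster early on and flatten toward the end.
    """
    if step >= total_steps:
        return end_lr
    decay = (1.0 - step / total_steps) ** power
    return (base_lr - end_lr) * decay + end_lr
```

For example, with `base_lr=0.1`, `total_steps=100` and `power=1.0`, the rate is 0.1 at step 0, 0.05 at step 50, and 0.0 at step 100.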
2. New Architectures & Model Iterations
To ensure that our users have access to the most popular SOTA models, we will add the following architectures along with pre-trained weights:
Image Classification
- Swin Transformer V2 - Add SwinV2 in TorchVision #6242 Add SwinV2 #6246
- MobileViT v1 & v2 [1, 2] - [FEAT] Add MobileViT v1 & v2 #6404
- MaxViT - MaxVit model #6342
Video Classification
- MViTv2 [1] - Add support of MViTv2 video variants #6373
- Swin3d [1] - Port SwinTransformer3d from torchmultimodal #6499 Add Video SwinTransformer #6521
- S3D [1] - S3D feature request #6402 Add the S3D architecture to TorchVision #6412 Update S3D weights #6537
3. Improved Training Recipes & Pre-trained models
To ensure that our users have access to strong baselines and SOTA weights, we will improve our training recipes to incorporate the newly released primitives and offer improved pre-trained models:
Reference Scripts
- Update the Reference Scripts to use the latest primitives - refactor: replace LambdaLR with PolynomialLR in segmentation training script #6405 Prototype references #6433
Pre-trained weights
- Improve the accuracy of Video models
Other Candidates
There are several other Operators (#5414), Losses (#2980), Augmentations (#3817) and Models (#2707) proposed by the community. Here are some potential candidates that we could implement depending on bandwidth. Contributions are welcome for any of the following:
- YOLOX [1] - [RFC] Support YOLOX detection model #6341
- DeTR - Add DETR model #5922 NEW Feature: DeTR Model to torchvision #6922
- U-Net - [RFC] U-Net framework #6610 [DRAFT][DONT MERGE] U-net proposal #6611
- MViTv2 for Images [1]
- Video Transformer Network [1]
- MTV
- Deformable DeTR
- Shortcut Regularizer (FX-based)
- Hide-and-Seek - ‘Hide-and-Seek’ Random Masking Transform #6796