Releases: pytorch/vision
iOS support, GPU image decoding, SSDlite and more
This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in CocoaPods, and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, along with many other improvements.
Highlights
[BETA] New models for detection
SSD and SSDlite are two popular object detection architectures which are efficient in terms of speed and provide good results for low resolution pictures. In this release, we provide implementations for the original SSD model with VGG16 backbone and for its mobile-friendly variant SSDlite with MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:
import torch
import torchvision
# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):
| Model | mAP | mAP@50 | mAP@75 |
|---|---|---|---|
| SSD300 VGG16 | 25.1 | 41.5 | 26.2 |
| SSDlite320 MobileNetV3-Large | 21.3 | 34.3 | 22.1 |
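In evaluation mode, torchvision detection models return one dictionary per input image, holding `boxes`, `labels` and `scores` tensors. The sketch below shows one way to keep only confident detections. It uses a hand-written stand-in for a single prediction so that it runs without downloading weights; the values and the `filter_by_score` helper are illustrative assumptions, not part of torchvision.

```python
import torch

def filter_by_score(prediction, threshold=0.5):
    """Keep only detections whose score exceeds `threshold` (hypothetical helper)."""
    keep = prediction["scores"] > threshold
    return {key: value[keep] for key, value in prediction.items()}

# Hand-written stand-in for one element of `predictions`; the values are made up.
prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [30.0, 40.0, 80.0, 90.0],
                           [5.0, 5.0, 50.0, 60.0]]),  # (x1, y1, x2, y2)
    "labels": torch.tensor([1, 18, 3]),
    "scores": torch.tensor([0.92, 0.40, 0.75]),
}
confident = filter_by_score(prediction, threshold=0.5)
print(confident["labels"], confident["scores"])
```

The same helper could be applied to each element of the `predictions` list returned by the models above.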
[STABLE] Quantized kernels for object detection
The forward pass of the nms and roi_align operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly in mobile environments.
[BETA] JPEG decoding on the GPU
Decoding jpegs is now possible on GPUs with the use of nvjpeg, which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:
from torchvision.io.image import read_file, decode_jpeg
data = read_file('path_to_image.jpg') # raw data is on CPU
img = decode_jpeg(data, device='cuda')  # decoded image is on GPU
[BETA] iOS support
TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found here.
[STABLE] Speed optimizations for Tensor transforms
The resize and flip transforms have been optimized, and their runtime has improved by up to 5x on the CPU. The corresponding PRs were sent to PyTorch in pytorch/pytorch#51653, pytorch/pytorch#54500 and pytorch/pytorch#56713.
[STABLE] Documentation improvements
Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see here for the latest version (the stable version is not released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.
The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.
Backwards Incompatible Changes
- [transforms] Ensure input type of `normalize` is float. (#3621)
- [models] Use PyTorch `smooth_l1_loss` and remove private custom implementation (#3539)
New Features
- Added iOS binaries and test app (#3582)(#3629) (#3806)
- [datasets] Added KITTI dataset (#3640)
- [utils] Added utility to draw segmentation masks (#3330, #3824)
- [models] Added the SSD & SSDlite object detection models (#3403, #3757, #3766, #3855, #3896, #3818, #3799)
- [transforms] Added `antialias` option to `transforms.functional.resize` (#3761, #3810, #3842)
- [transforms] Add new `max_size` parameter to `Resize` (#3494)
- [io] Support for decoding jpegs on GPU with `nvjpeg` (#3792)
- [ci, rocm] Add ROCm to builds (#3840) (#3604) (#3575)
- [ops, models.quantization] Add quantized version of NMS (#3601)
- [ops, models.quantization] Add quantized version of RoIAlign (#3624, #3904)
Improvement
- [build] Various build improvements: (#3618) (#3622) (#3399) (#3794) (#3561)
- [ci] Various CI improvements (#3647) (#3609) (#3635) (#3599) (#3778) (#3636) (#3809) (#3625) (#3764) (#3679) (#3869) (#3871) (#3444) (#3445) (#3480) (#3768) (#3919) (#3641)(#3900)
- [datasets] Improve error handling in `make_dataset` (#3496)
- [datasets] Remove caching from MNIST and variants (#3420)
- [datasets] Make `DatasetFolder.find_classes` public (#3628)
- [datasets] Separate extraction and decompression logic in `datasets.utils.extract_archive` (#3443)
- [datasets, tests] Improve dataset test coverage and infrastructure (#3450) (#3457) (#3454) (#3447) (#3489) (#3661) (#3458) (#3705) (#3411) (#3461) (#3465) (#3543) (#3550) (#3665) (#3464) (#3595) (#3466) (#3468) (#3467) (#3486) (#3736) (#3730) (#3731) (#3477) (#3589) (#3503) (#3423) (#3492) (#3578) (#3605) (#3448) (#3864) (#3544)
- [datasets, tests] Fix lazy importing for dataset tests (#3481)
- [datasets, tests] Fix `test_extract(zip|tar|tar_xz|gzip)` on windows (#3542)
- [datasets, tests] Fix `kwargs` forwarding in fake data utility functions (#3459)
- [datasets, tests] Properly fix dataset test that passes by accident (#3434)
- [documentation] Improve the documentation infrastructure (#3868) (#3724) (#3834) (#3689) (#3700) (#3513) (#3671) (#3490) (#3660) (#3594)
- [documentation] Various documentation improvements (#3793) (#3715) (#3727) (#3838) (#3701) (#3923) (#3643) (#3537) (#3691) (#3453) (#3437) (#3732) (#3683) (#3853) (#3684) (#3576) (#3739) (#3530) (#3586) (#3744) (#3645) (#3694) (#3584) (#3615) (#3693) (#3706) (#3646) (#3780) (#3704) (#3774) (#3634)(#3591)(#3807)(#3663)
- [documentation, ci] Improve the CI infrastructure for documentation (#3734) (#3837) (#3796) (#3711)
- [io] remove deprecated function calls (#3859) (#3858)
- [documentation, io] Improve IO docs and expose `ImageReadMode` in `torchvision.io` (#3812)
- [onnx, models] Replace `reshape` with `flatten` in MobileNetV2 (#3462)
- [ops, tests] Added test for `aligned=True` (#3540)
- [ops, tests] Add onnx test for `batched_nms` (#3483)
- [tests] Various test improvements (#3548) (#3422) (#3435) (#3860) (#3479) (#3721) (#3872) (#3908) (#2916) (#3917) (#3920) (#3579)
- [transforms] add `__repr__` for `transforms.RandomErasing` (#3491)
- [transforms, documentation] Adds Documentation for AutoAugmentation (#3529)
- [transforms, documentation] Add illustrations of transforms with sphinx-gallery (#3652)
- [datasets] Remove pandas dependency for CelebA dataset (#3656, #3698)
- [documentation] Add docs for missing datasets (#3536)
- [referencescripts] Make reference scripts compatible with `submitit` (#3785)
- [referencescripts] Updated `all_gather()` to make use of `all_gather_object()` from PyTorch (#3857)
- [datasets] Added dataset download support in fbcode (#3823) (#3826)
Code quality
- Remove inconsistent FB copyright headers (#3741)
- Keep consistency in classes `ConvBNActivation` (#3750)
- Removed unused imports (#3738, #3740, #3639)
- Fixed `floor_divide` deprecation warnings seen in pytest output (#3672)
- Unify onnx and JIT `resize` implementations (#3654)
- Cleaned-up imports in test files related to datasets (#3720)
- [documentation] Remove old css file (#3839)
- [ci] Fix inconsistent version pinning across yaml files (#3790)
- [datasets] Remove redundant `path.join` in `Places365` (#3545)
- [datasets] Remove imprecise error handling in `PhotoTour` dataset (#3488)
- [datasets, tests] Remove obsolete `test_datasets_transforms.py` (#3867)
- [models] Making protected params of MobileNetV3 public (#3828)
- [models] Make target argument in `transform.py` truly optional (#3866)
- [models] Adding some references on MobileNetV3 implementation. (#3850)
- [models] Refactored `set_cell_anchors()` in `AnchorGenerator` (#3755)
- [ops] Minor cleanup of `roi_align_forward_kernel_impl` (#3619)
- [ops] Replace deprecated `AutoNonVariableTypeMode` with `AutoDispatchBelowADInplaceOrView`. (#3786, #3897)
- [tests] Port tests to use pytest (#3852, #3845, #3697, #3907, #3749)
- [ops, tests] simplify `get_script_fn` (#3541)
- [tests] Use torch.testing.assert_close in our test suite (#3886) (#3885) (#3883) (#3882) (#3881) (#3887) (#3880) (#3878) (#3877) (#3875) (#3888) (#3874) (#3884) (#3876) (#3879) (#3873)
- [tests] Clean up test accept behaviour (#3759)
- [tests] Remove unused `masks` variable in `test_image.py` (#3910)
- [transforms] use ternary if in `resize` (#3533)
- [transforms] replaced deprecated call to `ByteTensor` with `from_numpy` (#3813)
- [transforms] Remove unnecessary casting in `adjust_gamma` (#3472)
Bugfixes
- [ci] set empty cxx flags as default (#3474)
- [android][test_app] Cleanup duplicate dependency (#3428)
- Remove leftover exception (#3717)
- Corrected spelling in a `TypeError` (#3659)
- Add missing device info. (#3651)
- Moving tensors to the right device (#3870)
- Proper error message (#3725)
- [ci, io] Pin JPEG version to resolve the size_t issue on windows (#3787)
- [datasets] Make LSUN OS agnostic (#3455)
- [datasets] Update `squeezenet` urls (#3581)
- [datasets] Add `.item()` to the `target` variable in `fakedataset.py` (#3587)
- [datasets] Fix VOC da...
Dataset bugfixes
Highlights
This minor release bumps the pinned PyTorch version to v1.8.1, and brings a few bugfixes for datasets, including MNIST download not being available.
Bugfixes
Mobile support, AutoAugment, improved IO and more
This release introduces improved support for mobile, with new mobile-friendly models, pre-compiled binaries for Android available in maven and an android demo app. It also improves image IO and provides new data augmentations including AutoAugment.
Highlights
Better mobile support
torchvision 0.9 adds support for the MobileNetV3 architecture with pre-trained weights for Classification, Object Detection and Segmentation tasks.
It also improves C++ operators so that they can be compiled and run on Android, and we are providing pre-compiled torchvision artifacts published to jcenter. An example application on how to use the torchvision ops on an Android app can be found in here.
Classification
We provide MobileNetV3 variants (including a quantized version) pre-trained on ImageNet 2012.
import torch
import torchvision
# Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.mobilenet_v3_large(pretrained=True)
# m_classifier = torchvision.models.mobilenet_v3_small(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
# Quantized Classification
x = torch.rand(1, 3, 224, 224)
m_classifier = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
m_classifier.eval()
predictions = m_classifier(x)
The pre-trained models have the following accuracies on ImageNet 2012 val:
| Model | Top-1 Acc | Top-5 Acc |
|---|---|---|
| MobileNetV3 Large | 74.042 | 91.340 |
| MobileNetV3 Large (Quantized) | 73.004 | 90.858 |
| MobileNetV3 Small | 67.620 | 87.404 |
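For readers unfamiliar with the metrics in the table, the sketch below shows how Top-1 and Top-5 accuracy are commonly computed from classifier outputs. The `topk_accuracy` helper and the logits are illustrative assumptions and are not taken from the torchvision reference scripts.

```python
import torch

def topk_accuracy(logits, targets, k=1):
    """Percentage of samples whose target is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices               # shape (N, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)   # shape (N,)
    return 100.0 * hits.float().mean().item()

# Made-up scores for 4 samples and 6 classes.
logits = torch.tensor([[9.0, 1.0, 0.0, 0.0, 0.0, 0.0],
                       [1.0, 9.0, 8.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                       [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]])
targets = torch.tensor([0, 2, 0, 0])
top1 = topk_accuracy(logits, targets, k=1)
top5 = topk_accuracy(logits, targets, k=5)
print(top1, top5)
```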
Object Detection
We provide two variants of Faster R-CNN with MobileNetV3 backbone pre-trained on COCO train2017. They can be obtained as follows:
import torch
import torchvision
# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
They yield the following accuracies on COCO val2017 (full results available in #3265):
| Model | mAP | mAP@50 | mAP@75 |
|---|---|---|---|
| Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
| Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |
Semantic Segmentation
We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.
import torch
import torchvision
# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC (full results in #3276):
| Model | mean IoU | global pixelwise accuracy |
|---|---|---|
| Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
| DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |
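The two metrics in the table are usually derived from a confusion matrix. The sketch below follows that common definition; the `segmentation_metrics` helper and the two-class matrix are assumptions made for illustration, and the reference scripts may differ in details such as the handling of classes that never occur.

```python
import torch

def segmentation_metrics(confusion):
    """Return (mean IoU, global pixelwise accuracy) in percent.

    confusion[i, j] counts pixels of true class i predicted as class j.
    Assumes every class occurs at least once, so no division by zero.
    """
    confusion = confusion.float()
    correct = confusion.diag()
    global_acc = correct.sum() / confusion.sum()
    iou = correct / (confusion.sum(dim=0) + confusion.sum(dim=1) - correct)
    return 100.0 * iou.mean().item(), 100.0 * global_acc.item()

# Made-up confusion matrix for 2 classes.
confusion = torch.tensor([[80, 10],
                          [5, 5]])
mean_iou, pixel_acc = segmentation_metrics(confusion)
print(mean_iou, pixel_acc)
```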
Addition of the AutoAugment method
AutoAugment is a common Data Augmentation technique that can improve the accuracy of Scene Classification models. Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets.
In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:
from torchvision import transforms
t = transforms.AutoAugment()
transformed = t(image)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])
Improved Image IO and on-the-fly image type conversions
All the read and decode methods of the io.image package have been updated to:
- Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
- Allow the on-the-fly conversion of image from one type to the other during read.
from torchvision.io.image import read_image, ImageReadMode
# keeps original type, channels unchanged
x1 = read_image("image.png")
# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)
# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)
# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)
# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
Python 3.9 and CUDA 11.1
This release adds official support for Python 3.9 and CUDA 11.1 (#3341, #3418)
Backwards Incompatible Changes
- [Ops] Change default `eps` value of `FrozenBN` to better align with `nn.BatchNorm` (#2933)
- [Ops] Remove deprecated _new_empty_tensor. (#3156)
- [Transforms] `ColorJitter` gets its random params by calling `get_params()` (#3001)
- [Transforms] Change rounding of transforms on integer tensors (#2964)
- [Utils] Remove `normalize` from `save_image` (#3324)
New Features
- [Datasets] Add WiderFace dataset (#2883)
- [Models] Add MobileNetV3 architecture:
- [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (#3205)
- [Mobile] Add Android gradle project with demo test app (#2897)
- [Transforms] Implemented AutoAugment, along with required new transforms + Policies (#3123)
- [Ops] Added support of Autocast in all Operators: #2938, #2926, #2922, #2928, #2905, #2906, #2907, #2898
- [Ops] Add modulation input for DeformConv2D (#2791)
- [IO] Improved `io.image` with on-the-fly image type conversions: (#3193, #3069, #3024, #2988, #2984)
- [IO] Add option to write audio to video file (#2304)
- [Utils] Added a utility to draw bounding boxes (#2785, #3296, #3075)
Improvements
Datasets
- Concatenate small tensors in video datasets to reduce the use of shared file descriptor (#1795)
- Improve testing for datasets (#3336, #3337, #3402, #3412, #3413, #3415, #3416, #3345, #3376, #3346, #3338)
- Check if dataset file is located on Google Drive before downloading it (#3245)
- Improve Coco implementation (#3417)
- Make download_url follow redirects (#3236)
- `make_dataset` as `staticmethod` of `DatasetFolder` (#3215)
- Add a warning if any clip can't be obtained from a video in `VideoClips`. (#2513)
Models
- Improve error message in `AnchorGenerator` (#2960)
- Disable pretrained backbone downloading if pretrained is True in segmentation models (#3325)
- Support for image with no annotations in RetinaNet (#3032)
- Change RoIHeads reshape to support empty batches. (#3031)
- Fixed typing exception throwing issues with JIT (#3029)
- Replace deprecated `functional.sigmoid` with `torch.sigmoid` in RetinaNet (#3307)
- Assert that inputs are floating point in Faster R-CNN normalize method (#3266)
- Speedup RetinaNet's postprocessing (#2828)
Ops
- Added eps in the `__repr__` of FrozenBN (#2852)
- Added `__repr__` to `MultiScaleRoIAlign` (#2840)
- Exposing LevelMapper params in `MultiScaleRoIAlign` (#3151)
- Enable autocast for all operators and let them use the dispatcher (#2926, #2922, #2928, #2898)
Transforms
- `adjust_hue` now accepts tensors with one channel (#3222)
- Add `fill` color support for tensor affine transforms (#2904)
- Remove torchscript workaround for `center_crop` (#3118)
- Improved error message for `RandomCrop` (#2816)
IO
- Enabling to import `read_file` and the other methods from torchvision.io (#2918)
- accept python bytes in `_read_video_from_memory()` (#3347)
- Enable rtmp timeout in decoder (#3076)
- Specify tls cert file to decoder through config (#3289, #3374)
- Add UUID in LOG() in decoder (#3080)
References
- Add weight averaging and storing methods in references utils (#3352)
- Adding Preset Transforms in reference scripts (#3317)
- Load variables when `--resume /path/to/checkpoint --test-only` (#3285)
- Updated video classification ref example with new transforms (#2935)
Misc
- Various documentation improvements (#3039, #3271, #2820, #2808, #3131, #3062, #3061, #3000, #3299, #3400, #2899, #2901, #2908, #2851, #2909, #3005, #2821, #2957, #3360, #3019, #3124, #3217, #2879, #3234, #3180, #3425, #2979, #2935, #3298, #3268, #3203, #3290, #3295, #3200, #2663, #3153, #3147, #3232)
- The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (#3259, #3378, #3408, #3373, #3290)
- Avoid some deprecation warnings from PyTorch (#3348)
- Ensure operators are added in C++ (#2798, #3091, #3391)
- Fixed compilation warnings on C++ codebase (#3390)
- CI Improvements (#3401, #3329, #2990, #2978, #3189, #3230, #3254, #2844, #2872, #2825, #3144, #3137, #2827, #2848, #2914, #3419, #2895, #2837)
- Installation improvements (#3302, #2969, #3113, #3202)
- CMake improvemen...
Python 3.9 support and bugfixes
This minor release bumps the pinned PyTorch version to v1.7.1, and contains some minor improvements.
Highlights
Python 3.9 support
This release adds native binaries for Python 3.9 (#3063)
Bugfixes
Added version suffix back to package
Issues resolved:
- Cannot pip install torchvision==0.8.0+cu110 - #2912
Improved transforms, native image IO, new video API and more
This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than torchvision.io.read_video.
Highlights
Transforms now support Tensor, batch computation, GPU and TorchScript
torchvision transforms now inherit from nn.Module and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with batch dimension and work seamlessly on CPU/GPU devices:
import torch
import torchvision.transforms as T
# to fix random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)
transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))
tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())
# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)
# and has torchscript support
out_image2 = scripted_transforms(tensor_image)
These improvements enable the following new features:
- support for GPU acceleration
- batched transformations e.g. as needed for videos
- transform multi-band torch tensor images (with more than 3-4 channels)
- torchscript transforms together with your model for deployment
Note: Exceptions for TorchScript support include Compose, RandomChoice, RandomOrder, Lambda and those applied on PIL images, such as ToPILImage.
Native image IO for JPEG and PNG formats
torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript and return CxHxW tensors in uint8 format, and can thus now be part of your model for deployment in C++ environments.
import torch
from torchvision.io import read_image
# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')
# or equivalently
from torchvision.io.image import read_file, decode_image
# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)
# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)
New detection model
This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.364
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.383
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.506
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.558
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
[BETA] New Video Reader API
This release introduces a new video reading abstraction, which gives more fine-grained control on how to iterate over the videos. It supports image and audio, and implements an iterator interface so that it can be combined with the rest of the python ecosystem, such as itertools.
from torchvision.io import VideoReader
# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')
# can change the stream after construction
# via reader.set_current_stream
# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])
# because reader is an iterator you can combine it with
# itertools
from itertools import takewhile, islice
# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass
# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass
Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed in your system.
Note: the VideoReader API is currently released as beta and its API can change following user feedback.
Backwards Incompatible Changes
- [Transforms] Random seed now should be set with `torch.manual_seed` instead of `random.seed` (#2292)
- [Transforms] `RandomErasing.get_params` function’s argument was previously `value=0` and is now `value=None` which is interpreted as Gaussian random noise (#2386)
- [Transforms] `RandomPerspective` and `F.perspective` changed the default value of interpolation to be `BILINEAR` instead of `BICUBIC` (#2558, #2561)
- [Transforms] Fixes incoherence in `affine` transformation when center is defined as half image size + 0.5 (#2468)
New Features
- [Ops] Added focal loss (#2784)
- [Ops] Added bounding boxes conversion function (#2710, #2737)
- [Ops] Added Generalized IOU (#2642)
- [Models] Added RetinaNet object detection model (#2784)
- [Datasets] Added Places365 dataset (#2610, #2625)
- [Transforms] Added GaussianBlur transform (#2658)
- [Transforms] Added torchscript, batch and GPU and tensor support for transforms (#2769, #2767, #2749, #2755, #2485, #2721, #2645, #2694, #2584, #2661, #2566, #2345, #2342, #2356, #2368, #2373, #2496, #2553, #2495, #2561, #2518, #2478, #2459, #2444, #2396, #2401, #2394, #2586, #2371, #2477, #2456, #2628, #2569, #2639, #2620, #2595, #2456, #2403, #2729)
- [Transforms] Added example notebook for tensor transforms (#2730)
- [IO] Added JPEG/PNG encoding / decoding ops
- [IO] Added file reading / writing ops (#2728, #2765, #2768)
- [IO] [BETA] Added new VideoReader API (#2683, #2781, #2778, #2802, #2596, #2612, #2734, #2770)
Improvements
Datasets
- Added error message if Google Drive download quota is exceeded (#2321)
- Optimized LSUN initialization time by only pulling keys from db (#2544)
- Use more precise return type for gzip.open() (#2792)
- Added UCF101 dataset tests (#2548)
- Added download tests on a schedule (#2665, #2675, #2699, #2706, #2747, #2731)
- Added typehints for datasets (#2487, #2521, #2522, #2523, #2524, #2526, #2528, #2529, #2525, #2527, #2530, #2533, #2534, #2535, #2536, #2532, #2538, #2537, #2539, #2531, #2540, #2667)
Models
- Removed hard coded value in DeepLabV3 (#2793)
- Changed the anchor generator default argument to an equivalent one (#2722)
- Moved model construction location in `resnet_fpn_backbone` into after docstring (#2482)
- Partially enabled type hints for models (#2668)
Ops
- Moved RoIs shape check to C++ (#2794)
- Use autocast built-in cast-helper functions (#2646)
- Added type annotations for `torchvision.ops` (#2331, #2462)
References
- [References] Removed redundant target send to device in detection evaluation (#2503)
- [References] Removed obsolete import in segmentation. (#2399)
Misc
- [Transforms] Added support for negative padding in `pad` (#2744)
- [IO] Added type hints for `torchvision.io` (#2543)
- [ONNX] Export `ROIAlign` with `aligned=True` (#2613)
Internal
- [Binaries] Added CUDA 11 binary builds (#2671)
- [Binaries] Added DEBUG=1 option to build torchvision (#2603)
- [Binaries] Unpin ninja version (#2358)
- Warn if torchvision imported from repo root (#2759)
- Added compatibility checks for C++ extensions (#2467)
- Added probot (#2448)
- Added ipynb to git attributes file (#2772)
- CI improvements (#2328, #2346, #2374, #2437, #2465, #2579, #2577, #2633, #2640, #2727, #2754, #2674, #2678)
- CMakeList improvements (#2739, #2684, #2626, #2585, #2587)
- Documentation improvements (#2659, #2615, #2614, #2542, #2685, #2507, #2760, #2550, #2656, #2723, #2601, #2654, #2757, #2592, #2606)
Bug Fixes
- [Ops] Fixed crash in deformable convolutions (#2604)
- [Ops] Added empty batch support for `DeformConv2d` (#2782)
- [Transforms] Enforced contiguous output in `to_tensor` (#2483)
- [Transforms] Fixed fill parameter for PIL pad (#2515)
- [Models] Fixed deprecation warning in `nonzero` for R-CNN models (#2705)
- [IO] Explicitly cast to `size_t` in video decoder (#2389)
- [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
- [C++ API] Fixed function signatures for `torch::nn::Functional` (#2463)
Deprecations
- [Transforms] Deprecated dedicated implementations `functional_tensor` of `F_t.center_crop`, `F_t.five_crop`, `F_t.te...
Mixed precision training, new models and improvements
Highlights
Mixed precision support for all models
torchvision models now support mixed-precision training via the new torch.cuda.amp package. Using mixed precision support is easy: just wrap the model and the loss inside a torch.cuda.amp.autocast context manager. Here is an example with Faster R-CNN:
import torch, torchvision
device = torch.device('cuda')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)
input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
           "labels": torch.zeros(5, dtype=torch.int64, device=device),
           "image_id": 4,
           "area": torch.zeros(5, dtype=torch.float32, device=device),
           "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]
# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
    losses = sum(loss for loss in loss_dict.values())
# perform backward outside of autocast context manager
losses.backward()
New pre-trained segmentation models
This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3.
They are available under torchvision.models.segmentation, and can be obtained as follows:
torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
They obtain the following accuracies:
| Network | mean IoU | global pixelwise acc |
|---|---|---|
| FCN ResNet50 | 60.5 | 91.4 |
| DeepLabV3 ResNet50 | 66.4 | 92.4 |
Improved ONNX support for Faster / Mask / Keypoint R-CNN
This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models.
Notable improvements include support for dynamic input shape exports, including images with no detections.
Backwards Incompatible Changes
- [Transforms] Fix for integer fill value in constant padding (#2284)
- [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
- [Transforms] Use `torch.rand` instead of `random.random()` for random transforms (#2520)
New Features
- [Models] Add mixed-precision support (#2366, #2384)
- [Models] Add `fcn_resnet50` and `deeplabv3_resnet50` pretrained models. (#2086, #2091)
- [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
- [Transforms] Add `convert_image_dtype` to functionals (#2078)
- [Transforms] Add `pil_to_tensor` to functionals (#2092)
Bug Fixes
- [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
- [IO] Fix `write_video` when floating point FPS is passed (#2334)
- [IO] Fix missing compilation files for video-reader (#2183)
- [IO] Fix missing include for OSX in video decoder (#2224)
- [IO] Fix overflow error for large buffers. (#2303)
- [Ops] Fix wrong clamping in RoIAlign with `aligned=True` (#2438)
- [Ops] Fix corner case in `interpolate` (#2146)
- [Ops] Fix the use of `contiguous()` in C++ kernels (#2131)
- [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
- [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
- [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
- [Models] Fix DenseNet issue with gradient checkpoints (#2236)
- [ONNX] Fix ONNX implementation of `heatmaps_to_keypoints` in KeypointRCNN (#2312)
- [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)
Deprecations
- [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
- [Ops] Deprecate interpolate in favor of PyTorch's implementation (#2252)
Improvements
Datasets
- Fix DatasetFolder error message (#2143)
- Change range(len) to enumerate in DatasetFolder (#2153)
- [DOC] Fix link URL to Flickr8k (#2178)
- [DOC] Add CelebA to docs (#2107)
- [DOC] Improve documentation of DatasetFolder and ImageFolder (#2112)
TorchHub
- Fix torchhub tests due to numerical changes in torch.sum (#2361)
- Add all the latest models to hubconf (#2189)
Transforms
- Add fill argument to __repr__ of RandomRotation (#2340)
- Add tensor support for adjust_hue (#2300, #2355)
- Make ColorJitter torchscriptable (#2298)
- Make RandomHorizontalFlip and RandomVerticalFlip torchscriptable (#2282)
- [DOC] Use consistent symbols in the doc of Normalize to avoid confusion (#2181)
- [DOC] Fix typo in hflip in functional.py (#2177)
- [DOC] Fix spelling errors in functional.py (#2333)
IO
- Refactor video.py to improve clarity (#2335)
- Save memory by not storing full frames in read_video_timestamps (#2202, #2268)
- Improve warning when video_reader backend is not available (#2225)
- Set should_buffer to True by default in _read_from_stream (#2201)
- [Test] Temporarily disable one PyAV test (#2150)
Models
- Improve target checks in GeneralizedRCNN (#2207, #2258)
- Use Module objects instead of functions for some layers of Inception3 (#2287)
- Add support for other normalizations in MobileNetV2 (#2267)
- Expose layer freezing option to detection models (#2160, #2242)
- Make ASPP-Layer in DeepLab more generic (#2174)
- Faster initialization for Inception family of models (#2170, #2211)
- Make norm_layer as parameters in models/detection/backbone_utils.py (#2081)
- Updates integer division to use floor division operator (#2234, #2243)
- [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
- [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
- [DOC] Fix roi_heads argument name in docstring of GeneralizedRCNN (#2093)
- [DOC] Fix type annotation in RPN docstring (#2149)
- [DOC] add clarifications to Object detection reference documentation (#2241)
- [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)
Reference scripts
- Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
- Fix training resuming in references/segmentation (#2142)
- Rename image to images in references/detection/engine.py (#2187)
ONNX
- Add support for dynamic input shape export in R-CNN models (#2087)
Ops
- Added number of features in FrozenBatchNorm2d __repr__ (#2168)
- improve consistency among box IoU CPU / GPU calculations (#2072)
- Avoid using in header files (#2257)
- Make ceil_div __host__ __device__ (#2217)
- Don't include CUDAApplyUtils.cuh (#2127)
- Add namespace to avoid conflict with ATen version of channel_shuffle() (#2206)
- [DOC] Update the statement of supporting torchscript ops (#2343)
- [DOC] Update torchvision ops in doc (#2341)
- [DOC] Improve documentation for NMS (#2159)
- [Test] Add more tests to NMS (#2279)
Misc
- Add PyTorch version compatibility table to README (#2260)
- Fix lint (#2182, #2226, #2070)
- Update version to 0.6.0 in CMake (#2140)
- Remove mock (#2096)
- Remove warning about deprecated (#2064)
- Cleanup unused import (#2067)
- Type annotations for torchvision/utils.py (#2034)
CI
- Add version suffix to build version
- Add backslash to escape
- Add workflows to run on tag
- Bump version to 0.7.0, pin PyTorch to 1.6.0
- Update link for cudnn 10.2 (#2277)
- Fix binary builds with CUDA 9.2 on Windows (#2273)
- Remove Python 3.5 from CI (#2158)
- Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
- Master version bump 0.6 -> 0.7 (#2102)
- Add test channels for pytorch version functions (#2208)
- Add static type check with mypy (#2195, #1696, #2247)
v0.6.1
Drop Python 2 support, several improvements and bugfixes
This release is the first one that officially drops support for Python 2.
It contains a number of improvements and bugfixes.
Highlights
Faster/Mask/Keypoint RCNN supports negative samples
It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations.
This enables increasing the number of negative samples during training. For those images, the annotations expect a tensor with 0 in the number of objects dimension, as follows:
import torch
# image_height and image_width are the spatial size of the training image
target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
          "labels": torch.zeros(0, dtype=torch.int64),
          "image_id": 4,
          "area": torch.zeros(0, dtype=torch.float32),
          "masks": torch.zeros((0, image_height, image_width), dtype=torch.uint8),
          # keypoints are laid out as (num_objects, num_keypoints, 3)
          "keypoints": torch.zeros((0, 17, 3), dtype=torch.float32),
          "iscrowd": torch.zeros((0,), dtype=torch.int64)}
Aligned flag for RoIAlign
RoIAlign now supports the aligned flag, which shifts the box coordinates by half a pixel so that they align more precisely with the two neighboring pixel indices.
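As a rough illustration of what the flag changes, the sketch below maps a box from image coordinates to feature-map coordinates in both modes. It is a simplified pure-Python model, not the actual kernel; the function name `roi_to_feature_coords` is hypothetical.

```python
def roi_to_feature_coords(box, spatial_scale, aligned):
    """Map an (x1, y1, x2, y2) box from image space to feature-map space.

    Returns (x1, y1, width, height) of the region that would be sampled.
    """
    # aligned=True shifts the scaled coordinates by half a pixel
    offset = 0.5 if aligned else 0.0
    x1, y1, x2, y2 = (c * spatial_scale - offset for c in box)
    width, height = x2 - x1, y2 - y1
    if not aligned:
        # legacy behaviour: force every RoI to be at least 1x1
        width, height = max(width, 1.0), max(height, 1.0)
    return x1, y1, width, height


box = (8, 8, 24, 24)
print(roi_to_feature_coords(box, spatial_scale=0.25, aligned=False))
print(roi_to_feature_coords(box, spatial_scale=0.25, aligned=True))
```

With aligned=False the sampled region starts at the scaled corner itself, whereas aligned=True starts half a pixel earlier and leaves very small boxes unclamped.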
Refactored abstractions for C++ video decoder
This change is transparent to Python users, but the whole C++ backend for video reading (which needs torchvision to be compiled from source for it to be enabled for now) has been refactored into more modular abstractions.
The core abstractions are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/decoder. By leveraging those abstractions, the video reader functions exposed to Python can be written in a much more concise way.
Backwards Incompatible Changes
- Dropping Python2 support (#1761, #1792, #1984, #1976, #2037, #2033, #2017)
- [Models] Fix inception quantized pre-trained model (#1954, #1969, #1975)
- ONNX support for Mask R-CNN and Keypoint R-CNN has been temporarily dropped, but will be fixed in next releases
New Features
- [Transforms] Add Perspective fill option (#1973)
- [Ops] aligned flag in ROIAlign (#1908)
- [IO] Update video reader to use new decoder (#1978)
- [IO] torchscriptable functions for video io (#1653, #1794)
- [Models] Support negative samples in Faster R-CNN, Mask R-CNN and Keypoint R-CNN (#1911, #2069)
Improvements
Datasets
- STL10: don't check integrity twice when download=True (#1787)
- Improve code readability and docstring of video datasets (#2020)
- [DOC] Fixed typo in Cityscapes docs (#1851)
Transforms
- Allow passing list to the input argument 'scale' of RandomResizedCrop (#1997) (#2008)
- F.normalize unsqueeze mean & std only for 1-d arrays (#2002)
- Improved error messages for transforms.functional.normalize(). (#1915)
- generalize number of bands calculation in to_tensor (#1781)
- Replace 2 transpose ops with 1 permute in ToTensor (#2018)
- Fixed Pillow version check for Pillow >= 10 (#2039)
- [DOC]: Improve transforms.Normalize docs (#1784, #1858)
- [DOC] Fixed missing new line in transforms.Crop docstring (#1922)
Ops
- Check boxes shape in RoIPool / Align (#1968)
- [ONNX] Export new_empty_tensor (#1733)
- Fix Tensor::data<> deprecation. (#2028)
- Fix deprecation warnings (#2055)
Models
- Add warning and note docs for scipy (#1842) (#1966)
- Added repr attribute to GeneralizedRCNNTransform (#1834)
- Replace mean on dimensions 2,3 by adaptive_avg_pooling2d in mobilenet (#1838)
- Add init_weights keyword argument to Inception3 (#1832)
- Add device to torch.tensor. (#1979)
- ONNX export for variable input sizes in Faster R-CNN (#1840)
- [JIT] Cleanup torchscript constant annotations (#1721, #1923, #1907, #1727)
- [JIT] use // now that it is supported (#1658)
- [JIT] add @torch.jit.script to ImageList (#1919)
- [DOC] Improved docs for Faster R-CNN (#1886, #1868, #1768, #1763)
- [DOC] add comments for the modified implementation of ResNet (#1983)
- [DOC] Add comments to AnchorGenerator (#1941)
- [DOC] Add comment in GoogleNet (#1932)
Documentation
- Document int8 quantization model (#1951)
- Update Doc with ONNX support (#1752)
- Update README to reflect strict dependency on torch==1.4.0 (#1767)
- Update sphinx theme (#2031)
- Document origin of preprocessing mean / std (#1965)
- Fix docstring formatting issues (#2049)
Reference scripts
- Add return statement in evaluate function of detection reference script (#2029)
- [DOC] Add default training parameters to classification reference README (#1998)
- [DOC] Add README to references/segmentation (#1864)
Tests
- Improve stability of test_nms_cuda (#2044)
- [ONNX] Disable model tests since export of interpolate script module is broken (#1989)
- Skip inception v3 in test/test_quantized_models (#1885)
- [LINT] Small indentation fix (#1831)
Misc
- Remove unintentional -O0 option in setup.py (#1770)
- Create CODE_OF_CONDUCT.md
- Update issue templates (#1913, #1914)
- master version bump 0.5 → 0.6
- replace torch 1.5.0 items flagged with deprecation warnings (fix #1906) (#1918)
- CUDA_SUFFIX → PYTORCH_VERSION_SUFFIX
CI
- Remove av from the binary requirements (#2006)
- ci: Add cu102 to CI and packaging, remove cu100 (#1980)
- .circleci: Switch to use token for conda uploads (#1960)
- Improvements to CI infra (#2051, #2032, #2046, #1735, #2048, #1789, #1731, #1961)
- typing only needed for python 3.5 and previous (#1778)
- Move C++ and Python linter to CircleCI (#2056, #2057)
Bug Fixes
Datasets
- bug fix on downloading voc2007 test dataset (#1991)
- fix lsun docstring example (#1935)
- Fixes EMNIST classes attribute is wrong #1716 (#1736)
- Force object annotation to be a list in VOC (#1790)
Models
- Fix for AnchorGenerator when device switch happen (#1745)
- [JIT] fix len error (#1981)
- [JIT] fix googlenet no aux logits (#1949)
- [JIT] Fix quantized googlenet (#1974)
Transforms
Ops
- Fix bug in DeformConv2d for batch sizes > 32 (#2027, #2040)
- Fix for roi_align ONNX export (#1988)
- Fix torchscript issue in ConvTranspose2d (#1917)
- Fix interpolate when no scale_factor is passed (#1785)
- Fix Windows build by renaming Python init functions (#1779)
- fix for loading models with num_batches_tracked in frozen bn (#1728)
Deprecations
- The pts value of pts_unit in read_video and read_video_timestamps is deprecated, and will be replaced with seconds in upcoming releases.
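For readers migrating from presentation timestamps to seconds, the relation between the two units can be sketched as follows. This is a minimal illustration, assuming the stream's time base is known; the helper `pts_to_seconds` and the time base value are hypothetical and not part of torchvision.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert integer presentation timestamps to seconds.

    `time_base` is the duration of one pts tick, in seconds.
    """
    return [float(p * time_base) for p in pts]


# hypothetical stream: 12800 ticks per second, frames 512 ticks apart (25 fps)
time_base = Fraction(1, 12800)
frame_pts = [0, 512, 1024]
print(pts_to_seconds(frame_pts, time_base))
```

Using Fraction for the time base avoids accumulating rounding error before the final conversion to float.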
Towards better research to production support
This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are torchscript-compatible, and can be exported to ONNX. Additionally, a few classification models have quantized weights.
Note: this is the last version of torchvision that officially supports Python 2.
Breaking changes
Updated KeypointRCNN pre-trained weights
The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (#1609)
Corrected the implementation for MNASNet
The previous implementation contained a bug which affects all MNASNet variants other than mnasnet1_0. The bug was that the first few layers needed to also be scaled in terms of width multiplier, along with all the rest. We now provide a new checkpoint for mnasnet0_5, which gives 32.17 top1 error. (#1224)
Highlights
TorchScript support for all models
All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN.
Using torchscript with torchvision models is easy:
# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()
# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])
Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart: it always returns a tuple of (losses, detections). This discrepancy will be addressed in a future release.
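Code that must work with both the eager and the scripted variants can normalise the output before using it. The helper below is a minimal sketch under the assumption that the eager model is in eval mode and returns only the detections; `extract_detections` is a hypothetical name, and the sample values are stand-ins rather than real model output.

```python
def extract_detections(output):
    """Return the detections from an eager (eval mode) or scripted R-CNN output."""
    if isinstance(output, tuple):
        # scripted models return (losses, detections)
        _losses, detections = output
        return detections
    # eager models in eval mode return the detections directly
    return output


# stand-in values shaped like the two return types
eager_out = [{"boxes": [], "labels": [], "scores": []}]
scripted_out = ({}, eager_out)

print(extract_detections(eager_out) is extract_detections(scripted_out))
```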
ONNX
All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.
# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)
# convert to ONNX
torch.onnx.export(model, inputs, "model.onnx",
                  do_constant_folding=True,
                  opset_version=11  # opset_version 11 required for Mask R-CNN
                  )
Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the current exported model is dependent on the input shape during export. As such, make sure that all images fed to the exported ONNX model have the same shape as the one used during export. This behavior will be made more general in a future release.
Quantized models
torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()
# run the model with quantized inputs and weights
out = model(torch.rand(1, 3, 224, 224))
We provide pre-trained quantized weights for the following models:
| Model | Acc@1 | Acc@5 |
|---|---|---|
| MobileNet V2 | 71.658 | 90.150 |
| ShuffleNet V2 | 68.360 | 87.582 |
| ResNet 18 | 69.494 | 88.882 |
| ResNet 50 | 75.920 | 92.814 |
| ResNext 101 32x8d | 78.986 | 94.480 |
| Inception V3 | 77.084 | 93.398 |
| GoogleNet | 69.826 | 89.404 |
Torchscript support for torchvision.ops
torchvision ops are now natively supported by torchscript. This includes operators such as nms, roi_align and roi_pool, and for the ops that support backpropagation, both eager and torchscript modes are supported in autograd.
New operators
Deformable Convolution (#1586) (#1660) (#1637)
As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The module takes both the input and the offsets as arguments, and can be used as follows:
import torch
from torchvision import ops
module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.rand(1, 1, 10, 10)
# number of channels for offset should be a multiple
# of 2 * module.weight.size(2) * module.weight.size(3), which corresponds
# to the kernel_size
offset = torch.rand(1, 2 * 3 * 3, 10, 10)
# the output requires both the input and the offsets
out = module(x, offset)
If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:
from torch import nn

class BasicDeformConv2d(nn.Module):
    # Note: the offset map must have the same spatial size as the output of
    # DeformConv2d; with padding=dilation this holds when kernel_size=3.
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                 dilation=1, groups=1, offset_groups=1):
        super().__init__()
        offset_channels = 2 * kernel_size * kernel_size
        # a plain convolution predicts the offsets from the input
        self.conv2d_offset = nn.Conv2d(
            in_channels,
            offset_channels * offset_groups,
            kernel_size=3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
        )
        self.conv2d = ops.DeformConv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            groups=groups,
            bias=False
        )

    def forward(self, x):
        offset = self.conv2d_offset(x)
        return self.conv2d(x, offset)
Position-sensitive RoI Pool / Align (#1410)
torchvision now provides the Position-Sensitive Region of Interest (RoI) Pool / Align operators mentioned in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). They are available under ops.ps_roi_align, ops.ps_roi_pool and the module equivalents ops.PSRoIAlign and ops.PSRoIPool, and have the same interface as RoIAlign / RoIPool.
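One practical difference from the regular RoI operators is the channel layout: in position-sensitive pooling each output bin reads from its own group of input channels, so the number of input channels has to be a multiple of the number of bins. The sketch below only computes the expected output shape under that assumption; `ps_roi_output_shape` is a hypothetical helper, not a torchvision function.

```python
def ps_roi_output_shape(num_rois, in_channels, output_size):
    """Expected output shape of a position-sensitive RoI pool / align.

    Each of the pooled_h * pooled_w bins consumes its own channel group,
    so the output has in_channels / (pooled_h * pooled_w) channels.
    """
    pooled_h, pooled_w = output_size
    bins = pooled_h * pooled_w
    if in_channels % bins != 0:
        raise ValueError(
            f"in_channels ({in_channels}) must be a multiple of {bins}"
        )
    return (num_rois, in_channels // bins, pooled_h, pooled_w)


# e.g. 490 input channels pooled to 7x7 leave 10 output channels per RoI
print(ps_roi_output_shape(num_rois=5, in_channels=490, output_size=(7, 7)))
```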
New Features
TorchScript support
- Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (#1670)
- Make R-CNN models less verbose in script mode (#1671)
- Minor torchscript fixes for Mask R-CNN (#1639)
- remove BC-breaking changes (#1560)
- Make maskrcnn scriptable (#1407)
- Add Script Support for Video Resnet Models (#1393)
- fix ASPPPooling (#1575)
- Test that torchhub models are scriptable (#1242)
- Make GoogLeNet & InceptionNet scriptable (#1349)
- Make fcn_resnet Scriptable (#1352)
- Make Densenet Scriptable (#1342)
- make resnext scriptable (#1343)
- make shufflenet and resnet scriptable (#1270)
ONNX
- Enable KeypointRCNN test (#1673)
- enable mask rcnn test (#1613)
- Changes to Enable KeypointRCNN ONNX Export (#1593)
- Disable Profiling in Failing Test (#1585)
- Enable ONNX Test for FasterRcnn (#1555)
- Support Exporting Mask Rcnn to ONNX (#1461)
- Lahaidar/export faster rcnn (#1401)
- Support Exporting RPN to ONNX (#1329)
- Support Exporting MultiScaleRoiAlign to ONNX (#1324)
- Support Exporting GeneralizedRCNNTransform to ONNX (#1325)
Quantization
- Update quantized shufflenet weights (#1715)
- Add commands to run quantized model with pretrained weights (#1547)
- Quantizable googlenet, inceptionv3 and shufflenetv2 models (#1503)
- Quantizable resnet and mobilenet models (#1471)
- Remove model download from test_quantized_models (#1526)
Improvements
Bugfixes
- Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (#1677)
- Fix rpn memory leak and dataType errors. (#1657)
- Fix torchvision install due to zipped egg (#1536)
Transforms
- Make shear operation area preserving (#1529)
- PILLOW_VERSION deprecation updates (#1501)
- Adds optional fill colour to rotate (#1280)
Ops
- Add Deformable Convolution operation. (#1586) (#1660) (#1637)
- Fix inconsistent NMS implementation between CPU and CUDA (#1556)
- Speed up nms_cuda (#1704)
- Implementation for Position-sensitive ROI Pool/Align (#1410)
- Remove cpp extensions in favor of torch ops (#1348)
- Make custom ops differentiable (#1314)
- Fix Windows build in Torchvision Custom op Registration (#1320)
- Revert "Register Torchvision Ops as Custom Ops (#1267)" (#1316)
- Register Torchvision Ops as Custom Ops (#1267)
- Use Tensor.data_ptr instead of .data (#1262)
- Fix header includes for cpu (#1644)
Datasets
- fixed test for windows by closing the created temporary files (#1662)
- VideoClips windows fixes (#1661)
- Fix VOC on Windows (#1641)
- update dead LSUN link (#1626)
- DatasetFolder should follow links when searching for data (#1580)
- add .tgz support to extract_archive (#1650)
- expose audio_channels as a parameter to kinetics dataset (#1559)
- Implemented integrity check (md5 hash) after dataset download (#1456)
- Move VideoClips dummy dataset to top level for pickling (#1649)
- Remove download for ImageNet (#1457)
- add tar.xz archive handler (#1361)
- Fix DeprecationWarning for collections.Iterable import in LSUN (#1417)
- Support empty target_type for CelebA dataset (#1351)
- VOC2007 support test set (#1340)
- Fix EMNIST download URL (#1297) (#1318)
- Refactored clip_sampler (#1562)
Documentation
- Fix documentation for NMS (#1614)
- More examples of functional transforms (#1402)
- Fixed doc of crop functionals (#1388)
- Added Training Sample code for fasterrcnn_resnet50_fpn (#1695)
- Fix rpn.py typo (#1276)
- Update README with minimum required version of PyTorch (#1272)
- fix alignment of README (#1396)
- fixed typo in DatasetFolder and ImageFolder (#1284)
Models
Utils
- Adding File...