Releases: pytorch/vision
More datasets, transforms and bugfixes
This version introduces several improvements and fixes.
Support for arbitrary input sizes for models
It is now possible to feed images larger than 224x224 into the models in torchvision.
We added an adaptive pooling layer just before the classifier. It adapts the size of the feature maps before the last layer, which allows for larger input images.
Relevant PRs: #744 #747 #746 #672 #643
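The adaptive pooling idea can be sketched in plain Python. This is a hypothetical 1-D illustration, not torchvision code. It assumes the bin-boundary rule used by PyTorch's adaptive pooling, where bin i covers `[floor(i * n / out), ceil((i + 1) * n / out))`. Layers such as `torch.nn.AdaptiveAvgPool2d` apply the same rule along both spatial dimensions.

```python
from typing import List


def adaptive_avg_pool1d(values: List[float], output_size: int) -> List[float]:
    """Average `values` into exactly `output_size` bins, whatever len(values) is."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        # Bin i covers [floor(i*n/out), ceil((i+1)*n/out)); neighbouring bins
        # may overlap when n is not a multiple of output_size.
        start = (i * n) // output_size
        end = -((-(i + 1) * n) // output_size)  # integer ceiling division
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


# Inputs of different lengths are reduced to a fixed number of outputs, so a
# classifier placed after the pooling always sees the same number of features.
small = adaptive_avg_pool1d([1.0] * 7, 1)
large = adaptive_avg_pool1d([float(x) for x in range(14)], 2)
```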
Bugfixes
- Fix invalid argument error when using LSUN on Windows (#508)
- Fix FashionMNIST loading MNIST (#640)
- Fix inception v3 input transform for trace & onnx (#621)
Datasets
- Add support for webp and tiff images in ImageFolder #736 #724
- Add K-MNIST dataset #687
- Add Cityscapes dataset #695 #725 #739 #700
- Add Flickr8k and Flickr30k datasets #674
- Add VOCDetection and VOCSegmentation datasets #663
- Add SBU Captioned Photo Dataset (#665)
- Updated URLs for EMNIST #726
- MNIST and FashionMNIST now have their own 'raw' and 'processed' folders #601
- Add metadata to some datasets (#501)
Improvements
- Allow RandomCrop to crop in the padded region #564
- ColorJitter now supports min/max values #548
- Generalize resnet to use block.expansion #487
- Move area calculation out of for loop in RandomResizedCrop #641
- Add option to zero-init the residual branch in resnet (#498)
- Improve error messages in to_pil_image #673
- to_tensor can now convert numpy arrays that have only two dimensions (#686)
- Optimize _find_classes in DatasetFolder via scandir in Python3 (#559)
- Add padding_mode to RandomCrop (#489 #512)
- Make DatasetFolder more generic (#527)
- Add in-place option to normalize (#699)
- Add Hamming and Box interpolations to transforms.py (#693)
- Add support for 2-channel image modes such as 'LA', and add another supported 4-channel mode (#688)
- Improve support for 'P' image mode in pad (#683)
- Make torchvision depend on pillow-simd if already installed (#522)
- Make tests run faster (#745)
- Add support for non-square crops in RandomResizedCrop (#715)
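The `padding_mode` option follows the usual constant/edge/reflect/symmetric conventions. Below is a hypothetical 1-D sketch of what each mode produces for a single row of pixels; `pad_row` is an illustrative name and not torchvision's implementation.

```python
from typing import List


def pad_row(row: List[int], pad: int, mode: str = "constant", fill: int = 0) -> List[int]:
    """Pad a 1-D row by `pad` elements on each side using the given mode."""
    if pad == 0:
        return list(row)
    if mode == "constant":
        left, right = [fill] * pad, [fill] * pad
    elif mode == "edge":
        # Repeat the outermost value.
        left, right = [row[0]] * pad, [row[-1]] * pad
    elif mode == "reflect":
        # Mirror around the edge without repeating it (needs pad < len(row)):
        # [1, 2, 3] with pad=2 -> [3, 2, 1, 2, 3, 2, 1]
        left, right = row[pad:0:-1], row[-2:-pad - 2:-1]
    elif mode == "symmetric":
        # Mirror including the edge value (needs pad <= len(row)):
        # [1, 2, 3] with pad=2 -> [2, 1, 1, 2, 3, 3, 2]
        left, right = row[pad - 1::-1], row[:-pad - 1:-1]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + row + right
```

For 2-D images, the same rule is applied along each axis.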
Breaking changes
- save_image now rounds to the nearest integer #754
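The change matters when float pixel values in [0, 1] are converted to 8-bit integers. Below is a minimal sketch that assumes inputs in [0, 1] and assumes the earlier conversion truncated. The helper names are hypothetical, and torchvision performs the equivalent operation on tensors.

```python
def to_uint8_truncate(x: float) -> int:
    # Truncating conversion: scale, clamp, then drop the fractional part.
    return int(min(max(x * 255, 0), 255))


def to_uint8_round(x: float) -> int:
    # Rounding conversion: adding 0.5 before truncating rounds to the nearest integer.
    return int(min(max(x * 255 + 0.5, 0), 255))
```

For example, 0.5 maps to 127 when truncated but to 128 when rounded.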
Misc
- Added code coverage to Travis #703
- Add downloads and docs badge to README (#702)
- Add progress to download_url #497 #524 #535
- Replace 'residual' with 'identity' in resnet.py (#679)
- Consistency changes in the models
- Refactored MNIST and CIFAR to have data and target fields #578 #594
- Update torchvision to newer versions of PyTorch
- Relax assertion in transforms.Lambda.__init__ (#637)
- Cast MNIST target to int (#605)
- Change default target type of FakeData to long (#581)
- Improve docs of functional transforms (#602)
- Docstring improvements
- Add is_image_file to folder_dataset (#507)
- Add deprecation warning in MNIST train[test]_labels[data] (#742)
- Mention TORCH_MODEL_ZOO in models documentation. (#624)
- Add scipy as a dependency to setup.py (#675)
- Added size information for inception v3 (#719)
New datasets, transforms and fixes
This version introduces several fixes and improvements to the previous version.
Better printing of Datasets and Transforms
- Add descriptions to Transform objects.
```python
# Now T.Compose([T.RandomHorizontalFlip(), T.RandomCrop(224), T.ToTensor()]) prints
Compose(
    RandomHorizontalFlip(p=0.5)
    RandomCrop(size=(224, 224), padding=0)
    ToTensor()
)
```
- Add descriptions to Datasets
```python
# now torchvision.datasets.MNIST('~') prints
Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: /private/home/fmassa
    Transforms (if any): None
    Target Transforms (if any): None
```
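The printed form of a composed transform can be produced by a `__repr__` that lists each child transform on its own indented line. Below is a simplified, hypothetical sketch. `MiniCompose` and `Flip` are illustrative names, not torchvision classes, and the sketch assumes four-space indentation as in the output shown.

```python
from typing import Callable, List


class Flip:
    """Hypothetical stand-in transform with a descriptive repr."""

    def __init__(self, p: float = 0.5) -> None:
        self.p = p

    def __call__(self, x):
        return x  # no-op; only the repr matters here

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(p={self.p})"


class MiniCompose:
    """Chain transforms and describe them, one per line."""

    def __init__(self, transforms: List[Callable]) -> None:
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

    def __repr__(self) -> str:
        lines = [self.__class__.__name__ + "("]
        lines += [f"    {t!r}" for t in self.transforms]
        lines.append(")")
        return "\n".join(lines)


print(MiniCompose([Flip(), Flip(p=1.0)]))
```

Because each transform describes itself, the container only has to join the children's reprs.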
New transforms
- Add RandomApply, RandomChoice, RandomOrder transformations #402
  - RandomApply: applies a list of transformations with a given probability
  - RandomChoice: chooses a single transformation at random from a list
  - RandomOrder: applies transformations in a random order
- Add random affine transformation #411
- Add reflect, symmetric and edge padding to transforms.pad #460
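The three containers differ only in how they pick which transforms to run. Below is a simplified sketch of their semantics using Python's `random` module. The function names and the optional `rng` argument are assumptions added for illustration, and this is not torchvision's implementation.

```python
import random
from typing import Callable, List, Optional

Transform = Callable[[object], object]


def random_apply(x, transforms: List[Transform], p: float = 0.5,
                 rng: Optional[random.Random] = None):
    """Apply *all* transforms, in order, with probability p; otherwise apply none."""
    rng = rng or random.Random()
    if rng.random() < p:
        for t in transforms:
            x = t(x)
    return x


def random_choice(x, transforms: List[Transform],
                  rng: Optional[random.Random] = None):
    """Apply exactly one transform, picked uniformly at random."""
    rng = rng or random.Random()
    return rng.choice(transforms)(x)


def random_order(x, transforms: List[Transform],
                 rng: Optional[random.Random] = None):
    """Apply every transform exactly once, in a shuffled order."""
    rng = rng or random.Random()
    order = list(range(len(transforms)))
    rng.shuffle(order)
    for i in order:
        x = transforms[i](x)
    return x
```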
Performance improvements
- Speed up MNIST preprocessing by a factor of 1000
- Make weight initialization optional to speed up VGG construction. This makes loading pre-trained VGG models much faster
- Accelerate transforms.adjust_gamma by using PIL's point function instead of a custom numpy-based implementation
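A lookup table is faster because the gamma curve only needs to be evaluated once per possible 8-bit value, not once per pixel. Below is a minimal pure-Python sketch of building and applying such a table. It assumes the curve `out = 255 * gain * (in / 255) ** gamma`. The rounding and clamping choices and the helper names are assumptions.

```python
from typing import List


def gamma_lut(gamma: float, gain: float = 1.0) -> List[int]:
    """Precompute the gamma curve for all 256 possible 8-bit values."""
    lut = []
    for v in range(256):
        out = 255 * gain * (v / 255) ** gamma
        lut.append(min(255, max(0, int(round(out)))))
    return lut


def apply_lut(pixels: List[int], lut: List[int]) -> List[int]:
    # One table lookup per pixel. Pillow's Image.point accepts a table like
    # this (256 entries per band) and performs the lookup natively.
    return [lut[p] for p in pixels]
```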
New Datasets
- EMNIST - an extension of MNIST for hand-written letters
- OMNIGLOT - a dataset for one-shot learning, with 1623 different handwritten characters from 50 different alphabets
- Add a DatasetFolder class - generalization of ImageFolder
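Like ImageFolder, DatasetFolder treats each subdirectory of the root as one class. Below is a minimal sketch of the class-discovery step using `os.scandir`. The function name `find_classes` and the sorted-index convention are assumptions for illustration.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def find_classes(root: str) -> Tuple[List[str], Dict[str, int]]:
    """Map each subdirectory name of `root` to an integer class index."""
    # DirEntry.is_dir() can often answer without an extra stat() call.
    with os.scandir(root) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    # Sorting makes the class -> index mapping identical on every machine.
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as demo_root:
    for name in ("dog", "cat", "bird"):
        os.mkdir(os.path.join(demo_root, name))
    # A stray file such as .DS_Store is ignored because it is not a directory.
    open(os.path.join(demo_root, ".DS_Store"), "w").close()
    demo_classes, demo_idx = find_classes(demo_root)
```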
Miscellaneous improvements
- FakeData accepts a seed argument, so having multiple different FakeData instances is now possible
- Use consistent datatypes in Dataset targets. Now all datasets that return labels will have them as int
- Add probability parameter to RandomHorizontalFlip and RandomVerticalFlip
- Replace np.random by random in transforms. This improves reproducibility in multi-threaded environments with default arguments
- Detect tif images in ImageFolder
- Add pad_if_needed to RandomCrop, so that if the crop size is larger than the image, the image is automatically padded
- Add support in transforms.ToTensor for PIL Images with mode '1'
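With pad_if_needed, an image smaller than the requested crop is padded first so that a full-size crop always fits. Below is a hypothetical sketch of the size arithmetic. It assumes the padding is split as evenly as possible between the two sides of each dimension; torchvision's exact split may differ, and `padding_for_crop` is an illustrative name.

```python
from typing import Tuple


def padding_for_crop(width: int, height: int,
                     crop_w: int, crop_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) padding so the image is at least crop-sized."""
    extra_w = max(crop_w - width, 0)
    extra_h = max(crop_h - height, 0)
    left, top = extra_w // 2, extra_h // 2
    # Any odd leftover pixel goes to the right/bottom side.
    return left, top, extra_w - left, extra_h - top
```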
Bugfixes
- Fix passing list of tensors to utils.save_image
- Single images passed to make_grid are now also normalized
- Fix PIL image close warnings
- Added missing weight initializations to densenet
- Avoid division by zero in make_grid when the image is constant
- Fix ToTensor when the PIL Image has mode F
- Fix bug with to_tensor when the input is a numpy array of type np.float32
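Scaling an image to [0, 1] divides by its value range, which is zero for a constant image. Below is a minimal sketch of a guarded min-max scaling on a flat list of values. The epsilon floor of 1e-5 and the helper name are assumptions for illustration.

```python
from typing import List


def normalize_range(values: List[float], eps: float = 1e-5) -> List[float]:
    """Min-max scale values to [0, 1] without dividing by zero."""
    low, high = min(values), max(values)
    # For a constant image high == low; flooring the denominator at eps maps
    # every value to 0 instead of raising ZeroDivisionError.
    scale = max(high - low, eps)
    return [(v - low) / scale for v in values]
```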
v0.2.0: New transforms + a new functional interface
This version introduced a functional interface to the transforms, allowing for joint random transformation of inputs and targets. We also introduced a few breaking changes to some datasets and transforms (see below for more details).
Transforms
We have introduced a functional interface for the torchvision transforms, available under torchvision.transforms.functional. This makes it possible to do joint random transformations on inputs and targets, which is especially useful in tasks like object detection, segmentation and super resolution. For example, you can now do the following:
```python
import random

from torchvision import transforms
import torchvision.transforms.functional as F


def my_segmentation_transform(input, target):
    # Sample one random 100x100 crop window and apply it to both image and mask.
    i, j, h, w = transforms.RandomCrop.get_params(input, (100, 100))
    input = F.crop(input, i, j, h, w)
    target = F.crop(target, i, j, h, w)
    # Use a single coin toss for both so the image and mask stay aligned.
    if random.random() > 0.5:
        input = F.hflip(input)
        target = F.hflip(target)
    input, target = F.to_tensor(input), F.to_tensor(target)
    return input, target
```
The following transforms have also been added:
- F.vflip and RandomVerticalFlip
- FiveCrop and TenCrop
- Various color transformations:
  - ColorJitter
  - F.adjust_brightness
  - F.adjust_contrast
  - F.adjust_saturation
  - F.adjust_hue
- LinearTransformation for applications such as whitening
- Grayscale and RandomGrayscale
- Rotate and RandomRotation
- ToPILImage now supports RGBA images
- ToPILImage now accepts a mode argument so you can specify which colorspace the image should be in
- RandomResizedCrop now accepts scale and ratio ranges as input parameters
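FiveCrop returns the four corner crops plus the center crop, and TenCrop adds the same five crops of the flipped image. Below is a hypothetical sketch of the crop-box arithmetic. Boxes are `(left, top, right, bottom)` in the order top-left, top-right, bottom-left, bottom-right, center. The sketch assumes the center offset is rounded to the nearest pixel, and `five_crop_boxes` is an illustrative name.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def five_crop_boxes(width: int, height: int, crop_w: int, crop_h: int) -> List[Box]:
    if crop_w > width or crop_h > height:
        raise ValueError("crop size is larger than the image")
    tl = (0, 0, crop_w, crop_h)
    tr = (width - crop_w, 0, width, crop_h)
    bl = (0, height - crop_h, crop_w, height)
    br = (width - crop_w, height - crop_h, width, height)
    # Center crop: offsets rounded with Python's round() (halves go to even).
    left = int(round((width - crop_w) / 2))
    top = int(round((height - crop_h) / 2))
    center = (left, top, left + crop_w, top + crop_h)
    return [tl, tr, bl, br, center]
```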
Documentation
Documentation is now auto-generated and published to pytorch.org
Datasets:
- SEMEION dataset of handwritten digits added
- Phototour dataset patches computed via multi-scale Harris corners are now available by setting name equal to notredame_harris, yosemite_harris or liberty_harris in the Phototour dataset
Bug fixes:
- Pre-trained DenseNet models are now CPU compatible #251
Breaking changes:
This version also introduced some breaking changes:
- The SVHN dataset has been made consistent with other datasets: the digit 0 is now labelled 0 instead of 10 (see #194 for more details)
- The labels for the unlabelled STL10 dataset are now an array filled with -1
- The order of the input args to the deprecated Scale transform has changed from (width, height) to (height, width) to be consistent with other transforms
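Code written against the old conventions may need small migrations. Below is a minimal sketch with two hypothetical helpers: one remaps labels stored under the old SVHN convention, and one reorders a size tuple written for the old Scale argument order.

```python
from typing import Sequence, Tuple, Union


def svhn_old_to_new(labels: Sequence[int]) -> list:
    # Previously the digit 0 was labelled 10; now it is labelled 0.
    return [0 if label == 10 else label for label in labels]


def scale_size_old_to_new(size: Union[int, Tuple[int, int]]) -> Union[int, Tuple[int, int]]:
    # Scale's tuple argument changed from (width, height) to (height, width);
    # a single integer is unaffected.
    if isinstance(size, int):
        return size
    width, height = size
    return (height, width)
```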
More models and some bug fixes
- Ability to switch image backends between PIL and accimage
- Added more tests
- Various bug fixes and doc improvements
Models
- Fix for inception v3 input transform bug #144
- Added pretrained VGG models with batch norm
Datasets
- Fix indexing bug in LSUN dataset (#177)
- Enable ~ to be used in dataset paths
- ImageFolder now returns the same (sorted) file order on different machines (#193)
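Both changes come down to expanding the path and sorting directory listings, because `os.listdir` and `os.walk` return entries in a filesystem-dependent order. Below is a hypothetical sketch of a deterministic, recursive scan. `list_images` and the extension subset are assumptions for illustration.

```python
import os
import tempfile
from typing import List

IMG_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed subset, for illustration


def list_images(root: str) -> List[str]:
    """Return image paths under `root` in the same order on every machine."""
    root = os.path.expanduser(root)  # allow paths such as "~/data"
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in sorted order
        for name in sorted(filenames):
            if name.lower().endswith(IMG_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return paths


with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "cat", "nested"))
    for parts in (("cat", "b.png"), ("cat", "a.jpg"), ("cat", "nested", "c.png"), ("cat", "notes.txt")):
        open(os.path.join(demo_root, *parts), "w").close()
    demo_paths = [os.path.relpath(p, demo_root) for p in list_images(demo_root)]
```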
Transforms
- transforms.Scale now accepts either a tuple or a single integer as the new size
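When Scale receives a single integer, the smaller edge of the image is matched to that number and the other edge is scaled to preserve the aspect ratio; a tuple gives the output size directly. Below is a hypothetical sketch of the integer case. Truncating the longer edge to an int is an assumption, and `smaller_edge_resize` is an illustrative name.

```python
from typing import Tuple


def smaller_edge_resize(width: int, height: int, size: int) -> Tuple[int, int]:
    """Return (new_width, new_height) with the smaller edge equal to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size
```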
Utils
- Can now pass a pad value to make_grid and save_image
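In make_grid, the pad value fills the gaps between tiles. Below is a hypothetical sketch of the layout arithmetic. It assumes images of equal size, `nrow` images per row, and `padding` pixels around every tile, which mirrors how make_grid sizes its output canvas. `grid_layout` is an illustrative name.

```python
import math
from typing import List, Tuple


def grid_layout(n_images: int, img_h: int, img_w: int, nrow: int = 8,
                padding: int = 2) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Return the canvas size (H, W) and the top-left (y, x) corner of each tile."""
    xmaps = min(nrow, n_images)            # tiles per row
    ymaps = math.ceil(n_images / xmaps)    # number of rows
    cell_h, cell_w = img_h + padding, img_w + padding
    canvas = (cell_h * ymaps + padding, cell_w * xmaps + padding)
    # Every pixel not covered by a tile keeps the pad value.
    corners = [((k // xmaps) * cell_h + padding, (k % xmaps) * cell_w + padding)
               for k in range(n_images)]
    return canvas, corners
```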
More models and datasets. Some bugfixes
New Features
Models
- SqueezeNet 1.0 and 1.1 models added, along with pre-trained weights
- Add pre-trained weights for VGG models
- Fix location of dropout in VGG
- Models in torchvision.models now expose num_classes as a constructor argument
- Add InceptionV3 model and pre-trained weights
- Add DenseNet models and pre-trained weights
Datasets
- Add STL10 dataset
- Add SVHN dataset
- Add PhotoTour dataset
Transforms and Utilities
- transforms.Pad now allows fill colors of either number tuples, or named colors like "white"
- Add normalization options to make_grid and save_image
- ToTensor now supports more input types
Performance Improvements
Bug Fixes
- ToPILImage now supports a single image
- Python3 compatibility bug fixes
- ToTensor now copes with all PIL Image types, not just RGB images
- ImageFolder now only scans subdirectories
  - Having files like .DS_Store is no longer a blocking hindrance
  - Check for non-zero number of images in ImageFolder
  - Subdirectories of classes have recursive scans for images
- LSUN test set loads now
Just a version bump
A small release, just needed a version bump because of PyPI.
Add models and modelzoo, some bugfixes
New Features
- Add torchvision.models: definitions and pre-trained models for common vision models
  - ResNet, AlexNet, VGG models added with downloadable pre-trained weights
- Add padding to RandomCrop. Also add transforms.Pad
- Add MNIST dataset
Performance Fixes
- Fix performance of LSUN dataset
Bug Fixes
- Some Python3 fixes
- Bug fixes in save_image, add single channel support
First release
Introduced Datasets and Transforms.
- Added common datasets
  - COCO (Captioning and Detection)
  - LSUN Classification
  - ImageFolder
  - Imagenet-12
  - CIFAR10 and CIFAR100
- Added utilities for saving images from Tensors.