Datasets

This page includes a list of curated publicly available datasets for machine learning and computer graphics research projects.

Academic Torrents - A Distributed Dataset Collection

http://academictorrents.com/

Hosting and distributing large datasets is cumbersome and expensive in storage and bandwidth costs. The torrent protocol for file transfer (while unfortunately synonymous with internet piracy and theft) actually offers a perfect solution to the problem of hosting and allowing access to extremely large datasets.

Once you have downloaded a dataset if you have the means and capacity you should seed for a reasonable amount of time to allow others to benefit from your having downloaded the data.

Here are just a couple of the interesting datasets hosted there for download.

Dataset	Description	Link
Vincent van Gogh Paintings	High resolution scans of Vincent van Gogh Paintings.	Link
Arizona State University Twitter Data Set	A large dataset of twitter users, their friend and follower relationships, and connectivities based on tweets.	Link
Mars Terrain Height Data	`46080x23040` high resolution mosaic of Martian terrain height data collected by the Mars Orbiter Laser Altimeter	Link
IARPA Functional Map of the World	`2.69TB` of high resolution satellite imagery covering the entire globe.	Link

Toy Image Datasets

Small scale image datasets useful for prototyping and experimentation. Datasets are small enough to fit in RAM without causing issues, even on small formfactor embedded and mobile devices, and on low power laptops. Useful for small scale image synthesis and classification.

Dataset	Description	Link
MNIST	Labelled `28x28` grayscale handwritten digits in `0-9` from `10` classes.	http://yann.lecun.com/exdb/mnist/
notMNIST	Labeled `28x28` grayscale images of letters in the range `A-J` from `10` classes sampled from publicly available fonts.	http://yaroslavvb.com/upload/notMNIST/
Fashion MNIST	Labelled `28x28` grayscale images of clothing items.	https://github.com/zalandoresearch/fashion-mnist
CIFAR-10 and CIFAR-100	`60,000` labelled `32x32` RGB images from `10` or `100` classes, respectively.	https://www.cs.toronto.edu/~kriz/cifar.html

Image Datasets

Larger scale image datasets which are too big to fit in RAM. Loading data efficiently during training and inference will be vital when working with these. Useful for image synthesis, classification, and segmentation.

Dataset	Description	Link
SVHN	`~600,000` labelled `32x32` RGB images of Google Street View House Numbers.	http://ufldl.stanford.edu/housenumbers/
ImageNet	`~14 million` labelled RGB images with an average resolution of `469x387` from `1000` classes. Human readable class names.	http://www.image-net.org/
Cityscapes	High resolution RGB images of drivers point of view within `50` different cities. `5000` images with fine segmentation masks, `20,000` with coarse segmentation masks.	https://www.cityscapes-dataset.com/
CelebA	`~200,000` annotated `178x218` RGB images of celebrity faces, cropped and aligned consistently. Licensing and usage rights on this data is ambiguous or non-existent.	http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
FFHQ	`70,000` RGB images of human faces at `1024x1024` resolution. All images are under a permissive license. This dataset is the logical successor to CelebA due to quality and licensing.	https://github.com/NVlabs/ffhq-dataset

Mesh Datasets

Large scale dataset of 3D models of various common object types. Useful for mesh synthesis, classification, and segmentation.

Dataset	Description	Link
ShapeNetCore	`51,300` 3D meshes labelled by `55` object categories.	https://www.shapenet.org/
ShapeNetSem	`12,000` 3D meshes, sampled from ShapeNetCore, semantically labelled per triangle face into `270` component categories.	https://www.shapenet.org/

Mesh Datasets for Rendering Applications

Small scale but high quality (polycount) datasets of 3D models useful for testing rendering applications. The classic Utah Teapot, Stanford Bunny, Stanford Dragon, Buddha, and Armadillo meshes can be found in these repositories.

Dataset	Description	Link
The Stanford 3D Scanning Repository	High resolution 3D meshes created via laser scanning of figurines. Meshes are raw from scan and may require clean up! Rendering a mesh with holes using a glass shader will allow light to leak out of the mesh without scattering properly.	http://graphics.stanford.edu/data/3Dscanrep/
McGuire Computer Graphics Archive	High quality curated models and scenes with texture and bump maps.	http://casual-effects.com/data/
Keenan's 3D Model Repository (Carnegie Mellon University)	Nice toy meshes and the occasional high resolution laser scan.	https://www.cs.cmu.edu/~kmcrane/Projects/ModelRepository/
Large Geometric Models Archive (Georgia Tech)	More high resolution scans.	https://www.cc.gatech.edu/projects/large_models/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Datasets.md

Datasets.md

Datasets

Academic Torrents - A Distributed Dataset Collection

Here are just a couple of the interesting datasets hosted there for download.

Toy Image Datasets

Image Datasets

Mesh Datasets

Mesh Datasets for Rendering Applications

Files

Datasets.md

Latest commit

History

Datasets.md

File metadata and controls

Datasets

Academic Torrents - A Distributed Dataset Collection

Here are just a couple of the interesting datasets hosted there for download.

Toy Image Datasets

Image Datasets

Mesh Datasets

Mesh Datasets for Rendering Applications