The following datasets can be loaded with the current code after being downloaded (see example scripts):
| FR Dataset | Description | NR Dataset | Description |
|---|---|---|---|
| PIPAL | 2AFC | FLIVE(PaQ-2-PiQ) | Tech & Aesthetic |
| BAPPS | 2AFC | SPAQ | Mobile |
| PieAPP | 2AFC | AVA | Aesthetic |
| KADID-10k | | KonIQ-10k(++) | |
| LIVEM | | LIVEChallenge | |
| LIVE | | PIQ2023 | Portrait dataset |
| TID2013 | | GFIQA | Face IQA Dataset |
| TID2008 | | | |
| CSIQ | | | |
See more details at Awesome Image Quality Assessment.
Here are some other resources to download the datasets:
We create general interfaces for FR and NR datasets in `pyiqa/data/general_fr_dataset.py` and `pyiqa/data/general_nr_dataset.py`. The main arguments are:
- `opt` contains all dataset options, including:
    - `dataroot_target`: path of the target image folder.
    - `dataroot_ref` [optional]: path of the reference image folder.
    - `meta_info_file`: file containing meta information of images, including relative image paths, MOS labels and other labels.
    - `augment` [optional]: data augmentation transform list:
        - `hflip`: flip input images or pairs.
        - `random_crop`: int or tuple, random crop input images or pairs.
    - `split_file` [optional]: train/val/test split file (`*.pkl`). If not specified, the split information in the meta csv file is used, or the whole dataset is loaded.
    - `split_index` [optional]: `str` or `int`, which split to use; valid when `split_file` is specified or the corresponding split information exists in the meta csv file.
    - `dmos_max`: some datasets use difference of MOS (DMOS). Setting this to a non-zero value converts DMOS to MOS with `mos = dmos_max - dmos`.
    - `phase`: phase label, one of [train, val, test].
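As an illustration of these options, an `opt` dictionary for an NR dataset might look like the sketch below. All paths and values are hypothetical examples, not real files shipped with the repo.

```python
# Illustrative dataset options dict; every path below is a hypothetical
# placeholder, not a file that ships with the repository.
opt = {
    "dataroot_target": "./datasets/koniq10k/images",        # target image folder
    "meta_info_file": "./datasets/meta_info/koniq10k.csv",  # meta information csv
    "augment": {
        "hflip": True,        # random horizontal flip
        "random_crop": 224,   # int or tuple crop size
    },
    "split_file": "./datasets/meta_info/koniq10k_split.pkl",
    "split_index": 1,         # which split to use
    "phase": "train",         # one of: train / val / test
}
```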
The above interface requires the meta_info_file to provide the dataset information and the train/val/test split. The meta_info_file is a `.csv` file with the following general format:
- For NR datasets: name, mos(mean), std, split_name
```
100.bmp 32.56107532210109 19.12472638223644 train/val/test
```
- For FR datasets: ref_name, dist_name, mos(mean), std, split_name
```
I01.bmp I01_01_1.bmp 5.51429 0.13013 train/val/test
```
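The two record formats above can be parsed as sketched below. This assumes whitespace-separated fields, matching the snippets shown here; actual meta files may use comma separators instead, in which case `split(',')` applies.

```python
# Minimal sketch: parse one NR-style and one FR-style meta record.
# Field order follows the formats documented above; whitespace separators
# are assumed here to match the example snippets.
nr_line = "100.bmp 32.56107532210109 19.12472638223644 train"
name, mos, std, split_name = nr_line.split()
mos, std = float(mos), float(std)

fr_line = "I01.bmp I01_01_1.bmp 5.51429 0.13013 train"
ref_name, dist_name, fr_mos, fr_std, fr_split = fr_line.split()
fr_mos, fr_std = float(fr_mos), float(fr_std)
```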
Note that we generate train/val/test splits following the principles below:
- For datasets which have official splits, we follow their splits.
- For official splits which have no `val` part, e.g., the AVA dataset, we randomly separate 5% from the training data as validation.
- For small datasets which require n-split results, we use a `train:val=8:2` ratio.
- All random seeds are set to `123` when needed.
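The `train:val=8:2` principle with seed 123 can be sketched as follows. The sample count of 10 is arbitrary for illustration; this is not the repository's split script, just the idea.

```python
import random

# Sketch of an 8:2 train/val split with the fixed seed 123.
# The 10-sample index list is an arbitrary illustration.
indices = list(range(10))
rng = random.Random(123)   # fixed seed, per the principles above
rng.shuffle(indices)

n_train = int(len(indices) * 0.8)
train_idx, val_idx = indices[:n_train], indices[n_train:]
```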
According to these rules, the split_name is named as follows:
- The official split is saved in a column named `official_split`.
- [if necessary] Ten random splits are generated and stored using the format `ratio[split_ratio]_seed[seed number]_split[split index:02d]`. For example, for a split ratio of `train/val/test=8:0:2`, a seed number of 123, and the first split, the entry would be `ratio802_seed123_split01`.
- You can also use other custom split names, such as `ILGnet_split` for the AVA dataset.
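The naming convention above amounts to a simple string format; a sketch (the helper name is hypothetical):

```python
# Sketch of the split_name convention described above. The function name
# is illustrative, not part of the library's API.
def make_split_name(split_ratio: str, seed: int, split_index: int) -> str:
    # e.g. split_ratio "802" encodes train/val/test = 8:0:2;
    # the split index is zero-padded to two digits.
    return f"ratio{split_ratio}_seed{seed}_split{split_index:02d}"

name = make_split_name("802", 123, 1)  # -> "ratio802_seed123_split01"
```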
You may also use the split_file to specify the split information. The split_file is a `.pkl` file which contains the train/val/test information as a Python dictionary in the following format:
```
{
    train_index: {
        'train': [train_index_list],
        'val': [val_index_list],    # blank if no validation split
        'test': [test_index_list],  # blank if no test split
    }
}
```
The train_index starts from 1, and the sample indexes correspond to the row indexes of the meta_info_file, starting from 0. We have already generated these files for mainstream public datasets with the scripts in the folder ./scripts/.
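A minimal round-trip sketch of such a split_file, using the stdlib `pickle` module. The index lists are toy values written to a temporary path; they only illustrate the dictionary layout described above.

```python
import os
import pickle
import tempfile

# Toy split dictionary in the documented layout: the outer key starts
# from 1, and inner lists hold 0-based row indexes of the meta_info_file.
splits = {
    1: {
        'train': [0, 1, 2, 3],
        'val': [4],        # blank if no validation split
        'test': [5, 6],    # blank if no test split
    }
}

path = os.path.join(tempfile.mkdtemp(), "split.pkl")
with open(path, "wb") as f:
    pickle.dump(splits, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)
```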
Some of the supported datasets have different label formats and file organizations, so we create specific dataloaders for them:
- LIVE Challenge. The first 7 samples are usually removed in related works.
- AVA. Different label formats.
- PieAPP. Different label formats.
- BAPPS. Different label formats.
You may use tests/test_datasets.py to test whether a dataset can be correctly loaded.