Datasets

Adam Tupper edited this page Jan 24, 2025 · 1 revision

The following sections describe the tasks defined on each dataset. Except where data splits are predefined by the dataset's original authors, we split each dataset into training, validation, and test images at a 7:1:2 ratio, grouping by patient identifier where available to ensure that no patient appears in more than one set.
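The patient-grouped 7:1:2 split described above can be sketched as follows. This is a hypothetical, stdlib-only illustration of the grouping logic, not the benchmark's actual preprocessing code; the function name and signature are assumptions.

```python
import random
from collections import defaultdict

def split_by_patient(image_ids, patient_ids, ratios=(0.7, 0.1, 0.2), seed=42):
    """Split images into train/val/test so that no patient spans two splits.

    `image_ids` and `patient_ids` are parallel lists. Splitting is done over
    patients (not images), so the image-level proportions only approximate
    `ratios` when patients contribute different numbers of images.
    """
    # Group images under their patient identifier.
    by_patient = defaultdict(list)
    for img, pat in zip(image_ids, patient_ids):
        by_patient[pat].append(img)

    # Shuffle patients deterministically, then cut at the 7:1:2 boundaries.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    patient_splits = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }

    # Expand each patient group back into its images.
    return {name: [img for p in pats for img in by_patient[p]]
            for name, pats in patient_splits.items()}
```

When no patient identifiers are available (as for some datasets below), the same routine can be applied with one synthetic "patient" per image, which reduces to a plain image-level split.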

Annotated Ultrasound Liver

The Annotated Ultrasound Liver (AUL) dataset (Xu et al., 2023) consists of 735 images, including 435 with malignant masses, 200 with benign masses, and 100 with no masses. Each image is of a different patient, with a mean width of 945.33 px ($\sigma$: 142.46 px, min: 440 px, max: 1388 px) and height of 713.80 px ($\sigma$: 94.81 px, min: 341 px, max: 910 px). With the exception of one image that is missing the outline of the liver, each image is annotated with the outline of the liver and the outline of the masses (if present). In addition, each image is labeled malignant, benign, or normal according to the presence of malignant, benign, or no masses in the image, respectively.

We define two segmentation tasks and one classification task on the AUL dataset: a liver segmentation task using the 734 images with liver annotations, a liver mass segmentation task on all 735 images, and a mass classification task that classifies images according to the type of mass present. The liver segmentation task contains one fewer training image than the other tasks due to the missing liver segmentation mask in the source dataset.

Butterfly

The Butterfly dataset (Butterfly Network, 2018) was released for the 2018 MIT Grand Hack. It consists of ultrasound images of multiple body regions acquired from 31 patients using the Butterfly iQ point-of-care ultrasound device. The images are divided into nine groups according to the organ being imaged (Morison's pouch, bladder, heart (PLAX view), heart (4-chamber view), heart (2-chamber view), IVC, carotid artery, lungs, and thyroid). We use these labels to define a nine-class image classification task. In total, the dataset consists of 41,076 images, 34,325 of which are allocated to training and validation, while 6,751 are reserved for testing. The images have an average width of 415.57 px ($\sigma$: 31.16 px, min: 360 px, max: 462 px) and height of 500.80 px ($\sigma$: 36.16 px, min: 384 px, max: 512 px). We split the training and validation images into training and validation sets using an 80:20 split, ensuring that there is no patient overlap between the sets.

CAMUS

The Cardiac Acquisitions for Multi-structure Ultrasound Segmentation (CAMUS) dataset (Leclerc et al., 2018) consists of apical four-chamber and two-chamber view cardiac ultrasound sequences from 500 patients, for a total of 19,232 images. Each image is accompanied by segmentation masks for the left ventricle endocardium, the myocardium, and the left atrium, and is labeled according to the quality of the scan (poor, medium, or good). The images have an average width of 597.58 px ($\sigma$: 102.80 px, min: 323 px, max: 1181 px) and an average height of 491.58 px ($\sigma$: 77.83 px, min: 292 px, max: 973 px). For the CAMUS dataset, we include two tasks: image quality classification, as was explored by Nazar et al. (2024), and cardiac structure segmentation, the original segmentation task.

Fatty Liver

The Dataset of B-mode fatty liver ultrasound images (Byra et al., 2018), referred to simply as the Fatty Liver dataset from here on, contains 550 liver ultrasound images from 55 patients, 38 of whom suffer from non-alcoholic fatty liver disease (NAFLD; defined as >5% of hepatocytes having fatty infiltration). The images all have a resolution of $436 \times 636$ pixels. The associated binary classification task is to classify each image as normal or NAFLD.

GBCU

The Gallbladder Cancer Ultrasound (GBCU) dataset (Basu et al., 2022) contains a total of 1255 annotated abdominal ultrasound images (432 normal, 558 benign, and 265 malignant) collected from 218 patients (71 normal, 100 benign, and 47 malignant). The images have an average width of 1204.95 px ($\sigma$: 85.43 px, min: 854 px, max: 1156 px) and an average height of 854.64 px ($\sigma$: 36.11 px, min: 688 px, max: 947 px). The dataset is already split into training and testing sets containing 1133 and 122 images, respectively. We further split the training set into training and validation sets using a 90:10 split. While there is no patient overlap between the training and test sets, we cannot guarantee that there is no patient overlap between the training and validation sets, since all patient information was removed before the dataset was published. The associated task is to classify the images according to the three classes.

MMOTU

The Multi-Modality Ovarian Tumor Ultrasound (MMOTU) dataset (Zhao et al., 2023) is an ovarian cancer dataset consisting of 2D ultrasound and contrast-enhanced ultrasonography (CEUS) images. In this case, we are interested only in the ultrasound images. In total, there are 1469 2D ultrasound images with semantic segmentation masks identifying the tumour in each image. In addition, each image is labeled according to the type of tumour present (chocolate cyst, serous cystadenoma, teratoma, theca cell tumour, simple cyst, normal ovary, mucinous cystadenoma, and high-grade serous cystadenocarcinoma). This allows us to define two tasks on this dataset: binary tumour segmentation and multi-class tumour type classification. As with the GBCU dataset, this dataset is pre-split into training and testing sets, with 1000 examples collected from 171 patients in the training set and 469 examples collected from 76 patients in the test set. We further split the training set into training and validation sets using an 80:20 split, but since all patient information has been removed from the dataset, we cannot ensure that there is no patient overlap between the training and validation splits. The images have an average width of 550.84 px ($\sigma$: 55.38 px, min: 266 px, max: 794 px) and an average height of 762.04 px ($\sigma$: 238.06 px, min: 302 px, max: 1135 px).

Open Kidney

The Open Kidney Ultrasound dataset (Singla et al., 2023) consists of 514 B-mode kidney ultrasound images, each from a distinct patient. The images are annotated with pixel-level masks, permitting two separate semantic segmentation tasks: kidney capsule segmentation and a more fine-grained kidney region segmentation. However, the limited amount of data relative to the complexity of the region segmentation task means that training informative, effective models in this setting is not possible. The images have an average width of 1061.92 px ($\sigma$: 200.44 px, min: 640 px, max: 1920 px) and an average height of 773.71 px ($\sigma$: 94.88 px, min: 480 px, max: 1080 px). We stratify by view (transverse, longitudinal, and other) when creating the training, validation, and test splits to minimize distribution drift between the splits.
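The view-stratified splitting used here can be sketched as below: split each view group at the target ratio independently, then concatenate, so each split preserves the overall view distribution. This is a minimal stdlib sketch of the idea, not the actual implementation; the function name and signature are assumptions.

```python
import random

def stratified_split(items, labels, ratios=(0.7, 0.1, 0.2), seed=0):
    """Split `items` into train/val/test, stratified by `labels`.

    `labels` holds the stratum of each item (e.g. the view: transverse,
    longitudinal, or other). Each stratum is split at `ratios` separately,
    so every split reflects the overall label distribution.
    """
    # Group items by stratum.
    groups = {}
    for item, lab in zip(items, labels):
        groups.setdefault(lab, []).append(item)

    splits = {"train": [], "val": [], "test": []}
    rng = random.Random(seed)
    for lab in sorted(groups):
        members = groups[lab]
        rng.shuffle(members)
        n = len(members)
        n_tr = round(ratios[0] * n)
        n_va = round(ratios[1] * n)
        splits["train"] += members[:n_tr]
        splits["val"] += members[n_tr:n_tr + n_va]
        splits["test"] += members[n_tr + n_va:]
    return splits
```

Because rounding happens per stratum, the overall split sizes can deviate from the target ratio by a few items when strata are small.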

POCUS

The Point-of-care Ultrasound (POCUS) dataset (Born et al., 2021) is a collection of convex and linear probe lung ultrasound images and videos from different sources that was created for the diagnosis of COVID-19. We use the 142 convex probe videos and 29 convex probe images distributed by the authors and follow the procedure described in their original paper to process them, sampling the videos at a rate of 3 Hz, up to a maximum of 30 frames per video, and grouping the frames by video to prevent data leakage between the train, validation, and test splits. In total, we extract 2726 examples. Each image is labeled by pathology (healthy, pneumonia, or COVID-19), yielding a three-class classification problem. The images have an average width of 499.22 px ($\sigma$: 205.39 px, min: 139 px, max: 1280 px) and an average height of 462.84 px ($\sigma$: 167.07 px, min: 139 px, max: 1080 px).
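The 3 Hz, 30-frame-cap sampling described above amounts to selecting frame indices at a fixed stride derived from the video's frame rate. The helper below is a hypothetical sketch of that index computation (the name and signature are assumptions; the actual frame decoding, e.g. via OpenCV, is omitted).

```python
def sample_frame_indices(n_frames, video_fps, target_hz=3.0, max_frames=30):
    """Indices of the frames to keep when subsampling a video at `target_hz`.

    A video recorded at `video_fps` is thinned to roughly `target_hz` frames
    per second by keeping every `video_fps / target_hz`-th frame, then the
    result is capped at `max_frames` frames per video.
    """
    # Stride between kept frames; at least 1 so low-fps videos keep all frames.
    step = max(1, round(video_fps / target_hz))
    indices = list(range(0, n_frames, step))
    return indices[:max_frames]
```

For example, a 10-second clip at 30 fps (300 frames) yields a stride of 10 and exactly 30 kept frames, while a short 50-frame clip at the same frame rate keeps only 5.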

PSFHS

The PSFHS dataset (Chen et al., 2024) is a dataset for fetal head and pubic symphysis segmentation, comprising 1358 images from 1124 patients. Each image is accompanied by pixel-level segmentation masks for the fetal head and pubic symphysis, supporting a three-class image segmentation task (background, pubic symphysis, and fetal head). The images all have a resolution of $256 \times 256$ px.

Stanford Thyroid

The Stanford Thyroid Ultrasound Cine-clip dataset (Stanford AIMI Center, 2021), referred to simply as the Stanford Thyroid dataset from here on, is a dataset of 192 thyroid nodule ultrasound cine-clips (videos) collected from 167 patients. The images in each sequence are associated with pixel-level nodule segmentation masks, patient demographics, lesion size and location, TI-RADS descriptors, and histopathological diagnoses. We use the nodule masks for a thyroid nodule segmentation task. In total, there are 17,412 images, all with a resolution of $1054 \times 802$ px.