DataCard

Autonomous Evaluation of Whole Slide Imaging (WSI) and Radiology Imaging Dataset Quality: a suite of software tools for comprehensive dataset evaluation and summarization with respect to correctness, completeness, coverage, and consistency.

Overview

DataCard examines submitted images for artifacts and produces a scorecard of descriptive and quantitative measures of dataset quality, covering image quality, metadata quality, and annotation quality, for both digital mammography images and histopathology whole slide images.

DataCard Generation

An outline of the DataCard pipeline is given below. Patient data from the provider, together with metadata and dataset information, is processed by data quality modules along several dimensions to produce the output components of the DataCard, including visualizations and reports.
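The flow described above can be pictured as a set of per-dimension quality modules that each read the same dataset and contribute measures to a single card. The sketch below is a hypothetical illustration of that orchestration only; `QualityModule`, `build_datacard`, and the two toy modules are assumed names, not part of the DataCard tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: one patient sample with its metadata fields.
Sample = Dict[str, object]


@dataclass
class QualityModule:
    """Hypothetical quality module: a name, the 4C dimension it reports on, and a scoring function."""
    name: str
    dimension: str
    run: Callable[[List[Sample]], Dict[str, float]]


def build_datacard(samples: List[Sample], modules: List[QualityModule]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Run every module on the same dataset and group its measures by dimension."""
    card: Dict[str, Dict[str, Dict[str, float]]] = {}
    for module in modules:
        card.setdefault(module.dimension, {})[module.name] = module.run(samples)
    return card


# Toy modules for illustration only.
def count_samples(samples: List[Sample]) -> Dict[str, float]:
    return {"n_samples": float(len(samples))}


def fraction_with_label(samples: List[Sample]) -> Dict[str, float]:
    labeled = sum(1 for s in samples if s.get("label") is not None)
    return {"labeled_fraction": labeled / len(samples) if samples else 0.0}


if __name__ == "__main__":
    data = [{"label": "benign"}, {"label": None}, {"label": "malignant"}, {"label": "benign"}]
    modules = [
        QualityModule("size", "Coverage", count_samples),
        QualityModule("labels", "Completeness", fraction_with_label),
    ]
    print(build_datacard(data, modules))
```

Keeping each module as an independent function makes it easy to add or drop checks per imaging modality without changing how the card is assembled.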

DataCard Dimensions

DataCard is built on four main dimensions, the "4C" dimensions described below:

Correctness: Correctness refers to the accuracy of the recorded data, including the medical images, the disease labels, and patient metadata.
Completeness: Completeness refers to the proportion of samples that contain all associated data and metadata values.
Coverage: Coverage refers to the amount and variety of data along different factors of variation.
Consistency: Consistency refers to the uniformity of data quality across different subgroups and collection sites.
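As a concrete illustration of two of these definitions, the sketch below computes completeness as the fraction of samples with every required field present, and a simple consistency check as the spread of that completeness across collection sites. The field names, the `site` key, and the max-minus-min spread are assumptions chosen for illustration; the DataCard tools may define these measures differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Sample = Dict[str, object]


def is_complete(sample: Sample, required: Iterable[str]) -> bool:
    """A sample is complete if every required field is present and not None."""
    return all(sample.get(field) is not None for field in required)


def completeness(samples: List[Sample], required: List[str]) -> float:
    """Proportion of samples that contain all required data and metadata values."""
    if not samples:
        return 0.0
    return sum(is_complete(s, required) for s in samples) / len(samples)


def completeness_by_site(samples: List[Sample], required: List[str], site_key: str = "site") -> Dict[str, float]:
    """Completeness computed separately for each collection site."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        groups[str(s.get(site_key, "unknown"))].append(s)
    return {site: completeness(group, required) for site, group in groups.items()}


def consistency_gap(per_site: Dict[str, float]) -> float:
    """Hypothetical consistency measure: largest minus smallest per-site value (0 means uniform)."""
    if not per_site:
        return 0.0
    return max(per_site.values()) - min(per_site.values())


# Hypothetical example records from two sites.
samples = [
    {"site": "A", "age": 54, "density": "B", "label": "benign"},
    {"site": "A", "age": 61, "density": None, "label": "malignant"},
    {"site": "B", "age": 47, "density": "C", "label": "benign"},
    {"site": "B", "age": 70, "density": "A", "label": "benign"},
]
required = ["age", "density", "label"]
per_site = completeness_by_site(samples, required)
print(completeness(samples, required))  # 0.75
print(per_site)                         # {'A': 0.5, 'B': 1.0}
print(consistency_gap(per_site))        # 0.5
```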

Individual tools developed to assess patient datasets across different dimensions and produce the various components of the DataCard are outlined below:

Correctness Assessment

Digital Pathology

HistoART is a Python-based toolkit designed for researchers and professionals working with digital pathology images. Its primary goal is to facilitate effective detection, classification, and quantification of artifacts in a histopathological slide dataset through both deep learning and hand-crafted feature approaches. This toolkit is particularly aimed at pathologists, data scientists, and students looking to streamline and enhance their analysis workflows.
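To make the idea of a hand-crafted artifact feature concrete, the sketch below scores out-of-focus blur on a grayscale tile using the variance of a 4-neighbour Laplacian, a widely used sharpness heuristic: sharp tissue has strong local intensity changes and a high variance, while a blurred region has a low one. This is not HistoART's implementation or API; the function names and the default threshold are hypothetical and would need calibration on real slides.

```python
from statistics import pvariance
from typing import List

Tile = List[List[float]]  # grayscale tile as rows of pixel intensities


def laplacian(tile: Tile) -> List[float]:
    """Discrete 4-neighbour Laplacian over interior pixels (border pixels are skipped)."""
    h, w = len(tile), len(tile[0])
    if h < 3 or w < 3:
        raise ValueError("tile must be at least 3x3")
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out.append(
                tile[y - 1][x] + tile[y + 1][x] + tile[y][x - 1] + tile[y][x + 1]
                - 4 * tile[y][x]
            )
    return out


def blur_score(tile: Tile) -> float:
    """Variance of the Laplacian response; lower values suggest a blurrier tile."""
    return pvariance(laplacian(tile))


def is_blurry(tile: Tile, threshold: float = 100.0) -> bool:
    """Flag a tile as blurry using a hypothetical, uncalibrated threshold."""
    return blur_score(tile) < threshold


# Usage: a checkerboard (sharp edges) versus a flat grey tile (no detail).
sharp = [[255.0 if (x + y) % 2 else 0.0 for x in range(8)] for y in range(8)]
flat = [[128.0] * 8 for _ in range(8)]
print(blur_score(sharp) > blur_score(flat))  # True
print(is_blurry(flat), is_blurry(sharp))     # True False
```

A deep learning classifier would replace the hand-tuned threshold with a learned decision, at the cost of needing labelled artifact examples.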

Repository link: HistoART

Digital Mammography

Correctness DM is an in-development Python-based toolkit designed to automatically identify common image quality issues and artifacts in Digital Mammography images. This code will be available once development is completed.

Repository link: Correctness DM

Completeness, Coverage, and Consistency Assessment

M3C is a comprehensive tool for assessing the Completeness, Coverage, and Consistency of the metadata in patient datasets. The tool derives metadata features from patient metadata and produces visualizations of available and missing data fields, as well as the distributions of values within individual fields.
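The sketch below illustrates the two kinds of summaries M3C is described as producing: for each metadata field, the fraction of records in which the field is available, and the distribution of values within a field. It uses only the Python standard library, treats `None` and empty strings as missing, and uses hypothetical example records; it does not reproduce M3C's own interface or feature set.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_missing(value: object) -> bool:
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and not value.strip())


def field_availability(records: List[Record]) -> Dict[str, float]:
    """Fraction of records in which each field appears with a non-missing value."""
    fields = sorted({key for r in records for key in r})
    n = len(records)
    return {
        f: sum(not is_missing(r.get(f)) for r in records) / n if n else 0.0
        for f in fields
    }


def value_distribution(records: List[Record], field: str) -> Counter:
    """Counts of each observed value of one field, ignoring missing entries."""
    return Counter(r[field] for r in records if not is_missing(r.get(field)))


# Hypothetical mammography metadata records.
records = [
    {"manufacturer": "VendorA", "view": "CC", "laterality": "L"},
    {"manufacturer": "VendorA", "view": "MLO", "laterality": ""},
    {"manufacturer": "VendorB", "view": "CC"},
    {"manufacturer": None, "view": "MLO", "laterality": "R"},
]
print(field_availability(records))
# {'laterality': 0.5, 'manufacturer': 0.75, 'view': 1.0}
print(value_distribution(records, "manufacturer"))
# Counter({'VendorA': 2, 'VendorB': 1})
```

The availability table maps naturally to a missing-data plot, and each value distribution to a per-field bar chart.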

Repository link: M3C

An outline of the metadata assessment pipeline is given below.

How to Cite

If you utilize DataCard in your research or applications, please cite the repository:

@misc{DataCard2025,
  author = {Seyed M. Kahaki and Alexander R. Webber and Tahsin Rahman},
  title = {DataCard},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/DIDSR/DataCard}},
}

Files and Data

All code and data files required for this project, along with instructions to acquire any necessary external data, are provided within the repositories for the different dimensions described above.

Contact and Contributions

Seyed Kahaki: [email protected]

Tahsin Rahman: [email protected]

Acknowledgements

This project was supported in part by an appointment to the ORISE Research Participation Program at the Center for Devices and Radiological Health, U.S. Food and Drug Administration, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and FDA/CDRH.

Disclaimer

This software and documentation (the "Software") were developed at the US Food and Drug Administration (FDA) by employees of the Federal Government in the course of their official duties. Pursuant to Title 17, Section 105 of the United States Code, this work is not subject to copyright protection and is in the public domain. Permission is hereby granted, free of charge, to any person obtaining a copy of the Software, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, or sell copies of the Software or derivatives, and to permit persons to whom the Software is furnished to do so. FDA assumes no responsibility whatsoever for use by other parties of the Software, its source code, documentation or compiled executables, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Further, use of this code in no way implies endorsement by the FDA or confers any advantage in regulatory decisions. Although this software can be redistributed and/or modified freely, we ask that any derivative works bear some notice that they are derived from it, and any modified versions bear some notice that they have been modified.
