This repository contains the code and methodology to reproduce the results presented in the paper *A Knowledge-Based Framework for Generating Synthetic Intracranial Hemorrhage CT Data to Assess AI Generalizability* (preprint link).
Deep learning models for Computer-Assisted Detection (CAD) of intracranial hemorrhage (ICH) often struggle with generalizability when encountering CT data with characteristics underrepresented in their training sets (e.g., variations in patient demographics, hemorrhage types, or image acquisition parameters).
This project introduces an open-source framework to:
- Generate synthetic ICH CT data by inserting realistic, modeled hemorrhages (epidural, subdural, intraparenchymal) into a digital head phantom.
- Simulate mass effect and control hemorrhage volume and attenuation based on real data distributions.
- Create datasets with varied CT acquisition parameters (mAs, kVp) to robustly evaluate the generalizability of ICH detection models.
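Purely as an illustration of how such a dataset could be parameterized, the hypothetical Python sketch below draws hemorrhage volume and attenuation from assumed distributions and sweeps a grid of acquisition settings. None of the function or variable names come from InSilicoICH, and the distributions are placeholders rather than the ones used in the paper.

```python
# Hypothetical sketch of a case-specification loop; all names and distributions
# here are illustrative placeholders, not the actual InSilicoICH API or values.
import itertools
import numpy as np

rng = np.random.default_rng(seed=0)

hemorrhage_types = ["epidural", "subdural", "intraparenchymal"]
kvp_settings = [80, 100, 120]    # tube voltage variations (kVp)
mas_settings = [100, 200, 300]   # tube current-time product variations (mAs)

cases = []
for ich_type, kvp, mas in itertools.product(hemorrhage_types, kvp_settings, mas_settings):
    # Draw lesion properties from distributions modeled on real data
    # (log-normal volume and normal attenuation are placeholder choices).
    volume_ml = rng.lognormal(mean=2.5, sigma=0.6)     # hemorrhage volume in mL
    attenuation_hu = rng.normal(loc=65.0, scale=5.0)   # mean attenuation in HU
    cases.append({
        "type": ich_type,
        "volume_ml": float(volume_ml),
        "attenuation_hu": float(attenuation_hu),
        "kvp": kvp,
        "mas": mas,
    })

# Each case specification would then drive lesion insertion into the phantom
# (with mass effect) and CT simulation at the requested acquisition settings.
print(f"{len(cases)} synthetic case specifications generated")
```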
Our work validates this approach by demonstrating comparable performance of an ICH detection model on our synthetic dataset (AUC 0.877) versus an independent real dataset (AUC 0.919). This framework enables more comprehensive testing and evaluation of CAD devices for ICH.
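The reported numbers are standard ROC AUCs. For readers unfamiliar with the metric, here is a minimal example of computing one from detection scores and ground-truth labels with scikit-learn; the arrays below are made up and unrelated to the paper's data.

```python
# Minimal ROC AUC example with scikit-learn; the scores and labels are made up.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                      # 1 = ICH present, 0 = no ICH
y_score = [0.1, 0.4, 0.8, 0.65, 0.2, 0.9, 0.55, 0.3]   # model output probabilities

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```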
Synthetic ICH datasets were generated using InSilicoICH.
Note: Figures 3 and 8 require a dataset of real CT examples with and without ICH, which can be downloaded here. To reproduce these figures, place the downloaded contents inside datasets/computed-tomography-images-for-intracranial... before regenerating the figures.
```bash
git clone https://github.com/DIDSR/synthetic-ich-for-cad-evaluations.git
cd synthetic-ich-for-cad-evaluations
conda create -n synthetic-ich-for-cad-evaluations python==3.11.* -y
conda activate synthetic-ich-for-cad-evaluations
pip install -r requirements.txt
```

Set BASE_DIR, the directory where datasets will be stored. The default is ./datasets in your working directory:

```bash
echo BASE_DIR=./datasets >> .env
```

You are now ready to run the notebooks and regenerate figures.
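The notebooks are expected to pick up BASE_DIR from this .env file (or from the environment). As a rough sketch of that lookup, assuming a plain KEY=VALUE .env file and using only the standard library (the repository itself may use a helper such as python-dotenv):

```python
# Sketch of reading BASE_DIR from a KEY=VALUE .env file with only the standard
# library; the repository's notebooks may use a different mechanism.
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Load KEY=VALUE pairs from a .env file into os.environ."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env()
base_dir = Path(os.environ.get("BASE_DIR", "./datasets"))
print(f"Datasets will be stored in: {base_dir.resolve()}")
```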
You can run notebooks individually or all together with the run_all script:
```bash
bash run_all.sh
```

This will take longer the first time it is run, because missing datasets are downloaded to BASE_DIR (4.5 GB for notebook fig3-5_view_six_examples and 480 MB for notebook fig6_kV_mA_variation); subsequent runs should take around 25-30 s to complete.
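run_all.sh is a shell script; if you prefer to drive the notebooks from Python instead, a minimal sketch using nbformat and nbconvert's ExecutePreprocessor is shown below. The notebook file names are guesses based on the figure names above; check the repository for the exact names.

```python
# Sketch: execute notebooks programmatically with nbformat + nbconvert.
# The notebook file names are assumptions based on the figures mentioned above.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

notebooks = [
    "fig3-5_view_six_examples.ipynb",
    "fig6_kV_mA_variation.ipynb",
]

for nb_path in notebooks:
    nb = nbformat.read(nb_path, as_version=4)
    ExecutePreprocessor(timeout=3600, kernel_name="python3").preprocess(
        nb, {"metadata": {"path": "."}}
    )
    nbformat.write(nb, nb_path)  # save the executed notebook in place
    print(f"Executed {nb_path}")
```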
Additional notebooks are included that are not directly related to generating the manuscript figures but can be helpful for visualizing results or running the pipeline end to end (see single_case_pipeline).
The pretrained ICH detection model used to generate Figure 5 can be downloaded directly here: Pretrained model weights; retraining methods and details are available here: RSNA 2019 ICH Detection Grand Challenge Fork.
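As a rough sketch of loading those weights for inference, assuming a PyTorch checkpoint and using a torchvision ResNet purely as a stand-in architecture (the actual model definition, preprocessing, and checkpoint file name are documented in the linked fork and pretrained-weights download):

```python
# Sketch of loading pretrained weights in PyTorch; the file name, architecture,
# and input shape are assumptions -- consult the linked RSNA 2019 fork for the
# actual model and preprocessing.
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=6)  # e.g., one output per hemorrhage subtype plus "any"
state_dict = torch.load("pretrained_ich_model.pth", map_location="cpu")
model.load_state_dict(state_dict, strict=False)  # strict=False: stand-in architecture
model.eval()

with torch.no_grad():
    dummy_slice = torch.randn(1, 3, 224, 224)    # placeholder windowed CT slice
    scores = torch.sigmoid(model(dummy_slice))   # per-class probabilities
print(scores.shape)
```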
This software and documentation (the "Software") were developed at the US Food and Drug Administration (FDA) by employees of the Federal Government in the course of their official duties. Pursuant to Title 17, Section 105 of the United States Code, this work is not subject to copyright protection and is in the public domain. Permission is hereby granted, free of charge, to any person obtaining a copy of the Software, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, or sell copies of the Software or derivatives, and to permit persons to whom the Software is furnished to do so. FDA assumes no responsibility whatsoever for use by other parties of the Software, its source code, documentation or compiled executables, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Further, use of this code in no way implies endorsement by the FDA or confers any advantage in regulatory decisions. Although this software can be redistributed and/or modified freely, we ask that any derivative works bear some notice that they are derived from it, and any modified versions bear some notice that they have been modified.



