- Real-world AI models are often used in the presence of an intervention (e.g., hospitals use AI to predict patients' risk of readmission while also applying post-discharge phone check-ins as an intervention to reduce that risk).
- Evaluating a model's ability to predict the outcome without intervention often requires randomized controlled trial (RCT) data, which is expensive to collect.
- Standard evaluation can only use the control group data in an RCT, while naïvely using both treatment and control group data leads to biased evaluation.
- We proposed Nuisance Parameter Weighting (NPW), an unbiased model evaluation approach that uses all RCT data.
- We validated that NPW improves AUROC estimation, model ranking, and model selection across a wide range of synthetic and real-world datasets.
- This repository contains code to reproduce the experimental results in "Measuring Model Performance in the Presence of an Intervention" (AAAI 2026).
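
To make the bias of naïve evaluation concrete, here is a minimal, self-contained simulation. It is only an illustrative sketch and does not implement NPW or reproduce the paper's experiments: the data-generating process, the effect size (`0.3`), the risk slope (`2.0`), and the helper names `auroc` and `simulate_rct` are all assumptions made for this example. It compares AUROC on the control arm (standard evaluation) with AUROC on the pooled treatment and control arms (naïve evaluation) when the intervention lowers risk for high-risk patients.

```python
import math
import random


def auroc(scores, labels):
    """Rank-based (Mann-Whitney) AUROC; assumes there are no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of 1-based ranks of the positive examples.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_rct(n=20000, seed=0):
    """Simulate a hypothetical RCT in which the model score is the feature x."""
    rng = random.Random(seed)
    scores, treated, outcomes = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p0 = 1.0 / (1.0 + math.exp(-2.0 * x))  # risk without intervention
        t = rng.random() < 0.5                  # randomized treatment assignment
        # Assumed effect: the intervention cuts risk only for high-risk patients.
        p = 0.3 * p0 if (t and x > 0) else p0
        scores.append(x)
        treated.append(t)
        outcomes.append(1 if rng.random() < p else 0)
    return scores, treated, outcomes


scores, treated, outcomes = simulate_rct()

# Standard evaluation: control arm only, where outcomes are untreated outcomes.
control = [i for i, t in enumerate(treated) if not t]
auroc_standard = auroc([scores[i] for i in control], [outcomes[i] for i in control])

# Naive evaluation: pool both arms and ignore that treatment changed outcomes.
auroc_naive = auroc(scores, outcomes)

print(f"standard (control only): {auroc_standard:.3f}")
print(f"naive (both arms):       {auroc_naive:.3f}")
```

In this toy setup the pooled estimate understates how well the score predicts the untreated outcome, because treated high-risk patients experience fewer events. The control-only estimate avoids that bias but discards roughly half of the trial data.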
- Synthetic datasets will be automatically generated in the following steps.
- The AMR-UTI dataset needs to be obtained from PhysioNet. Place the `all_prescriptions.csv`, `all_uti_features.csv`, and `all_uti_resist_labels.csv` files in the `data/AMR-UTI/raw` folder.
- The readmission dataset was collected at Michigan Medicine and is not publicly available to protect patient privacy. However, we provide the checkpoints of the model evaluation results for analysis. If you are interested in working with this dataset, please contact the authors for more information.
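
Before running the AMR-UTI scripts, it can help to confirm that the raw files are where the instructions above expect them. The snippet below is a small convenience sketch, not part of the repository: the function name `missing_amr_uti_files` is hypothetical, and it assumes you run it from the repository root so that the relative path `data/AMR-UTI/raw` resolves correctly.

```python
from pathlib import Path

# File names and folder are taken from the data instructions above.
REQUIRED_FILES = (
    "all_prescriptions.csv",
    "all_uti_features.csv",
    "all_uti_resist_labels.csv",
)


def missing_amr_uti_files(raw_dir="data/AMR-UTI/raw"):
    """Return the required AMR-UTI files that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in REQUIRED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_amr_uti_files()
    if missing:
        print("Missing AMR-UTI files:", ", ".join(missing))
    else:
        print("All AMR-UTI raw files found.")
```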

```bash
pip install -r requirements.txt
```

| Path | Description |
|---|---|
| `src/run_gen_sim.sh` | Generate synthetic datasets |
| `src/run_eval_models_sim.sh` | Run model evaluation with standard/naïve/NPW |


| Path | Description |
|---|---|
| `src/run_gen_amruti.sh` | Preprocess AMR-UTI datasets |
| `src/run_eval_models_amruti.sh` | Run model evaluation with standard/naïve/NPW |


| Path | Description |
|---|---|
| `notebooks/plot_sim_eval_results.ipynb` | Visualize synthetic experiment results |
| `notebooks/plot_real_eval_results.ipynb` | Visualize AMR-UTI and readmission experiment results |

If you use NPW in your research, please cite:
Chen W., Sjoding M., Wiens J., Measuring Model Performance in the Presence of an Intervention. The 40th Annual AAAI Conference on Artificial Intelligence (2026).
This project is licensed under the Apache 2.0 License.
