NPW: Measuring Model Performance in the Presence of an Intervention

🔍 Overview

Real-world AI models are often deployed alongside an intervention. For example, a hospital may use AI to predict patients' risk of readmission while also applying post-discharge phone check-ins as an intervention to reduce that risk.
Evaluating a model's ability to predict outcomes in the absence of the intervention often requires randomized controlled trial (RCT) data, which is expensive to collect.
Standard evaluation can use only the control-group data from an RCT, while naïvely pooling treatment- and control-group data leads to biased evaluation.
We propose Nuisance Parameter Weighting (NPW), an unbiased model evaluation approach that uses all of the RCT data.
We validated that NPW improves AUROC estimation, model ranking, and model selection across a wide range of synthetic and real-world datasets.
This repository contains reproducible code for experimental results in "Measuring Model Performance in the Presence of an Intervention" (AAAI 2026).
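To make the bias of naïve pooling concrete, here is a small, self-contained toy simulation. It is not the NPW estimator and is not code from this repository; the data-generating process is a hypothetical assumption: a model score `x` equal to the true no-intervention risk, a 50/50 randomized intervention, and an intervention that prevents the outcome with probability `0.8 * x`. It compares standard evaluation (control group only) with naïve evaluation (treatment and control pooled).

```python
import random


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative labels")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i:j]; their 1-based ranks are i+1..j.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def simulate(n=20000, effect=0.8, seed=0):
    """Hypothetical RCT: returns (score, treated, outcome) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()              # model score = true risk without intervention (assumption)
        treated = rng.random() < 0.5  # randomized assignment, as in an RCT
        # The intervention prevents an outcome with probability effect * x (assumption).
        p = x * (1 - effect * x) if treated else x
        y = 1 if rng.random() < p else 0
        rows.append((x, treated, y))
    return rows


rows = simulate()
control = [(x, y) for x, t, y in rows if not t]
auc_standard = auroc([x for x, _ in control], [y for _, y in control])
auc_naive = auroc([x for x, _, _ in rows], [y for _, _, y in rows])
print(f"standard (control only): {auc_standard:.3f}")
print(f"naive (pooled):          {auc_naive:.3f}")
```

Under this setup the target AUROC for the no-intervention outcome is 5/6, which the control-only estimate recovers up to sampling noise, while the pooled estimate is pulled well below it because treated high-risk patients have their outcomes prevented. The control-only estimate is unbiased but discards half of the trial; using all of the data without this bias is the gap NPW is designed to close.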

▶️ Quick Start

1: Obtain required datasets

Synthetic datasets are generated automatically by the scripts in step 3.
The AMR-UTI dataset must be obtained from PhysioNet. Place the all_prescriptions.csv, all_uti_features.csv, and all_uti_resist_labels.csv files in the data/AMR-UTI/raw folder.
The readmission dataset was collected at Michigan Medicine and is not publicly available, to protect patient privacy. However, we provide checkpoints of the model evaluation results for analysis. If you are interested in working with this dataset, please contact the authors for more information.
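Before running the AMR-UTI preprocessing script, a quick check can confirm that the raw PhysioNet files are in place. The helper below (`missing_amr_uti_files` and `REQUIRED_FILES`) is hypothetical and not part of the repository; it assumes it is called from the repository root, so that `data/AMR-UTI/raw` resolves relative to the working directory.

```python
from pathlib import Path

# File names and folder taken from the instructions above.
REQUIRED_FILES = (
    "all_prescriptions.csv",
    "all_uti_features.csv",
    "all_uti_resist_labels.csv",
)


def missing_amr_uti_files(raw_dir="data/AMR-UTI/raw"):
    """Return the required AMR-UTI CSV files that are not present in raw_dir."""
    raw = Path(raw_dir)
    return [name for name in REQUIRED_FILES if not (raw / name).is_file()]
```

Calling `missing_amr_uti_files()` returns an empty list once all three files have been copied into place; otherwise it lists the ones still missing.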

2: Install required packages

```bash
pip install -r requirements.txt
```

3: Reproducing Synthetic Experiments

| Path | Description |
|---|---|
| `src/run_gen_sim.sh` | Generate synthetic datasets |
| `src/run_eval_models_sim.sh` | Run model evaluation with the standard, naïve, and NPW approaches |

4: Reproducing AMR-UTI Experiments

| Path | Description |
|---|---|
| `src/run_gen_amruti.sh` | Preprocess the AMR-UTI dataset |
| `src/run_eval_models_amruti.sh` | Run model evaluation with the standard, naïve, and NPW approaches |
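The scripts in the two tables run in a fixed order: data generation or preprocessing first, then evaluation. A small driver can chain them and stop at the first failure. This is a hypothetical sketch (`PIPELINES` and `run_pipeline` are not in the repository); it assumes each script is launched with `bash` from the repository root, so adjust the paths if the scripts expect to be run from inside `src/`.

```python
import subprocess

# Script order taken from the tables above: generate/preprocess first, then evaluate.
PIPELINES = {
    "synthetic": ["src/run_gen_sim.sh", "src/run_eval_models_sim.sh"],
    "amr-uti": ["src/run_gen_amruti.sh", "src/run_eval_models_amruti.sh"],
}


def run_pipeline(name, dry_run=False):
    """Run one experiment's scripts in order; with dry_run, only print the commands."""
    commands = [["bash", script] for script in PIPELINES[name]]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so a failed step halts the pipeline.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_pipeline("amr-uti", dry_run=True)` prints the two AMR-UTI commands without executing them, which is a cheap way to confirm the order before a long run.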

5: Visualizing Results

| Path | Description |
|---|---|
| `notebooks/plot_sim_eval_results.ipynb` | Visualize synthetic experiment results |
| `notebooks/plot_real_eval_results.ipynb` | Visualize AMR-UTI and readmission experiment results |
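The notebooks can also be executed non-interactively with Jupyter's `nbconvert`, which saves the rendered outputs back into each notebook. This sketch assumes Jupyter and nbconvert are installed (they may not be listed in the requirements) and that the evaluation results the notebooks read already exist; `execute_notebook_cmd` and `execute_all` are hypothetical helpers, not repository code.

```python
import subprocess

NOTEBOOKS = [
    "notebooks/plot_sim_eval_results.ipynb",
    "notebooks/plot_real_eval_results.ipynb",
]


def execute_notebook_cmd(path):
    """Build a nbconvert command that executes a notebook and overwrites it with outputs."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", path]


def execute_all(dry_run=False):
    """Execute every visualization notebook; with dry_run, only print the commands."""
    commands = [execute_notebook_cmd(nb) for nb in NOTEBOOKS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Because `--inplace` overwrites the original files, dropping it (nbconvert then writes a separate `.nbconvert.ipynb` copy) is the safer choice if you want to keep the notebooks unchanged.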

📝 Citation

If you use NPW in your research, please cite:

Chen W., Sjoding M., Wiens J. Measuring Model Performance in the Presence of an Intervention. The 40th Annual AAAI Conference on Artificial Intelligence (2026).

🛠️ License

This project is licensed under the Apache 2.0 License.
