This bundle contains the manuscript submitted to PGM-2026, entitled "Accounting for Data Uncertainty in Counterfactual Reasoning". It is organised as follows:
- examples: a toy example for running the method proposed in the paper.
- ctfzeros: python sources implementing the method.
- models: set of structural causal models in UAI format considered in the experimentation.
- requirements.txt: code dependencies.
First of all, check the Python version. These sources were written with the following Python version:
!python --version
Python 3.11.13
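The same check can be done from inside Python. The following is a minimal sketch, not part of the bundle; the helper name `matches_reference` is hypothetical, and treating 3.11 as the reference version is an assumption based on the output shown above.

```python
import sys

# Major/minor version reported in this README; used here as the reference.
REFERENCE = (3, 11)


def matches_reference(version_info=sys.version_info, reference=REFERENCE):
    """Return True when the major.minor version equals the reference."""
    return tuple(version_info[:2]) == tuple(reference)


if not matches_reference():
    # Only a warning: other versions may work, but they are untested here.
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, "
        f"but the sources were written with {REFERENCE[0]}.{REFERENCE[1]}."
    )
```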
Then, install the dependencies listed in the requirements.txt file. The main dependency is the Python package bcause (https://github.com/PGM-Lab/bcause).
!pip install --upgrade pip setuptools wheel
!pip install -r ./requirements.txt
!pip install polytope~=0.2.5 --no-deps
This repository provides functionality for preprocessing the model and the data so that they work with our inference algorithm:
from ctfzeros.prepro import load_and_preprocess
filepath = "./models/simple_nparents1_nzr10_zdr00_0.uai"
datapath = "./models/simple_nparents1_nzr10_zdr00_0.csv"
model, data, _, _ = load_and_preprocess(filepath, datapath)
model
<StructuralCausalModel (Y:2,X1:2|Uy:4,Ux1:2), dag=[Uy][Y|Uy:X1][X1|Ux1][Ux1]>
model.draw()
data
| | X1 | Y |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 0 | 1 |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 1 | 1 |
| ... | ... | ... |
| 995 | 0 | 0 |
| 996 | 1 | 0 |
| 997 | 0 | 0 |
| 998 | 0 | 1 |
| 999 | 1 | 0 |
1000 rows × 2 columns
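To get a feel for the data, one can tabulate the relative frequency of each (X1, Y) configuration. The sketch below is illustrative only and does not use ctfzeros: it hard-codes the first five rows of the table above, reading the first column as the row index, and the helper `empirical_joint` is a hypothetical name.

```python
from collections import Counter

# First five rows of the table above as (X1, Y) pairs,
# assuming the first column of the table is the row index.
rows = [(0, 1), (0, 1), (0, 0), (0, 1), (1, 1)]


def empirical_joint(samples):
    """Map each observed configuration to its relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {config: n / total for config, n in counts.items()}


joint = empirical_joint(rows)
for config in sorted(joint):
    print(config, joint[config])
```

Configurations that never occur in the sample, such as (1, 0) in these five rows, simply do not appear in the resulting dictionary.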
Next, load the modules needed for LPID and DC3ID:
from ctfzeros.imprecise_empirical import LPCC_imprecise_empirical, DCCC_imprecise_empirical
Set up the LPID inference engine with a perturbation of 0.05:
infLPID = LPCC_imprecise_empirical(model, data, perturbation=0.05)
infLPID.prob_sufficiency("X1", "Y")
[7.724454227344187e-20, 0.4317658471578703]
Similarly, with the divide-and-conquer approach (DC3ID):
infDC3ID = DCCC_imprecise_empirical(model, data, perturbation=0.05)
infDC3ID.prob_sufficiency("X1", "Y")
[0.0, 0.4317658471578494]
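The two intervals differ only in the last digits (for example, a lower bound of about 7.7e-20 versus exactly 0.0). A minimal sketch of a tolerance-based comparison follows; the values are copied from the outputs above, while the helper `intervals_close` and the tolerance of 1e-9 are assumptions made for illustration.

```python
import math

# Bounds copied from the LPID and DC3ID outputs above.
lpid = [7.724454227344187e-20, 0.4317658471578703]
dc3id = [0.0, 0.4317658471578494]


def intervals_close(a, b, abs_tol=1e-9):
    """Return True when both intervals have matching bounds up to abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(a, b)
    )


print(intervals_close(lpid, dc3id))
```

An absolute tolerance is used because one of the lower bounds is exactly zero, where a purely relative comparison would fail.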
Instead of the interval, we can obtain the list of individual queries:
infDC3ID.set_interval_result(False)
infDC3ID.prob_sufficiency("X1", "Y")
[0.3702430867734918,
0.2277520565325892,
0.4317658471578494,
0.2655972876838508,
0.0,
0.0,
0.0,
0.0]
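In this example the interval reported earlier by DC3ID coincides with the range of the individual values. The sketch below recomputes it from the list; that the interval is always obtained as the minimum and maximum of these values is an assumption consistent with the outputs shown, not a statement about the library internals.

```python
# Individual results copied from the output above.
values = [
    0.3702430867734918,
    0.2277520565325892,
    0.4317658471578494,
    0.2655972876838508,
    0.0,
    0.0,
    0.0,
    0.0,
]

# Smallest and largest individual value give the enclosing interval.
interval = [min(values), max(values)]
print(interval)
```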
Finally, inference can be run with a reduced number of solutions (num_runs=5), which may yield only an approximation of the result:
infDC3ID = DCCC_imprecise_empirical(model, data, perturbation=0.05, num_runs=5)
infDC3ID.set_interval_result(False)
infDC3ID.prob_sufficiency("X1", "Y")
[0.22775205653259195,
0.2655972876838502,
0.3702430867734911,
0.4317658471578494,
0.0]
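One way to judge the effect of the reduced run is to compare its values with those of the full run. The sketch below uses the two lists printed above; the helpers `hull` and `covered` and the tolerance are hypothetical choices for illustration. For this particular model, both runs span the same interval.

```python
import math

# Values copied from the full run and from the run with num_runs=5.
full = [0.3702430867734918, 0.2277520565325892, 0.4317658471578494,
        0.2655972876838508, 0.0, 0.0, 0.0, 0.0]
reduced = [0.22775205653259195, 0.2655972876838502, 0.3702430867734911,
           0.4317658471578494, 0.0]


def hull(values):
    """Smallest interval containing all the values."""
    return (min(values), max(values))


def covered(value, reference, abs_tol=1e-9):
    """True when value matches some reference value up to abs_tol."""
    return any(math.isclose(value, r, abs_tol=abs_tol) for r in reference)


same_hull = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(hull(full), hull(reduced))
)
all_covered = all(covered(v, full) for v in reduced)
print(same_hull, all_covered)
```

In general a reduced run could miss an extreme value, in which case its interval would be narrower than the one from the full run.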
