This benchmark framework evaluates the performance of various Conditional Independence (CI) tests, primarily those in pgmpy. CI tests may perform differently depending on the data-generating mechanism (DGM), sample size, variable types, effect size, and the complexity of the conditioning set. algo-benchmarks helps users and developers:
- Compare CI tests under standardized, reproducible settings.
- Select the best CI test for their data and use case.
- Contribute new tests or data-generating mechanisms for further comparison.
Benchmark results are saved to CSV files, which can be used to generate plots or for further analysis.
- Python 3.8+
- Clone this repository (`algo-benchmarks`)
- Install pgmpy (either the latest release or in editable mode)
Install required dependencies:
```
pip install -r requirements.txt
```

If you want to use a development version of pgmpy:
```
git clone https://github.com/pgmpy/pgmpy.git
cd pgmpy
pip install -e .[tests]
```

Return to your `algo-benchmarks` directory before running benchmarks.
From the root directory of algo-benchmarks, run:
```
python -m PY_Scripts.CI_Benchmarks
```

This will:
- Run each CI test on each DGM for various sample sizes, conditioning set sizes, and effect sizes.
- Output detailed and summary CSV files (`ci_benchmark_raw_result.csv`, `ci_benchmark_summaries.csv`).
To add your own DGM:
- Define a function in `PY_Scripts/data_generating_mechanisms.py`:

  ```python
  def my_custom_dgm(n_samples, effect_size=1.0, n_cond_vars=1, seed=None, dependent=True):
      # return a pandas.DataFrame with columns like ['X', 'Y', 'Z1', ...]
      ...
  ```
- Register it in the DGM registry in that file:

  ```python
  DGP_REGISTRY["my_custom"] = my_custom_dgm
  ```
- Add `"my_custom"` to the `DGM_TO_CITESTS` mapping in `CI_Benchmarks.py` if you want it benchmarked.
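Putting the steps above together, a complete DGM might look like the sketch below. The signature and column names follow the template above; the linear-Gaussian mechanism itself is only an illustration, not the repo's actual code:

```python
import numpy as np
import pandas as pd

def my_custom_dgm(n_samples, effect_size=1.0, n_cond_vars=1, seed=None, dependent=True):
    """Illustrative linear-Gaussian DGM: X and Y both load on the conditioning
    variables Z1..Zk, and Y additionally depends on X iff `dependent` is True."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_samples, n_cond_vars))
    X = Z.sum(axis=1) + rng.normal(size=n_samples)
    Y = Z.sum(axis=1) + rng.normal(size=n_samples)
    if dependent:
        Y = Y + effect_size * X  # X -> Y edge under the alternative
    cols = {"X": X, "Y": Y}
    cols.update({f"Z{i + 1}": Z[:, i] for i in range(n_cond_vars)})
    return pd.DataFrame(cols)
```

With `dependent=False` (the null), X and Y are conditionally independent given the Z columns; with `dependent=True`, `effect_size` controls how strong the X→Y dependence is.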
You get two main files after running the benchmark:
- `ci_benchmark_raw_result.csv`: All individual benchmark runs.
- `ci_benchmark_summaries.csv`: Aggregated summary statistics.
Columns in `ci_benchmark_raw_result.csv`:

| Column | Description |
|---|---|
| dgm | Data Generating Mechanism used |
| sample_size | Number of samples |
| n_cond_vars | Number of conditioning variables |
| effect_size | Numeric effect size (0 = null, >0 = alt) |
| repeat | Repetition index |
| ci_test | CI test used (e.g., pearsonr, gcm, pillai) |
| dependent | True if X and Y are dependent, False otherwise |
| p_value | The test's p-value |
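The summary metrics are plain aggregations of the raw p-values: type I error is the rejection rate over null runs, type II error the non-rejection rate over alternative runs. A small illustration with synthetic rows standing in for the raw CSV (the values are made up, not actual benchmark output):

```python
import pandas as pd

# Synthetic stand-in for a slice of ci_benchmark_raw_result.csv
raw = pd.DataFrame({
    "dependent": [False, False, False, True, True, True],
    "p_value":   [0.80,  0.03,  0.55,  0.01, 0.20, 0.004],
})

alpha = 0.05  # significance_level
null_runs = raw[~raw["dependent"]]
alt_runs = raw[raw["dependent"]]

type1_error = (null_runs["p_value"] < alpha).mean()   # false positive rate
type2_error = (alt_runs["p_value"] >= alpha).mean()   # false negative rate
power = 1 - type2_error

print(type1_error, type2_error, power)
```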
Columns in `ci_benchmark_summaries.csv`:

| Column | Description |
|---|---|
| dgm | Data Generating Mechanism used |
| sample_size | Number of samples |
| n_cond_vars | Number of conditioning variables |
| effect_size | Effect size |
| ci_test | CI test used |
| significance_level | Significance threshold used |
| type1_error | False positive rate |
| type2_error | False negative rate |
| power | 1 - type2_error |
| N_null | Number of null runs |
| N_alt | Number of alt runs |
- Edit `PY_Scripts/data_generating_mechanisms.py` and define your function.
- Add it to the `DGP_REGISTRY` dictionary.
- Optionally, add it to the `DGM_TO_CITESTS` mapping in `CI_Benchmarks.py`.
- Implement your test as a function (compatible with pgmpy’s CI test callable signature).
- Register it in the `ci_tests` dictionary in `CI_Benchmarks.py`.
- Add it to the list for relevant DGMs in `DGM_TO_CITESTS`.
You can create plots from the summary CSV using pandas, matplotlib, or seaborn.
See the plotting functions in CI_Benchmarks.py for example usage.
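As a starting point, a power-versus-sample-size plot can be built directly from the summary columns. The sketch below uses a small synthetic DataFrame standing in for `pd.read_csv("ci_benchmark_summaries.csv")` (the numbers are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for pd.read_csv("ci_benchmark_summaries.csv")
summary = pd.DataFrame({
    "ci_test":     ["pearsonr"] * 3 + ["gcm"] * 3,
    "sample_size": [100, 500, 1000] * 2,
    "power":       [0.42, 0.80, 0.95, 0.35, 0.71, 0.90],
})

fig, ax = plt.subplots()
for name, grp in summary.groupby("ci_test"):
    ax.plot(grp["sample_size"], grp["power"], marker="o", label=name)
ax.set_xlabel("sample_size")
ax.set_ylabel("power")
ax.legend(title="ci_test")
fig.savefig("power_vs_sample_size.png")
```

The same pattern works for `type1_error` or runtime columns; filter on `dgm`, `n_cond_vars`, or `effect_size` first to compare tests under one setting at a time.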
- Please add tests for any new functionality.
- Follow the code style used in pgmpy and this repo.
- Document any new DGMs or CI tests in this file.
- pgmpy documentation
- pgmpy/pgmpy#2150
- Relevant academic papers as needed.