
feat: Add support for CFVAE #28

Merged
zkhotanlou merged 19 commits into charmlab:main from Chenghao-Tan:feat-CFVAE-Support
Nov 17, 2025

Conversation

@Chenghao-Tan
Contributor

Add support for CFVAE

Reproducibility

Using the binaries the author provides, the results can be replicated in the format of their metrics; see reproduce.py. All binaries are bundled and loaded via importlib.resources, with the importlib_resources backport installed on Python <= 3.8.
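A minimal sketch of that loading pattern, with the version-dependent import; the package path and file name passed to load_bundled here are illustrative, not the PR's actual ones:

```python
import sys

# Use the stdlib importlib.resources on Python >= 3.9 (where files() exists),
# otherwise fall back to the importlib_resources backport.
if sys.version_info >= (3, 9):
    from importlib import resources
else:
    import importlib_resources as resources  # backport for Python <= 3.8


def load_bundled(package: str, name: str) -> bytes:
    # files() returns a Traversable rooted at the package directory; joining
    # and read_bytes() work both for on-disk and zipped packages.
    return (resources.files(package) / name).read_bytes()
```
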

Implementation

Plenty of details in the author's code differ from the paper or make little sense. The important ones are marked with # Quirk:

  • In _CFVAE, there is a strange "0.5+" in encoder().
  • The continuous-feature reconstruction loss is scaled by the original numeric range, which makes no sense and is not mentioned in the paper (removed in this implementation, since the built-in datasets are normalized and have already lost this information).
  • A form-loss forcing each categorical feature's one-hot values to sum to 1, not mentioned in the paper (removed in this implementation, since regrouping the one-hot categories by column name is fragile and the form-loss is undefined for ordinal one-hot categories).
  • Categorical counterfactuals are not rounded to integers in the original code (rounded in this implementation to meet the framework's requirements and for a fair comparison).
  • The multi-sampling VAE is not mentioned in the paper.
  • The original code uses Adam instead of SGD (this implementation uses SGD, as in the paper).
  • Non-functional operations such as denormalization (removed as much as possible in this implementation).
  • Hyperparameters (kept as close to the paper's values as possible, though some are missing or unclear).
  • This is a 3-in-1 implementation of ModelBasedCF, ModelApproxCF and ExampleBasedCF. Unlike the original code, which implements them separately, using all losses together is allowed. Users can supply their own constraint_loss_func (an example is provided as a static method) and preference_dataset; instructions are in the function comments.

Unless marked otherwise, quirks are kept AS IS; this PR tries its best to replicate the original code's behaviour. In addition, this PR fixed many unintelligible or wrong variable names, added vital comments, and cleaned up the code as much as possible.
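As a hypothetical illustration of what a user-supplied constraint loss could look like, assuming it receives factual and counterfactual batches as torch tensors (the exact signature CFVAE expects is documented in the function comments, and may differ):

```python
import torch

def monotonic_increase_loss(x: torch.Tensor, x_cf: torch.Tensor,
                            feature_idx: int = 0) -> torch.Tensor:
    # Illustrative causal constraint: penalize counterfactuals that decrease
    # a feature which should only increase (e.g. age), via a hinge on the
    # negative change. Differentiable, so it can join the other losses.
    delta = x_cf[:, feature_idx] - x[:, feature_idx]
    return torch.relu(-delta).mean()
```
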

Minor notes

  • Apparently the biggest obstacle to bumping to a newer Python is TensorFlow 1.x -> 2.x compatibility; consider solving that. This implementation alone supports Python 3.12 with no problems.
  • This PR modified setup.py and requirements-dev.txt to add the tqdm dependency, and changed torch to the corresponding CUDA-supported version in requirements-dev.txt.
  • The original benchmark framework is not very friendly to gradient-based methods. This PR added forward() to mlmodel (the PyTorch target model) to support gradient propagation in an easy-to-understand style. Note that to protect the computation graph, mlmodel's device is automatically set to match the input x. This is adaptive but less efficient, yet probably the most elegant approach until the benchmark framework is refactored.
  • autoencoder should not be an abstraction, since not all VAE structures match CCHVAE's. This PR decoupled CFVAE from the prebuilt autoencoder.
  • methods.processing.reconstruct_encoding_constraints is faulty and only supports regrouping binary categorical classes. This PR avoids using it.
  • methods.processing.check_counterfactuals is faulty and cannot handle a batch with differing negative_label values. This PR avoids using it.
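The device-matching forward() described above can be sketched as follows; the class and attribute names (MLModelSketch, self._model) are illustrative stand-ins for the framework's actual model wrapper:

```python
import torch
import torch.nn as nn

class MLModelSketch:
    """Illustrative wrapper around a PyTorch target model."""

    def __init__(self, raw_model: nn.Module):
        self._model = raw_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move the model to the input's device rather than the input to the
        # model's: detaching/moving x would sever the computation graph that
        # gradient-based recourse methods need. Adaptive but less efficient.
        self._model.to(x.device)
        return self._model(x)
```
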

@Chenghao-Tan
Contributor Author

This PR is tested by the below script and works well:

from data.catalog import DataCatalog
from evaluation import Benchmark
import evaluation.catalog as evaluation_catalog
from models.catalog import ModelCatalog
from random import seed
from methods import CFVAE

RANDOM_SEED = 54321
seed(
    RANDOM_SEED
)  # set the random seed so that the random permutations can be reproduced again

# load a catalog dataset
data_name = "adult"
dataset = DataCatalog(data_name, "mlp", 0.8)

# load artificial neural network from catalog
model = ModelCatalog(dataset, "mlp", "pytorch")

# get factuals from the data to generate counterfactual examples
factuals = (dataset._df_train).sample(n=10, random_state=RANDOM_SEED)

# load a recourse model and pass black box model
cfvae = CFVAE(model)

# generate counterfactual examples
counterfactuals = cfvae.get_counterfactuals(factuals)

# Generate Benchmark for recourse method, model and data
benchmark = Benchmark(model, cfvae, factuals)
evaluation_measures = [
    evaluation_catalog.YNN(benchmark.mlmodel, {"y": 5, "cf_label": 1}),
    evaluation_catalog.Distance(benchmark.mlmodel),
    evaluation_catalog.SuccessRate(),
    evaluation_catalog.Redundancy(benchmark.mlmodel, {"cf_label": 1}),
    evaluation_catalog.ConstraintViolation(benchmark.mlmodel),
    evaluation_catalog.AvgTime({"time": benchmark.timer}),
]
df_benchmark = benchmark.run_benchmark(evaluation_measures)
print(df_benchmark)

@Chenghao-Tan
Contributor Author

Run the reproduce module with python -m methods.catalog.cfvae.reproduce or a similar invocation.

Collaborator

@zkhotanlou zkhotanlou left a comment

Could you please resolve the reproduction assertion to ensure the script runs as a unit test and aligns with the paper’s reported results?

@Chenghao-Tan
Contributor Author

  • pytest support is added (both the legacy run command and pytest work).
  • results.csv is updated.

This update also includes a few behaviour changes:

  • The continuous-feature reconstruction loss is now scaled by the original numeric range (as in the original code).
  • The form-loss forcing categorical features to sum to 1 is restored (as in the original code; ordinal features are not affected).

This implementation is now even more faithful to the original code.
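The two restored behaviours can be sketched as below, under assumed tensor layouts (continuous columns in one tensor, each categorical feature as its own one-hot group); the PR's actual loss code is more involved:

```python
import torch

def recon_loss_continuous(x: torch.Tensor, x_hat: torch.Tensor,
                          feat_range: torch.Tensor) -> torch.Tensor:
    # Squared reconstruction error per continuous feature, scaled by that
    # feature's original numeric range (as in the original code).
    return (((x - x_hat) ** 2) / feat_range).sum(dim=1).mean()

def form_loss_categorical(x_hat_groups: list) -> torch.Tensor:
    # Penalize each one-hot group (batch, n_categories) whose reconstructed
    # values do not sum to 1.
    loss = torch.tensor(0.0)
    for group in x_hat_groups:
        loss = loss + ((group.sum(dim=1) - 1.0) ** 2).mean()
    return loss
```
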

Collaborator

@zkhotanlou zkhotanlou left a comment

Thanks for your changes, the unit tests pass correctly now! Could you resolve the merge conflicts with the new changes on the main branch, and also run the pre-commit hooks so that both checks get approved?

Contributor Author

@Chenghao-Tan Chenghao-Tan Nov 10, 2025

This commit drops torch's GPU support since pip install -r requirements-dev.txt in pre-commit won't read -f https://download.pytorch.org/whl/torch_stable.html and therefore cannot find torch==1.7.0+cu110.

This is a step backwards. Please consider modifying .github/workflows/pre-commit.yaml later to add -f xxx after pip install xxx, or to simply install the torch+cu version manually.

@zkhotanlou
Collaborator

This is an implementation of the "CFVAE" [1] recourse method. The reproduction is at level 1, as the unit tests check that the implementation reproduces the results reported in the paper for the adult dataset on a neural network.

[1] Preserving causal constraints in counterfactual explanations for machine learning classifiers
D Mahajan, C Tan, A Sharma - arXiv preprint arXiv:1912.03277, 2019.

@zkhotanlou zkhotanlou merged commit 36c72aa into charmlab:main Nov 17, 2025
1 check passed