
feat: Add support for CFRL (and dataset/dependency fixes) #36

Merged: zkhotanlou merged 19 commits into charmlab:main from Chenghao-Tan:feat--CFRL-Support on Nov 23, 2025


@Chenghao-Tan Chenghao-Tan commented Nov 19, 2025

Add support for CFRL

Reproducibility

Level-1 reproducibility is achieved with this implementation (Random Forest on Adult), and I'm personally confident about raising it to level-2 (by adding more target models).

Implementation

This implementation is largely based on alibi, SeldonIO's official repo, especially the RL agent part, which is located in cfrl_*.py. However, the autoencoder part (based on the Adult AE in the paper) is mostly written from scratch and is located in model.py. The authors didn't provide seeds.

Here are the details:

  • Code used in CFRL is aggregated into three files for simplicity.
  • Since the official dependencies differ considerably from ours, TensorFlow support (which is easily affected by version changes) is removed.
  • Official PyTorch support needs torch>=1.8. nn.LazyLinear is backported using nn.Linear for compatibility with torch<1.8, and input_dim is inferred from a forward pass, since there might not be a closed-form value due to arbitrary operations in the post-processing. This works as a fallback: if input_dim is not passed and nn.LazyLinear support is detected, this implementation still uses nn.LazyLinear.
  • To support the feature constraints and stay faithful to the official code, a data format adaptor is introduced. See model.py -> _ordered_to_cfrl() and _cfrl_to_ordered(). It converts one-hot + normalized data to raw data (and back). It reads metadata by running loadDataset() again (since metadata is dropped by DataCatalog). A unit test / visualizer is provided, which supports both python -m pytest methods/catalog/cfrl/dataset_adaptor_test.py (unit test) and python -m methods.catalog.cfrl.dataset_adaptor_test (visualize). Feel free to test it on different datasets and port it to other methods that require raw data / metadata.
  • Hyperparameters used are the ones from the paper, not the example on SeldonIO's documentation website.
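
To illustrate the round trip the data format adaptor performs, here is a minimal sketch. It is not the code in model.py: the function names, the metadata layout, and the choice of min-max scaling as the normalization are all assumptions made for the example.

```python
# Hypothetical metadata layout (assumption); the real adaptor obtains its
# metadata by running the dataset loader again.
NUMERIC = {"age": (17.0, 90.0)}            # feature -> (min, max), min-max scaling assumed
CATEGORICAL = {"sex": ["Female", "Male"]}  # feature -> ordered categories


def encoded_to_raw(row):
    """Map scaled numerics and one-hot categoricals back to raw values."""
    raw = {}
    for name, (lo, hi) in NUMERIC.items():
        raw[name] = row[name] * (hi - lo) + lo
    for name, cats in CATEGORICAL.items():
        # Pick the category whose one-hot column holds the largest value.
        raw[name] = max(cats, key=lambda c: row[f"{name}_{c}"])
    return raw


def raw_to_encoded(raw):
    """Inverse mapping: scale numerics to [0, 1] and one-hot encode categoricals."""
    row = {}
    for name, (lo, hi) in NUMERIC.items():
        row[name] = (raw[name] - lo) / (hi - lo)
    for name, cats in CATEGORICAL.items():
        for c in cats:
            row[f"{name}_{c}"] = 1.0 if raw[name] == c else 0.0
    return row


encoded = {"age": 0.5, "sex_Female": 0.0, "sex_Male": 1.0}
raw = encoded_to_raw(encoded)
# Converting back should recover the encoded row.
assert raw_to_encoded(raw) == encoded
```

A round-trip check of this kind is the sort of property a unit test for an adaptor would typically assert, although the actual test file may check more than this.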

Fixes

This PR also does some general fixes for the framework's base components.

  • At some point, loadData.py was moved from /data to /data/catalog. Old cached one-hot dataset pickles embedded the path /data/loadData.py, which is no longer valid, so loading a dataset from such a pickle falls back to loading the raw data, which is no longer in /data either (it is in /data/raw_data now). The load then fails and stops run_experiments.py. With this PR, all affected dataset loading scripts (adult, compass, credit) are fixed and all cached dataset pickles are updated.
  • run_experiments.py's comments and default CLI args are now more consistent. By default it runs all supported methods, avoiding unexpected behaviours.
  • Fixed an incompatibility between werkzeug, itsdangerous and Flask==1.1.2 (can be triggered by a pytest run). pytest should run out of the box now. Dependencies are also updated to make setup.py (install as a package) and requirements-dev.txt (manual install) more consistent.
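
For context on the stale-pickle issue: a pickle stores the import path of the classes it references, so moving a module can make old caches unloadable. The sketch below shows one defensive loading pattern. It is not the fix applied in this PR; load_cached_or_rebuild and its rebuild callback are hypothetical names introduced for the example.

```python
import pickle
from pathlib import Path


def load_cached_or_rebuild(cache_path, rebuild):
    """Return the cached object, rebuilding it when the pickle is missing or stale."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        try:
            with cache_path.open("rb") as fh:
                return pickle.load(fh)
        except (ModuleNotFoundError, AttributeError, pickle.UnpicklingError):
            # Stale pickle: it references a module or class that has moved.
            pass
    # Rebuild from the source data (hypothetical callback) and refresh the cache.
    obj = rebuild()
    with cache_path.open("wb") as fh:
        pickle.dump(obj, fh)
    return obj
```

With a pattern like this, a cache that still points at an old module path is regenerated instead of aborting the run, provided the rebuild step itself reads the raw data from the right location.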

Trivia

  • Note that with seed=54321 (run_experiments.py's default), when running only the credit dataset with the linear classifier, the classifier predicts all 1s, so there are no negative cases for recourse and credit is skipped (the current design only does 0->1 flips). This is possibly the reason why run_experiments.py skips datasets. Although a re-run may behave normally, this "normal" behaviour can undermine experiment repeatability. (Pseudo-random numbers vary with the generation order, even when the random seed is fixed.)
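
The point about generation order can be shown with a small standalone sketch. It uses Python's standard random module rather than the framework's own seeding code, and the draw helper and dataset names are only illustrative.

```python
import random


def draw(order, seed=54321):
    """Draw one number per name, in the given order, from one seeded generator."""
    rng = random.Random(seed)
    return {name: rng.random() for name in order}


first = draw(["adult", "credit"])
second = draw(["credit", "adult"])

# Same seed and same stream, but each dataset receives a different draw
# depending on its position in the run.
assert first["adult"] == second["credit"]
assert first["credit"] == second["adult"]
assert first["credit"] != second["credit"]
```

One possible way to make results independent of run order, offered here only as a suggestion, is to give each dataset/model combination its own generator seeded from the global seed plus a stable identifier.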

@zkhotanlou zkhotanlou left a comment

Thank you for this complete implementation and the helpful fixes. Please just address the remaining minor comments; otherwise it's ready to merge.

@zkhotanlou

This is an implementation of the "CFRL" [1] recourse method. The level of reproduction is level 1, as the unit tests check that the implementation reproduces the results reported in the paper for Random Forest on the Adult dataset.

[1] Samoilescu, R. F., Van Looveren, A., & Klaise, J. (2021). Model-agnostic and scalable counterfactual explanations via reinforcement learning. arXiv preprint arXiv:2106.02597.

@zkhotanlou zkhotanlou merged commit 7140449 into charmlab:main Nov 23, 2025
1 check passed