
Roar reproduce#25

Merged
zkhotanlou merged 23 commits into charmlab:main from HashirA123:roar-reproduce
Nov 5, 2025

Conversation

Contributor

@HashirA123 HashirA123 commented Oct 3, 2025

I believe this reproduction is ready to be merged into main.

As of now:

  • The results are in line with the paper, although not identical: key metrics such as validity and cost are similar and support the same conclusions. The remaining differences can be attributed to differences in model structure (i.e. the layers and neurons in our MLP versus theirs).
  • The results are consistent: each run of the reproduction produces the same output. This is likely the result of the bug fixes by Zhara.
  • For a slightly simpler reproduction, I focus on only one of the datasets used in the paper, the German dataset, since it is smaller and has fewer features overall, including fewer categorical ones.
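For intuition, validity and cost for a linear model can be sanity-checked against a closed-form recourse step. This is a minimal numpy sketch, not the ROAR implementation (ROAR additionally optimizes the recourse to remain valid under shifts of the model weights):

```python
import numpy as np

def closed_form_recourse(w, b, x, margin=1e-3):
    """Minimal L2 recourse for a linear classifier sign(w.x + b).

    Moves x just across the decision boundary; the L2 norm of the
    step is the recourse "cost", and "validity" checks whether the
    prediction actually flipped. Illustrative only -- ROAR's recourse
    is also robust to perturbations of (w, b).
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    score = w @ x + b
    # Shortest L2 step to the boundary, plus a small margin past it.
    delta = -(score + np.sign(score) * margin) / (w @ w) * w
    return x + delta, np.linalg.norm(delta)

w, b = np.array([1.0, -2.0]), 0.5
x = np.array([1.0, 1.0])            # score = 1 - 2 + 0.5 = -0.5 (negative class)
x_cf, cost = closed_form_recourse(w, b, x)
valid = (w @ x_cf + b) > 0          # did the prediction flip?
```

Comparing this minimal cost against the cost ROAR reports gives a feel for how much the robustness requirement inflates recourse cost.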

HashirA123 and others added 3 commits October 22, 2025 19:25
The linear model seems to be giving acceptable
validity results. The accuracy and cost of recourse
are not the same, though these could be chalked up to
the model differences (technically the linear model is a LR, so getting
model accuracy the same as the paper's should be possible; why it isn't
there yet definitely has something to do with the preprocessing of the data).

ROAR is having trouble getting recourse for the MLP (non-linear model). This needs to
be investigated further, given that the application of ROAR is almost the
same as in the paper. The only difference between the linear and non-linear cases
is the use of LIME. That is something to look at next.
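For reference, the non-linear path works roughly like this: fit a local linear approximation of the MLP around the factual, then run the linear recourse search against those coefficients. A LIME-style sketch in plain numpy (the actual code uses the LIME library, which also handles categorical features and feature selection):

```python
import numpy as np

def local_linear_surrogate(predict_fn, x, n_samples=2000, sigma=0.5, seed=0):
    """LIME-style local approximation of a black-box model around x.

    Perturbs x with Gaussian noise, weights samples by proximity to x,
    and fits a weighted least-squares linear model. A linear recourse
    method can then be applied to the returned (w, b).
    """
    rng = np.random.default_rng(seed)
    X = x + rng.normal(scale=sigma, size=(n_samples, x.size))
    y = predict_fn(X)
    weights = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma**2))
    # Weighted least squares with an intercept column.
    A = np.hstack([X, np.ones((n_samples, 1))]) * np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A, y * np.sqrt(weights), rcond=None)
    return coef[:-1], coef[-1]   # (w, b) of the local linear model

# A non-linear black box; the surrogate should recover the local slope
# direction [2, -1] around the origin (scaled down by tanh saturation).
black_box = lambda X: np.tanh(X @ np.array([2.0, -1.0]))
w, b = local_linear_surrogate(black_box, np.array([0.0, 0.0]))
```

If recourse fails only for the MLP, a poor or degenerate local fit from this step (e.g. near-zero coefficients in a flat region) would be a natural first suspect.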
I've decided to add a utils.py file within the ROAR method
directory to hold functions that replicate those from the
ModelCatalog and DataCatalog classes. This is to ensure
that the reproduction script can accurately recreate the
data preprocessing and model training steps as they were
originally intended, rather than doing something higher-level
like in the quick_start script.
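As a rough illustration of the catalog-style steps those helpers replicate (hypothetical function; the actual utils.py mirrors the DataCatalog/ModelCatalog logic from this repository):

```python
import numpy as np

def preprocess(X_num, X_cat, n_categories):
    """Sketch of catalog-style preprocessing: standardize continuous
    columns and one-hot encode categorical ones. Hypothetical helper,
    shown only to illustrate the kind of steps utils.py reproduces.
    """
    mean, std = X_num.mean(axis=0), X_num.std(axis=0)
    X_scaled = (X_num - mean) / np.where(std == 0, 1.0, std)
    # One one-hot block per categorical column.
    onehots = [np.eye(k)[col] for col, k in zip(X_cat.T, n_categories)]
    return np.hstack([X_scaled] + onehots)

X_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_cat = np.array([[0], [1], [2]])
X = preprocess(X_num, X_cat, n_categories=[3])
```

Getting details like these exactly right (scaling statistics, encoding order) is precisely what can move model accuracy away from the paper's numbers.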
Reproduction of ROAR on the German dataset seems to
reproduce validity and cost results very similar to the original
paper. However, the model accuracies are much worse than expected.
Technically the LR should be functionally the same as the one in the
paper. I may try using the sklearn implementation rather than the PyTorch
one to see if that helps.
I added a parameter "modified" to load a modified version of the
SBA/german dataset and updated the relevant functions and classes to handle this new parameter.

The results for ROAR seem in line with the paper
in terms of validity. I would try a larger sample of
factuals to be sure.

Sometimes the tests fail; I need to make sure that the
method runs the same every time, in terms of factuals and
model training. Right now there isn't consistency in that.
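One way to pin this down is to seed every source of randomness at the top of the reproduction script. A minimal sketch (the torch line is commented out to keep this dependency-free, but the benchmark's PyTorch models would need it too):

```python
import os
import random

import numpy as np

def set_global_seed(seed: int = 0) -> None:
    """Pin the RNGs a reproduction run touches so factual sampling and
    model training are identical across runs."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects hashing if set before the interpreter starts.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # For the PyTorch models in the benchmark, also:
    # torch.manual_seed(seed); torch.use_deterministic_algorithms(True)

set_global_seed(0)
run_a = [random.random() for _ in range(3)] + list(np.random.rand(3))
set_global_seed(0)
run_b = [random.random() for _ in range(3)] + list(np.random.rand(3))
```

With every RNG seeded, repeated runs draw the same factuals and train to the same weights, so the tests should pass deterministically.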
The ROAR method now expects the linear model to be from sklearn for the linear case.
This is to be in line with the paper's implementation.

I've also noticed that the results seem to be consistent now,
unlike before, so the test cases should always pass.
@HashirA123 HashirA123 marked this pull request as ready for review October 29, 2025 01:32
@HashirA123 HashirA123 requested a review from zkhotanlou October 29, 2025 01:32
Collaborator

@zkhotanlou zkhotanlou left a comment


Looks good to me and is ready to merge! Before that, could you please run your method on the possible datasets and models using run_experiments.py, and add the results to results.csv?

Collaborator

@zkhotanlou zkhotanlou left a comment


Also, please run the pre-commit hooks to resolve the linting and styling issues; some of the checks didn’t pass.

@HashirA123 HashirA123 requested a review from zkhotanlou November 3, 2025 14:42
@zkhotanlou zkhotanlou merged commit aaf9232 into charmlab:main Nov 5, 2025
1 check passed
Jamie001129 pushed a commit to Jamie001129/recourse_benchmarks that referenced this pull request Nov 16, 2025
This PR implements the ROAR (Robust Algorithmic Recourse) method, along with unit tests to ensure the results are reproducible as reported in the original paper [1].
The reproducibility badge level is Level 2, as the results on the German dataset have been successfully reproduced using both MLP and logistic regression models.

[1] Upadhyay, S., Joshi, S., & Lakkaraju, H. (2021). Towards Robust and Reliable Algorithmic Recourse. Advances in Neural Information Processing Systems, 34, 16926–16937.