Merged
The linear model seems to be giving acceptable validity results. The accuracy and cost of recourse are not the same, though these could be chalked up to model differences (technically the linear model is a logistic regression, so matching the paper's model accuracy should be possible; why it isn't there yet definitely has something to do with the preprocessing of the data). ROAR is having trouble generating recourse for the MLP (non-linear) model. This needs to be investigated further, given that the application of ROAR is almost the same as in the paper. The only difference between the linear and non-linear model is the use of LIME, so that is something to look at next.
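Since the only difference for the non-linear case is the LIME step, a minimal sketch of what that local approximation is supposed to produce may help with debugging. This is a self-contained LIME-style surrogate (perturb around the factual, weight by a kernel, fit weighted least squares), not the `lime` package the repo actually uses; all names here are illustrative:

```python
import numpy as np

def local_linear_surrogate(predict_proba, x, n_samples=1000, width=0.75, seed=0):
    """Fit a LIME-style local linear model around a factual point x.

    predict_proba: callable mapping (n, d) -> (n,) positive-class probabilities.
    Returns (coef, intercept) of a kernel-weighted least-squares fit.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturb around the factual point
    Z = x + rng.normal(scale=0.5, size=(n_samples, d))
    y = predict_proba(Z)
    # Exponential kernel: perturbations closer to x get more weight
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / width ** 2)
    # Weighted least squares with an intercept column
    A = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return theta[:-1], theta[-1]
```

A quick sanity check for the MLP case: run this on the model's `predict_proba` and see whether the recovered coefficients look degenerate (near-zero or wildly scaled), since ROAR's gradient steps are taken on exactly this kind of local linear model.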
I've decided to add a utils.py file within the ROAR method directory to hold functions that replicate those from the ModelCatalog and DataCatalog classes. This is to ensure that the reproduction script can accurately recreate the data preprocessing and model training steps as they were originally intended, rather than doing something higher-level like in the quick_start script.
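For context, the kind of helper that utils.py would hold looks roughly like this. The function name and feature lists are hypothetical; the point is only to mirror the catalog-style preprocessing (scaled continuous features, one-hot encoded categoricals) rather than the higher-level quick_start path:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

def build_preprocessor(continuous, categorical):
    """Illustrative stand-in for the DataCatalog preprocessing:
    scale continuous columns to [0, 1] and one-hot encode the
    categorical ones. Column lists come from the dataset config."""
    return ColumnTransformer([
        ("num", MinMaxScaler(), continuous),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
```

Whether this matches the catalog exactly (scaler choice, encoding of binary columns, column order) is precisely what needs checking against ModelCatalog/DataCatalog, since the accuracy gap above may come from a mismatch here.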
Zahra/fix bug
zkhotanlou reviewed Oct 25, 2025
Reproduction of ROAR on the german dataset seems to reproduce validity and cost results very similar to the original paper. However, the model accuracies are much worse than expected. Technically the LR should be functionally the same as the one in the paper. I may try using the sklearn implementation rather than the PyTorch one to see if that helps.
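If the sklearn switch is attempted, the drop-in is small. A sketch with synthetic data standing in for the preprocessed german features (the real script would of course use those instead):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed german features
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X @ np.array([1.5, -2.0, 0.5, 0.0])
     + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

One advantage for debugging: sklearn's closed-form-ish solvers remove optimizer hyperparameters (learning rate, epochs) as a source of the accuracy gap, so any remaining difference points back at the preprocessing.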
I added a parameter "modified" to load a modified version of the SBA/german dataset and updated the relevant functions and classes to handle this new parameter. The results for ROAR seem in line with the paper in terms of validity; I would try a larger sample of factuals to be sure. Sometimes the tests fail, so I need to make sure that the method runs the same every time in terms of factuals and model training. Right now there isn't consistency in that.
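On the consistency point, pinning every source of randomness (factual sampling, weight init, any train/test split) is usually enough. A minimal sketch, assuming numpy and torch are the only RNGs in play (function names are illustrative):

```python
import random
import numpy as np

def set_seed(seed=42):
    """Pin all RNGs the benchmark might touch. torch is guarded,
    since the sklearn linear case may run without it installed."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

def sample_factuals(X, n, seed=42):
    """Deterministic factual sampling: the same seed always
    returns the same rows, so test runs are repeatable."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n, replace=False)
    return X[idx]
```

Calling `set_seed` once at the top of the reproduction script, and threading an explicit seed into the factual sampler, should make the flaky test failures go away or at least become reproducible.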
The ROAR method now expects the linear model to be from sklearn for the linear case, to be in line with the paper's implementation. I've also noticed that the results seem to be consistent now, unlike before, so the test cases should always pass.
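With the sklearn model, the linear case only needs `clf.coef_[0]` and `clf.intercept_[0]`. A simplified sketch of the robust recourse step those feed into (worst-case weight perturbation inside an L2 ball of radius `delta`, as in the paper's objective); this is an illustration of the idea, not the benchmarked implementation:

```python
import numpy as np

def roar_linear_recourse(w, b, x, delta=0.1, lam=0.1, lr=0.05, steps=500):
    """Gradient descent on the worst-case logistic loss plus an L2 cost.

    w, b: weights/intercept from a fitted sklearn LogisticRegression
    (clf.coef_[0], clf.intercept_[0]). delta bounds the adversarial
    weight shift; lam trades validity against recourse cost.
    """
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        # Adversarial weight shift that most lowers the score at x_cf:
        # for a linear score this is -delta * x_cf / ||x_cf||
        norm = np.linalg.norm(x_cf) + 1e-8
        w_adv = w - delta * x_cf / norm
        score = w_adv @ x_cf + b
        # d/dx of BCE(sigmoid(score), target=1) is (sigmoid(score) - 1) * w_adv,
        # plus the gradient of the quadratic cost term
        grad = (1.0 / (1.0 + np.exp(-score)) - 1.0) * w_adv + lam * (x_cf - x)
        x_cf -= lr * grad
    return x_cf
```

The useful property for the test suite is that validity is checked against the *worst-case* weights, which is what should make the results robust to the small model shifts the paper worries about.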
zkhotanlou (Collaborator) reviewed Nov 2, 2025 and left a comment:
Looks good to me and is ready to merge! Before that, could you please run your method on the possible datasets and models using run_experiments.py, and add the results to results.csv?
zkhotanlou (Collaborator) reviewed Nov 2, 2025 and left a comment:
Also, please run the pre-commit hooks to resolve the linting and styling issues; some of the checks didn’t pass.
zkhotanlou reviewed Nov 3, 2025
Jamie001129 pushed a commit to Jamie001129/recourse_benchmarks that referenced this pull request on Nov 16, 2025:
This PR implements the ROAR (Robust Algorithmic Recourse) method, along with unit tests to ensure the results are reproducible as reported in the original paper [1]. The reproducibility badge level is Level 2, as the results on the German dataset have been successfully reproduced using both MLP and logistic regression models. [1] Upadhyay, S., Joshi, S., & Lakkaraju, H. (2021). Towards Robust and Reliable Algorithmic Recourse. Advances in Neural Information Processing Systems, 34, 16926–16937.
I personally believe that this reproduction is ready to merge into main.
As of now: