Merged
Conversation
TODO:
- make compatible with repo structure
- add reproduce.
The reproduce file has two main test functions. The first test function aims to use the inner logic of the LARR method to perform an experiment similar to one in the original paper's code. There are many differences between our implementation and theirs, mainly in the way their models produce predictions. This matters because it affects how the Lambda parameter is calculated. The compromise I came up with is to slightly modify the structure of the experiment in the code, without changing the actual recourse logic, so that the results from their code and ours are as close as possible. However, when running our own pipeline, getting comparable recourse even on similar datasets is a real challenge, because of the way our models are trained, some being weaker than those in the paper. Given the scope of our current task, and since we can confirm that the logic of the recourse implementation is aligned with the original paper's, we can give it a level 1-2 rating in terms of reproducibility.
This method was not working correctly before. I noticed this when running the experiment file, where the method failed for every dataset. The problem was likely the predict method being used inside the get lambda function, along with the method's weights not being set properly. Both issues now appear to be resolved.
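As a purely illustrative sketch of that failure mode (the names `LinearModel`, `get_lambda`, and the scaling check below are hypothetical stand-ins, not the repository's actual API): if the model's weights are never set, any lambda computation that goes through `predict()` fails, which is consistent with the method not passing for any dataset.

```python
class LinearModel:
    """Toy linear classifier standing in for the benchmark model."""

    def __init__(self, weights=None):
        # The bug class described above: weights left unset before use.
        self.weights = weights

    def predict(self, x):
        if self.weights is None:
            raise ValueError("model weights not set before predict()")
        score = sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0


def get_lambda(model, x, lam_grid):
    """Return the smallest candidate lambda whose scaled input the model
    classifies as positive. Illustrative only: it shows how the lambda
    value depends directly on the model's predict() output."""
    for lam in lam_grid:
        if model.predict([lam * xi for xi in x]) == 1:
            return lam
    return lam_grid[-1]


# With weights set, the lambda search runs; without them it raises.
model = LinearModel(weights=[1.0, -0.5])
lam = get_lambda(model, [2.0, 1.0], [0.5, 1.0, 2.0])
```

With weights properly set, `get_lambda` returns 0.5 here; calling `predict()` on an uninitialized `LinearModel()` raises instead of silently producing a wrong lambda.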
zkhotanlou
reviewed
Nov 24, 2025
Collaborator
zkhotanlou
left a comment
This implementation looks solid. It checks the reproducibility of results for the same dataset and model as the original paper, and it follows the repository's benchmarking pipeline structure correctly. After the merge conflicts are resolved, the PR will be ready to merge.
flake8 was flagging type comparisons in some files: "do not compare types. Use is, is not, or isinstance() instead."
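For reference, this is flake8's E721 check, and the fix is mechanical (the function names below are just for illustration):

```python
def is_float_like(x):
    # Preferred over `type(x) == float`, which flake8 flags (E721);
    # isinstance() also accepts subclasses.
    return isinstance(x, float)


def is_exactly_float(x):
    # When subclasses must NOT match, compare type identity with `is`.
    return type(x) is float


result = is_float_like(3.0)
```

`isinstance()` is usually the right choice; the identity form only matters when subclass instances must be rejected.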
zkhotanlou
approved these changes
Nov 24, 2025
Collaborator
zkhotanlou
left a comment
This is an implementation of the "LARR" [1] recourse method. The level of reproduction is level 1, as the unit tests check that the implementation can reproduce the results reported in the paper for the German dataset on a linear model.
[1] Kayastha, K., Gkatzelis, V., Jabbari, S. (2025). Learning-Augmented Robust Algorithmic Recourse. Drexel University. (https://arxiv.org/pdf/2410.01580)