Merged
Collaborator
That's a good implementation. Please add the reproduce tests as well.
amirhk
requested changes
Oct 20, 2025
Collaborator
amirhk
left a comment
some high-level feedback/requests
cc @zkhotanlou for detailed implementation review
@HashirA123 agree with Zahra that the next step is reproduce.py
zkhotanlou
reviewed
Oct 20, 2025
Initially had the PROBE model commits inside this branch. They have now been moved to their own branch, and this branch now only contains the RBR model commits.
(WIP) Since the method is implemented, the main thing left is making the data processing and experiment process match the original paper. They do use the same datasets but process them slightly differently (e.g., using different features). Their model (an MLP) is also different. So running with the dataset and model as we currently have them will definitely not reproduce their results. We will have to think about how best to align with their setup. One option is to skip our model and data catalogs for loading the data and simply port over their model and data creation/processing code.
zkhotanlou
reviewed
Oct 26, 2025
Ran through the code to get the method to simply run. More work is needed to confirm correctness of results. WIP commit with debug prints and small fixes.
Getting the reproduce for this method working was a bit tricky. The main challenge was making sure that the dataset and models are processed and trained correctly. I have tried my best to build the model just as they did and to process the dataset the same way. Although the results are not identical, they can be classified as at least level 1 on the reproduction scale and can definitely be improved in the near future. The method does in fact work well at finding robust recourse, and some metrics are in line with the results of the paper.
zkhotanlou
reviewed
Nov 20, 2025
I worked to resolve several issues, mainly stemming from the predict function I implemented in the rbr_loss script. The method results in the reproduce file now fairly closely align with those from the original authors' code. The method does seem to rely heavily on the quality of the trained model for its recourse finding: if the model is very poor, the method may not be able to find recourse at all. Overall, in terms of reproduction, I believe this can be marked between 1 and 2.
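For illustration only (this is not the repository's actual rbr_loss code): a common source of bugs in a predict function like the one mentioned above is returning raw logits instead of class probabilities, which silently skews any recourse search that thresholds on probability. A minimal sketch, assuming a small NumPy MLP with hypothetical weights:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_proba(weights, biases, x):
    """Forward pass of a small ReLU MLP (hypothetical layout).
    Returning softmax probabilities here, rather than the raw logits,
    is the kind of detail a recourse loss depends on."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    logits = h @ weights[-1] + biases[-1]
    return softmax(logits)

# Tiny usage example with made-up weights: two inputs, one hidden
# layer of two units, two output classes.
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)
W2, b2 = np.eye(2), np.zeros(2)
p = predict_proba([W1, W2], [b1, b2], np.array([[1.0, 2.0]]))
```

The sketch only shows the shape of the contract (probabilities in [0, 1] summing to 1 per row); the actual fix in this PR lives in rbr_loss.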
zkhotanlou
reviewed
Nov 22, 2025
Collaborator
zkhotanlou
left a comment
This is an implementation of the "RBR" [1] recourse method. The level of reproduction is level 1, as the unit tests check that the implementation can reproduce results reported in the paper for the German dataset on a neural network.
[1] Nguyen, Tuan-Duy Hien, Ngoc Bui, Duy Nguyen, Man-Chung Yue, and Viet Anh Nguyen. 2022. "Robust Bayesian Recourse." (UAI 2022)
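The level-1 check described above can be sketched as a tolerance comparison between reproduced metrics and the values reported in the paper. This is an illustrative sketch only; the metric names, values, and tolerance below are hypothetical, not the repository's actual unit test:

```python
def within_tolerance(reproduced, reported, rel_tol=0.10):
    """Level-1-style check: every reproduced metric must fall within a
    relative tolerance of the value reported in the paper.
    Metric names and tolerance are illustrative placeholders."""
    return all(
        abs(reproduced[k] - reported[k]) <= rel_tol * abs(reported[k])
        for k in reported
    )

# Hypothetical numbers for a German-dataset / neural-network setting
reported = {"validity": 1.00, "cost": 0.85}
reproduced = {"validity": 0.97, "cost": 0.89}
print(within_tolerance(reproduced, reported))  # → True for these values
```

A relative tolerance (rather than exact equality) fits the level-1 claim: results are close to the paper's but not identical, as the review comments note.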
The reproduction is functional and can be marked as level 1 on the reproduction scale. I believe the only metric that does not line up with the original paper/code is "current validity", which in the reproduction comes out consistently lower than it should.
I believe the method is implemented correctly, so the most probable cause of the difference is the training of the base model.