Conversation
methods/catalog/genre/model.py
Outdated
    def get_counterfactuals(self, factuals: pd.DataFrame) -> pd.DataFrame:
        """
        Generate counterfactuals
The purpose of this method is to compute counterfactuals for the given factual inputs, not to act as a fake interface. The returned counterfactuals should be specific to the given factuals, not a fixed output for the entire X_test that artificially reproduces the results.
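For illustration, the expected interface shape might look like the following sketch. This is not the actual GenRe logic; the class name and the `generator` callable are placeholders standing in for the real generative sampling step:

```python
import pandas as pd

class GenReSketch:
    """Illustrative only: shows the expected per-input interface,
    not the real GenRe method."""

    def __init__(self, generator):
        # generator: a callable mapping one factual row to a
        # counterfactual row (placeholder for the real sampler)
        self.generator = generator

    def get_counterfactuals(self, factuals: pd.DataFrame) -> pd.DataFrame:
        # Compute one counterfactual per input row, so the result
        # depends on the given factuals rather than being a fixed
        # precomputed output for the whole X_test.
        rows = [self.generator(row) for _, row in factuals.iterrows()]
        return pd.DataFrame(rows, index=factuals.index)
```

The point is that the output is derived row-by-row from `factuals` and shares its index, so calling the method on a subset returns counterfactuals only for that subset.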
methods/catalog/genre/reproduce.py
Outdated
    # Initialize GenRe recourse module
    rec_module = GenReOriginal(
        pair_model=pair_model,
The recourse method should be able to work with ModelCatalog, which in turn works with DataCatalog; this implementation doesn't align with the repo structure.
methods/catalog/genre/reproduce.py
Outdated
    if __name__ == "__main__":
        main()
I tried to run this script, but it doesn't run successfully: it fails with a ModuleNotFoundError.
methods/catalog/genre/reproduce.py
Outdated
    # 5. Initialize GenRe using our wrapper
    print("Initializing GenRe...")
    genre = GenRe(
        mlmodel=ann_clf,  # Pass ANN directly
The recourse method still does not align with the repository’s ModelCatalog and DataCatalog structure. Even if the current implementation successfully reproduces the original paper’s results, it cannot yet operate within the benchmark pipeline. After reproduction, the method must be runnable on all existing datasets and models in the benchmark suite. In its current form, the benchmark pipeline would fail.
I have updated model and reproduce. The GenRe class can now handle both ModelCatalog and my model, and its implementation in reproduce.py will support the repo's structure if we use the commented block, which calls the existing functions to load the repo's data and model. Since GenRe's transformer and the black-box models were both trained on the author's dataset, switching to the repository's data and models would definitely cause it to fail. In a previous attempt to reproduce GenRe with the repo's models and dataset I struggled to get comparable results, so here I used the original data and models for better reproduction.
So, please add a test that performs a sanity check to ensure your implementation is compatible with the repository’s structure. There’s no need to check the reproducibility of results in this test; the goal is to confirm that the method will be able to run experiments later on other datasets and models in the repo.
methods/catalog/genre/reproduce.py
Outdated
        args.hf_repo, input_dim, device
    )

    # ========== FUTURE: Load repo's data and models ==========
The commented implementation doesn’t work. We need to make sure that the implemented recourse method can do two things: first, reproduce the reported results, even when using different datasets and models, and second, work with and align to the repository’s structure.
These checks can be done either separately or together, as long as the recourse method itself remains fixed.
…e Genre method. Add a function in reproduce.py to test compatibility (working on it). TODO: add data, add model if current ones don't work
… mlp to binary output; clean genre folder
I have added my data to DataCatalog following the link you gave me, and it seems to work!
zkhotanlou
left a comment
After addressing the comments, please add the results of your implemented method on the benchmark datasets and models to run_experiments.py and result.csv.
Please add the datasets specific to your implementation to that method's directory in methods/catalog/genre. We only need the benchmarking datasets in the ./data directory.
But I need this (and all the other stuff I added to data/catalog/) to access "my" data through DataCatalog, so that genre aligns with the repo's structure. I did this following the PR #7 you gave me.
I mean just the location of the added dataset: we need the datasets specific to your method to live in that method's directory.
But the data already exists in the genre directory; these changes are for accessing my data through DataCatalog. So I'll add it back.
This is an implementation of the "GenRe" [1] recourse method. The reproduction is at level 1: the unit tests check that the implementation reproduces the results reported in the paper for the Adult dataset with an MLP model, and also check compatibility with the repo's structure.

[1] Garg, P., Nagalapatti, L., & Sarawagi, S. (2025). From Search To Sampling: Generative Models For Robust Algorithmic Recourse. arXiv preprint arXiv:2505.07351.
I redid the GenRe implementation. This time I tried to use as much as possible from the author's repo for accurate reproduction, with just a few modifications to fit our repo. The GenRe models I trained on my Mac in my own environment (Python 3.12, a fairly recent PyTorch) achieved better reproduction results but are unfortunately incompatible with our Python 3.7 environment, and therefore unusable. T_T
The models are now hosted on Hugging Face (https://huggingface.co/jamie250/genrereproduce), and reproduce.py can fetch them. We need to add the huggingface-hub library to our environment to support fetching models from the Hub, so I modified requirements-dev.txt as well.
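For reference, the fetch could be a thin wrapper around `hf_hub_download` from huggingface-hub, which downloads a file once and then serves it from the local cache. The filename argument here is a placeholder, not the actual checkpoint name in the repo:

```python
from huggingface_hub import hf_hub_download

def fetch_genre_checkpoint(filename: str) -> str:
    # Download (or reuse the cached copy of) a checkpoint file from
    # the Hugging Face Hub and return its local filesystem path.
    # "filename" is a placeholder; use the real checkpoint name.
    return hf_hub_download(repo_id="jamie250/genrereproduce",
                           filename=filename)
```

reproduce.py can then load the returned path with torch.load, keeping the large model weights out of the git repository.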
Paper:
FROM SEARCH TO SAMPLING: GENERATIVE MODELS FOR ROBUST ALGORITHMIC RECOURSE
By Prateek Garg*, Lokesh Nagalapatti, Sunita Sarawagi
ICLR 2025
Results (Table 2, Adult dataset, mine vs. original):

    Metric   Mine   Original
    Cost     0.70   0.69
    Val      0.93   1.00
    LOF      0.90   0.98
    Score    1.78   1.93
Potential factors affecting the metrics:
1. Environment differences, especially the PyTorch version
2. Batch size for transformer training reduced from 8192 to 1024