Tests for causal prediction #1321
Conversation
Thanks for adding this PR, @maartenvanhooftds. The tests make sense, but I'm wondering if we can have a stronger test that compares CACM and ERM.
How about the following property: the difference in accuracy between a test dataset drawn from the same distribution as the training data and the main test dataset? That difference should be higher for ERM, so we can check it with a comparison assert. Can you add this for your setup?
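A minimal sketch of what such a comparison assert could look like (the accuracy arguments are hypothetical placeholders for values computed earlier in the test from the trained ERM and CACM models):

```python
def assert_cacm_generalizes_better(erm_id_acc, erm_ood_acc, cacm_id_acc, cacm_ood_acc):
    # Accuracy gap between a held-out split drawn from the training distribution
    # and the main, distribution-shifted test set.
    erm_gap = erm_id_acc - erm_ood_acc
    cacm_gap = cacm_id_acc - cacm_ood_acc
    # ERM is expected to rely more on spurious attributes, so its accuracy should
    # drop more under the shift than CACM's.
    assert erm_gap > cacm_gap
```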
Great feedback, thanks! Will implement it later this week. Edit: Sorry, it has been taking a bit longer; I just started a new job. It's still on my mind though.
Thanks for the changes, @maartenvanhooftds. The PR looks good now.
@all-contributors please add @maartenvanhooftds for code
I've put up a pull request to add @maartenvanhooftds! 🎉
@maartenvanhooftds Just realized that sometimes the test does not succeed. If you look at the CI build above, the result is around …
Thanks for your patience, @amit-sharma. If you ask me, two things are not going right:
To get some insight into 1), I've re-run without seeds for 100 runs to see the accuracy distribution of CACM on the val and test splits. I've found that even when playing with a higher signal (beta) on the dataloaders, I don't always get sufficient accuracy; occasionally there are just some outliers. So all in all the results are good, apart from a few poor outliers. I think the nicest way to solve this in general is by seeding. The code above passes on my machine 😄 so I must be missing something when setting the seed. Do you have an idea on either: …
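For reference, the unseeded 100-run check described above could look roughly like the sketch below, where `run_cacm_once` is a hypothetical stand-in for a single training/evaluation run without a fixed seed:

```python
import statistics

def accuracy_distribution(run_cacm_once, n_runs=100):
    # Collect val/test accuracies over repeated unseeded runs to see how often
    # poor outliers occur.
    val_accs, test_accs = [], []
    for _ in range(n_runs):
        val_acc, test_acc = run_cacm_once()
        val_accs.append(val_acc)
        test_accs.append(test_acc)
    return {
        "val_mean": statistics.mean(val_accs),
        "val_std": statistics.stdev(val_accs),
        "test_mean": statistics.mean(test_accs),
        "test_std": statistics.stdev(test_accs),
    }
```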
Thanks for looking into this, @maartenvanhooftds. Can you try adding … Otherwise it may be a version mismatch in py3.11 (the test that is failing). There's not too much difference between the GitHub CI env and a local installation, except for the exact versions of the packages installed. You may want to check with py3.11 and the packages listed in the CI log.
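For reference, a common way to pin the random state in a PyTorch-based test (a sketch, not necessarily the change suggested above) is:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed Python, NumPy and PyTorch (CPU and, if available, CUDA) so that
    # repeated runs of the test produce the same results.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```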
This PR is stale because it has been open for 60 days with no activity.
Inspired by Issue 1313.
Changes:
First contribution here, please review critically for any mistakes or inconsistencies!