
Conversation

@maartenvanhooftds commented Jun 2, 2025

Inspired by Issue #1313.

Changes:

  • Added some unit tests
  • Created a test for ERM and CACM based on this notebook, with some changes here and there, e.g. so that we don't have to download data during tests.

This is my first contribution here, so please review critically for any mistakes or inconsistencies!

@maartenvanhooftds force-pushed the causal-prediction-tests branch 2 times, most recently from 5df95b8 to 58a2504 on June 2, 2025 05:56
@maartenvanhooftds changed the title from "Tests for causal prediction algorithms" to "Tests for causal prediction" on Jun 2, 2025
@maartenvanhooftds marked this pull request as ready for review on June 2, 2025 06:00
@maartenvanhooftds force-pushed the causal-prediction-tests branch from 58a2504 to 424755d on June 2, 2025 12:43
@maartenvanhooftds force-pushed the causal-prediction-tests branch from 424755d to 17a97cb on June 2, 2025 12:44
Member

@amit-sharma left a comment

Thanks for adding this PR, @maartenvanhooftds. The tests make sense, but I'm wondering if we can have a stronger test that compares CACM and ERM.
How about the following property: the difference in accuracy between a test dataset drawn from the same distribution as the training data and the main (shifted) test dataset? That difference should be larger for ERM, so we can check it with a comparison assert. Can you add this to your setup?
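Roughly the kind of assert I have in mind (a sketch only; the fixtures and the `train_and_evaluate` helper are placeholders for whatever your test setup already provides):

```python
# Sketch of the proposed comparison; `train_and_evaluate` is a hypothetical
# helper that trains the given algorithm on `train_data` and returns its
# accuracy on the given evaluation split.
def test_erm_degrades_more_than_cacm(train_data, iid_test_data, ood_test_data):
    erm_iid = train_and_evaluate("ERM", train_data, iid_test_data)
    erm_ood = train_and_evaluate("ERM", train_data, ood_test_data)
    cacm_iid = train_and_evaluate("CACM", train_data, iid_test_data)
    cacm_ood = train_and_evaluate("CACM", train_data, ood_test_data)

    # The accuracy drop from the in-distribution test set to the shifted
    # main test set should be larger for ERM than for CACM.
    assert (erm_iid - erm_ood) > (cacm_iid - cacm_ood)
```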

Author

@maartenvanhooftds commented Jun 10, 2025

Great feedback, thanks! Will implement it later this week.

Edit: Sorry, it's been taking a bit longer since I just started a new job. It's still on my mind though.

@amit-sharma previously approved these changes Jun 28, 2025
Member

@amit-sharma left a comment

Thanks for the changes, @maartenvanhooftds. The PR looks good now.

Member

@amit-sharma

@all-contributors please add @maartenvanhooftds for code

Contributor

@amit-sharma

I've put up a pull request to add @maartenvanhooftds! 🎉

Member

@amit-sharma

@maartenvanhooftds Just realized that sometimes the test does not succeed. If you look at the CI build above, the result is around 0.61, which is lower than 0.7.
Can you double-check the values that the code is producing? One fix may be to lower the threshold, but I think it would be useful to study the output and create a dataset where the result is always high.

Author

@maartenvanhooftds commented Jul 29, 2025

Thanks for your patience, @amit-sharma.

If you ask me, two things are not going right:

  1. The accuracy of the results is too low.
  2. The seed is not respected in the CI build.

To get some insight into 1), I re-ran the test without seeds for 100 runs to look at the accuracy distribution of CACM on the val and test splits. I found that even when increasing the signal (beta) on the dataloaders, I don't always get sufficient accuracy; occasionally there are just some outliers. So all in all the results are good, apart from a few poor outliers.
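Roughly what I ran to check this (a sketch; `run_cacm_once` is a hypothetical wrapper around the existing test setup that trains CACM once without a seed and returns the val/test accuracies):

```python
import numpy as np

# Repeat the unseeded training run and inspect the accuracy distribution to
# spot outliers. `run_cacm_once` returns (val_accuracy, test_accuracy).
results = np.array([run_cacm_once() for _ in range(100)])
val_accs, test_accs = results[:, 0], results[:, 1]

print(f"val:  mean={val_accs.mean():.3f}, min={val_accs.min():.3f}")
print(f"test: mean={test_accs.mean():.3f}, min={test_accs.min():.3f}")
```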

I think the nicest way to solve this in general is by seeding. The code above passes on my machine 😄 so I must be missing something when setting the seed. Do you have an idea on either:
a) which seed(s) I should have set for reproducibility in the CI run,
OR b) how I can set up the same environment as CI locally, so that I can reproduce the CI results (which I currently can't)?

Member

@amit-sharma

Thanks for looking into this, @maartenvanhooftds.
It looks like PyTorch Lightning has its own random seeds that need to be set too (see Stack Overflow).

Can you try adding seed_everything and deterministic=True?
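A minimal sketch of what I mean, assuming the test builds a Lightning Trainer (the seed value and max_epochs below are just placeholders):

```python
import pytorch_lightning as pl  # depending on the installed version, the import may be `lightning.pytorch`

# Seed Python, NumPy and torch, and (with workers=True) the dataloader workers.
pl.seed_everything(42, workers=True)

# Ask the trainer to use deterministic algorithms where possible.
trainer = pl.Trainer(deterministic=True, max_epochs=5)
```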

Otherwise it may be a version mismatch in py3.11 (the test that is failing). There's not much difference between the GitHub CI environment and a local installation, except for the exact versions of the packages installed. You may want to check with py3.11 and the package versions shown in the CI log.


github-actions bot commented Oct 2, 2025

This PR is stale because it has been open for 60 days with no activity.

@github-actions bot added the stale label on Oct 2, 2025