-
Notifications
You must be signed in to change notification settings - Fork 82
Add DI integration tests #2531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DI integration tests #2531
Conversation
|
Your PR no longer requires formatting changes. Thank you for your contribution! |
.github/workflows/Integration.yml
Outdated
|
|
||
| jobs: | ||
| integration: | ||
| timeout-minutes: 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gdalle I think this is too long to be included. Different machines will have different limits, but is this doable within 20 minutes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, I was setting this to a high value for prototyping, hoping to see how long it takes to run on your custom runner.
In the DI test suite with the default runners, the Enzyme part takes close to an hour, but maybe here the machine is more bulky? Or we can leverage multiprocessing to speed things up (how many cores are there)?
Once I have an order of magnitude of how expensive this is, it will tell me what I need to cut to bring it down under 20 min.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many cores are there
32, but they help only with precompilation, if tests are fully serial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I parallelize tests, will the JIT compilation be parallelized too? In my understanding, this is only true if I distribute the tasks to separate worker processes (using Distributed), and not just to separate threads (using Base.Threads)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have started working on reducing compilation time in DI tests with SnoopCompile, there are probably gains to be made there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably a Distributed-based test setup like CUDA.jl would make great use of the parallelism we have available here, it's very unfortunate there isn't anything standardised for parallel testing at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the meantime I don't think 20 minutes are enough for all jobs at the moment, maybe something like 45 (or 60, just be safe) would be better than 120.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's very unfortunate there isn't anything standardised for parallel testing at the moment.
I have been using ReTestItems.jl in Lux and a few SciML repos for over a year now. it generally works great. (https://github.com/LuxDL/Lux.jl/actions/runs/17257270161/job/48971604321#step:9:644). Though it requires pretty much rewriting your entire existing test suite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set the timeout to 45 minutes
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2531 +/- ##
==========================================
- Coverage 74.79% 74.75% -0.04%
==========================================
Files 56 56
Lines 17628 17637 +9
==========================================
Hits 13185 13185
- Misses 4443 4452 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
I think the testing setup is pretty good as a starting point (which means my job is done, yay), but there are testing failures (should be marked as broken for the time being?) |
|
I'll take a look at the test failures |
|
bump @gdalle |
This seems to be a general issue with our |
|
My bad, I had added some tests which don't pass in my current DI suite, so I shouldn't expect them to pass here. The current commit should be shorter and succeed |
|
We should perhaps still add a rule for make_zero xD |
|
It seems the DI tests are now passing, in a total of ~23 min |
|
the currently failing CI (bijectors) previously only ran on 1.10, and we need to have CI green before merge. @mhauru @penelopeysm can you help @gdalle extract failing tests and mark then as failing in 1.11 so we can merge this? |
|
also above we clearly should add the forward rule for jl_eqtable_get too (iirc we have the reverse one) |
@gdalle in Enzyme.jl/test/integration/Bijectors/runtests.jl Lines 80 to 84 in 10b2caa
broken = VERSION>=v"1.11" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use the same compiler flags whenever you install packages, otherwise when you run the tests all the packages have to be recompiled anyway which makes the first precompilation completely useless and wasteful (as a concrete example, see https://github.com/EnzymeAD/Enzyme.jl/actions/runs/17670045529/job/50220643015?pr=2531#step:9:11)
|
I assume the EnzymeTestUtils failure is unrelated |
| sum_b_binv_test_case(Bijectors.InvertibleBatchNorm(3), (3, 3)), | ||
| sum_b_binv_test_case( | ||
| Bijectors.InvertibleBatchNorm(3), | ||
| (3, 3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@penelopeysm so here [and below] are the issue that were disabled for 1.11, can you help open mwe issues for these so we don't lose track?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and/or @gdalle I think we need an issue open [with url for open issue] before merge so we don't forget about.
other than that looks good to merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes me lots of time to minimise Enzyme examples. I don't at all mind doing it when I get the time to, but for now I will just open unminimised issues so that you all can merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
yeah its unrelated |
Co-authored-by: Mosè Giordano <[email protected]>
Co-authored-by: Penelope Yong <[email protected]>
Co-authored-by: Penelope Yong <[email protected]>
Co-authored-by: Mosè Giordano <[email protected]>
Proposed answer to #2528 for DifferentiationInterface
CI.yml(Bijectors, DynamicExpressions) and put them inside a newIntegration.yml.-O1option to thejuliacommand to reduce compilation time (the main culprit for the duration of Enzyme's DI tests).