@gdalle
Contributor

@gdalle gdalle commented Aug 27, 2025

Proposed answer to #2528 for DifferentiationInterface

  • Remove existing integration tests from CI.yml (Bijectors, DynamicExpressions) and put them inside a new Integration.yml.
  • Add tests for DifferentiationInterface, along with documentation on how to interpret them.
  • Make sure that all tests pass:
  • Increase time limit (to see how long the first run takes), pass -O1 option to the julia command to reduce compilation time (the main culprit for the duration of Enzyme's DI tests).

@github-actions
Contributor

github-actions bot commented Aug 27, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!


jobs:
  integration:
    timeout-minutes: 120
Member

@wsmoses wsmoses Aug 27, 2025

@gdalle I think this is too long to be included. Different machines will have different limits, but is this doable within 20 minutes?

Contributor Author

I understand. I set this to a high value for prototyping, hoping to see how long it takes to run on your custom runner.
In the DI test suite with the default runners, the Enzyme part takes close to an hour, but maybe the machine here is more powerful? Or we can leverage multiprocessing to speed things up (how many cores are there)?
Once I have an order-of-magnitude estimate of how expensive this is, I will know what I need to cut to bring it under 20 min.

Member

> how many cores are there

32, but they only help with precompilation if the tests are fully serial.

Contributor Author

@gdalle gdalle Aug 27, 2025

If I parallelize tests, will the JIT compilation be parallelized too? In my understanding, this is only true if I distribute the tasks to separate worker processes (using Distributed), and not just to separate threads (using Base.Threads)?

Contributor Author

I have started working on reducing compilation time in the DI tests with SnoopCompile; there are probably gains to be made there too.

Member

Yeah, probably a Distributed-based test setup like CUDA.jl would make great use of the parallelism we have available here, it's very unfortunate there isn't anything standardised for parallel testing at the moment.
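
To make the Distributed idea concrete, here is a minimal sketch of running independent test groups on separate worker processes. Each worker is its own Julia process, so it compiles what it needs independently of the others. The group names, the `run_group` helper, and the worker count are assumptions for illustration only; this is not how CUDA.jl or the DI test suite is actually organised.

```julia
using Distributed

# Assumption: 2 workers for the sketch; the runner discussed above has 32 cores.
addprocs(2)

# Load Test on every process so that each worker can run its own testsets.
@everywhere using Test

# Hypothetical stand-in for one independent group of tests.
@everywhere function run_group(name::String)
    @testset "$name" begin
        @test sum(abs2, [1.0, 2.0]) ≈ 5.0
    end
    # Report which process ran the group.
    return (group = name, worker = myid())
end

# Hypothetical group names; pmap hands each one to an available worker.
groups = ["forward", "reverse", "second_order"]
results = pmap(run_group, groups)

# Shut the workers down once the groups have finished.
rmprocs(workers())
```

With threads instead of processes, all tasks would share a single runtime, which is the distinction raised in the question above.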

Member

In the meantime, I don't think 20 minutes is enough for all jobs at the moment; maybe something like 45 (or 60, just to be safe) would be better than 120.

Collaborator

> it's very unfortunate there isn't anything standardised for parallel testing at the moment.

I have been using ReTestItems.jl in Lux and a few SciML repos for over a year now. It generally works great (https://github.com/LuxDL/Lux.jl/actions/runs/17257270161/job/48971604321#step:9:644), though it requires pretty much rewriting your entire existing test suite.

Contributor Author

I set the timeout to 45 minutes

@codecov

codecov bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.75%. Comparing base (146814c) to head (e23c4d3).
⚠️ Report is 125 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2531      +/-   ##
==========================================
- Coverage   74.79%   74.75%   -0.04%     
==========================================
  Files          56       56              
  Lines       17628    17637       +9     
==========================================
  Hits        13185    13185              
- Misses       4443     4452       +9     


@giordano
Member

I think the testing setup is pretty good as a starting point (which means my job is done, yay), but there are testing failures (should be marked as broken for the time being?)

@gdalle
Contributor Author

gdalle commented Aug 28, 2025

I'll take a look at the test failures

@wsmoses
Member

wsmoses commented Sep 11, 2025

bump @gdalle

@vchuravy
Member

Got exception outside of a @test
  Enzyme execution failed.
  Enzyme: Not yet implemented forward for jl_eqtable_get
  
  Stacktrace:
    [1] get
      @ ./iddict.jl:102 [inlined]
    [2] in
      @ ./iddict.jl:192 [inlined]
    [3] haskey
      @ ./abstractdict.jl:17 [inlined]
    [4] make_zero
      @ /__w/Enzyme.jl/Enzyme.jl/src/typeutils/make_zero.jl:39 [inlined]
    [5] #131
      @ /__w/Enzyme.jl/Enzyme.jl/src/typeutils/make_zero.jl:169 [inlined]
    [6] ntuple
      @ ./ntuple.jl:19
    [7] make_zero
      @ /__w/Enzyme.jl/Enzyme.jl/src/typeutils/make_zero.jl:167 [inlined]
    [8] make_zero
      @ /__w/Enzyme.jl/Enzyme.jl/src/typeutils/make_zero.jl:181 [inlined]
    [9] make_zero (repeats 2 times)
      @ /__w/Enzyme.jl/Enzyme.jl/lib/EnzymeCore/src/EnzymeCore.jl:587 [inlined]
   [10] _shadow
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/utils.jl:82 [inlined]
   [11] #7
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/utils.jl:103 [inlined]
   [12] map
      @ ./tuple.jl:291 [inlined]
   [13] make_context_shadows
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/utils.jl:102 [inlined]
   [14] prepare_pushforward_nokwarg
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/forward_onearg.jl:20 [inlined]
   [15] _prepare_pullback_aux
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/pullback.jl:293
   [16] prepare_pullback_nokwarg
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/pullback.jl:259 [inlined]
   [17] prepare_gradient_nokwarg
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/gradient.jl:93 [inlined]
   [18] gradient!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/gradient.jl:76
   [19] shuffled_gradient!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/gradient.jl:182 [inlined]
   [20] shuffled_gradient!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/gradient.jl:0 [inlined]
   [21] fwddiffejulia_shuffled_gradient__106734_inner_1wrap
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/first_order/gradient.jl:0
   [22] macro expansion
      @ /__w/Enzyme.jl/Enzyme.jl/src/compiler.jl:5691 [inlined]
   [23] enzyme_call
      @ /__w/Enzyme.jl/Enzyme.jl/src/compiler.jl:5225 [inlined]
   [24] ForwardModeThunk
      @ /__w/Enzyme.jl/Enzyme.jl/src/compiler.jl:5116 [inlined]
   [25] autodiff
      @ /__w/Enzyme.jl/Enzyme.jl/src/Enzyme.jl:669 [inlined]
   [26] autodiff
      @ /__w/Enzyme.jl/Enzyme.jl/src/Enzyme.jl:538 [inlined]
   [27] value_and_pushforward!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/forward_twoarg.jl:99 [inlined]
   [28] pushforward!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/ext/DifferentiationInterfaceEnzymeExt/forward_twoarg.jl:114 [inlined]
   [29] _hvp_aux!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/second_order/hvp.jl:370 [inlined]
   [30] hvp!
      @ ~/.julia/packages/DifferentiationInterface/TuoGH/src/second_order/hvp.jl:346 [inlined]
   [31] hvp!(f::DifferentiationInterfaceTest.MultiplyByConstantAndStoreInCache{:out, typeof(DifferentiationInterfaceTest.arr_to_num_linalg)}, tg::Tuple{Matrix{Float64}}, backend::AutoEnzyme{EnzymeCore.ForwardMode{false, EnzymeCore.FFIABI, false, false, false}, Nothing}, x::Matrix{Float64}, tx::Tuple{Matrix{Float64}}, contexts::DifferentiationInterface.ConstantOrCache{@NamedTuple{cache::Vector{Float64}, constant::@NamedTuple{a::Float64, b::Vector{Float64}}}})
      @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/TuoGH/src/second_order/hvp.jl:89
   [32] test_correctness(ba::AutoEnzyme{EnzymeCore.ForwardMode{false, EnzymeCore.FFIABI, false, false, false}, Nothing}, scen::DifferentiationInterfaceTest.Scenario{:hvp, :in, :out, DifferentiationInterfaceTest.MultiplyByConstantAndStoreInCache{:out, typeof(DifferentiationInterfaceTest.arr_to_num_linalg)}, Matrix{Float64}, Float64, Tuple{Matrix{Float64}}, Tuple{DifferentiationInterface.ConstantOrCache{@NamedTuple{cache::Vector{Float64}, constant::@NamedTuple{a::Float64, b::Vector{Float64}}}}}, Matrix{Float64}, Tuple{Matrix{Float64}}, @NamedTuple{x::Matrix{Float64}, t::Tuple{Matrix{Float64}}, contexts::Tuple{DifferentiationInterface.ConstantOrCache{@NamedTuple{cache::Vector{Float64}, constant::@NamedTuple{a::Float64, b::Vector{Float64}}}}}}}; isapprox::typeof(isapprox), atol::Int64, rtol::Float64, scenario_intact::Bool, sparsity::Bool, reprepare::Bool)

This seems to be a general issue with our make_zero implementation? E.g. second order will fail since we have an IdDict in there?

@gdalle
Contributor Author

gdalle commented Sep 11, 2025

My bad, I had added some tests which don't pass in my current DI suite, so I shouldn't expect them to pass here. The current commit should be shorter and succeed

@vchuravy
Member

We should perhaps still add a rule for make_zero xD

@gdalle
Contributor Author

gdalle commented Sep 11, 2025

It seems the DI tests are now passing, in a total of ~23 min

@wsmoses
Member

wsmoses commented Sep 11, 2025

the currently failing CI (bijectors) previously only ran on 1.10, and we need to have CI green before merge.

@mhauru @penelopeysm can you help @gdalle extract the failing tests and mark them as failing on 1.11 so we can merge this?

@wsmoses
Member

wsmoses commented Sep 11, 2025

also above we clearly should add the forward rule for jl_eqtable_get too (iirc we have the reverse one)

@giordano
Member

> the currently failing CI (bijectors) previously only ran on 1.10, and we need to have CI green before merge.

@gdalle in

    @test(
        Enzyme.gradient(f_mode, Enzyme.Const(f), x...)[1] ≈ finitediff,
        rtol = rtol,
        atol = atol,
    )

you can add the other "keyword argument"

broken = VERSION>=v"1.11"
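
As a minimal sketch of how the `broken` keyword of `@test` behaves (available in Julia 1.7 and later), the example below uses made-up numbers in place of the Enzyme gradient and the finite-difference reference. The variable names and the `known_failure` flag are illustrative assumptions; in the suggestion above, the condition would be `VERSION >= v"1.11"`.

```julia
using Test

# Hypothetical values standing in for a computed gradient and its reference.
computed = [1.0, 2.0]
reference = [1.0, 2.0 + 1e-9]

# Assumption: a flag that is true on configurations where the test is known to fail.
known_failure = true

ts = @testset "broken keyword sketch" begin
    # `rtol` is forwarded to `isapprox`; with `broken = false` this is a normal test.
    @test computed ≈ reference rtol = 1e-6 broken = false

    # A failing comparison marked as broken is recorded as Broken, not as a failure.
    @test computed ≈ [0.0, 0.0] broken = known_failure
end
```

One design consequence worth keeping in mind: if a test marked `broken = true` starts passing, `Test` reports an unexpected pass as an error, so the marker has to be removed once the underlying issue is fixed.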

@giordano giordano mentioned this pull request Sep 12, 2025
13 tasks
Member

@giordano giordano left a comment

You need to use the same compiler flags whenever you install packages. Otherwise, when you run the tests, all the packages have to be recompiled anyway, which makes the first precompilation completely useless and wasteful (as a concrete example, see https://github.com/EnzymeAD/Enzyme.jl/actions/runs/17670045529/job/50220643015?pr=2531#step:9:11)
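
One way to keep the flags consistent is to define them once and reuse them for every Julia invocation. The sketch below is an assumption about how this could be organised from a driver script, not the workflow used in this PR; `shared_flags` and `julia_step` are hypothetical names, and the install and test commands are only constructed, not run.

```julia
# Path to the Julia binary that is running this script.
julia = joinpath(Sys.BINDIR, Base.julia_exename())

# Assumption: the flags are defined in one place and shared by every step.
shared_flags = ["-O1", "--startup-file=no"]

# Hypothetical helper that builds a command with the shared flags.
julia_step(code::String) = `$julia $shared_flags -e $code`

# Both steps carry the same optimisation level, so the packages precompiled
# during installation should not need recompiling because of a flag mismatch.
install_step = julia_step("using Pkg; Pkg.instantiate()")
test_step = julia_step("using Pkg; Pkg.test()")

# Check from a child process that the optimisation level really is 1.
opt_level = read(julia_step("print(Base.JLOptions().opt_level)"), String)
```

The child process reports the optimisation level it was started with, which gives a quick way to confirm that a step picked up the intended flag.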

@gdalle
Contributor Author

gdalle commented Sep 12, 2025

I assume the EnzymeTestUtils failure is unrelated

sum_b_binv_test_case(Bijectors.InvertibleBatchNorm(3), (3, 3)),
sum_b_binv_test_case(
    Bijectors.InvertibleBatchNorm(3),
    (3, 3),
Member

@penelopeysm so here [and below] are the issues that were disabled for 1.11. Can you help open MWE issues for these so we don't lose track?

Member

and/or @gdalle I think we need an issue open [with the URL of the open issue] before merge so we don't forget about it.
Other than that, looks good to merge.

Contributor

It takes me lots of time to minimise Enzyme examples. I don't at all mind doing it when I get the time to, but for now I will just open unminimised issues so that you all can merge this.

Member

sounds good!

@wsmoses
Member

wsmoses commented Sep 12, 2025

> I assume the EnzymeTestUtils failure is unrelated

yeah it's unrelated

@wsmoses wsmoses merged commit 4715e54 into EnzymeAD:main Sep 13, 2025
29 of 35 checks passed

6 participants