Add MPI extension with Allreduce! forward rule #2745
lcw wants to merge 1 commit into EnzymeAD:main
Conversation
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

diff --git a/test/integration/MPI/runtests.jl b/test/integration/MPI/runtests.jl
index 4bce80bf..ee9b8c0f 100644
--- a/test/integration/MPI/runtests.jl
+++ b/test/integration/MPI/runtests.jl
@@ -3,5 +3,5 @@ using Enzyme
using Test
@testset "collectives" for np in (1, 2, 4)
- run(`$(mpiexec()) -n $np $(Base.julia_cmd()) --project=$(@__DIR__) $(joinpath(@__DIR__, "collectives.jl"))`)
+ run(`$(mpiexec()) -n $np $(Base.julia_cmd()) --project=$(@__DIR__) $(joinpath(@__DIR__, "collectives.jl"))`)
end
error("Forward mode MPI.Allreduce! is only implemented for MPI.SUM.")
end
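As an aside on the SUM-only restriction above: sum-reduction is linear, so the pushforward of an allreduce-sum is the same allreduce applied to the tangents, whereas a nonlinear op such as MPI.PROD would also need the primal values. A serial Julia analogue of this (illustrative only, no MPI involved):

```julia
# Why SUM is the easy case: the tangent of a sum-reduction is the same
# sum-reduction applied to the tangents. For a product, the product rule
# mixes primal and tangent values, so the same reduction does not work.
xs  = [2.0, 3.0, 4.0]
dxs = [0.1, 0.2, 0.3]

# d(sum(xs)) = sum(dxs): reuse the identical reduction for the tangents.
@assert sum(dxs) ≈ 0.6

# d(prod(xs)) = sum_i dxs[i] * prod(xs) / xs[i]: not the same reduction.
dprod = sum(dxs[i] * prod(xs) / xs[i] for i in eachindex(xs))
@assert dprod ≈ 4.6
```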
if EnzymeRules.needs_primal(config)
`needs_primal` is only relevant to whether the result is returned; if an argument is mutated, the mutation still needs to happen unless that argument is marked noneed.
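A minimal sketch of the shape this comment suggests, with the in-place reduction unconditional and `needs_primal`/`needs_shadow` only selecting the return value. This mirrors, but is not, the PR's actual rule; restricting to `Duplicated` activity (ignoring batch mode) is a simplifying assumption here:

```julia
using MPI
using Enzyme
import Enzyme.EnzymeCore: EnzymeRules

function EnzymeRules.forward(
    config, ::Const{typeof(MPI.Allreduce!)},
    rt, v::Duplicated, op::Const, comm::Const,
)
    op.val == MPI.SUM ||
        error("Forward mode MPI.Allreduce! is only implemented for MPI.SUM.")
    # The primal reduction must run unconditionally: Allreduce! mutates
    # v.val in place, so skipping it based on needs_primal would be wrong.
    MPI.Allreduce!(v.val, op.val, comm.val)
    # For MPI.SUM the tangent obeys the same (linear) reduction.
    MPI.Allreduce!(v.dval, op.val, comm.val)
    # needs_primal / needs_shadow only control what is *returned*.
    if EnzymeRules.needs_primal(config) && EnzymeRules.needs_shadow(config)
        return Duplicated(v.val, v.dval)
    elseif EnzymeRules.needs_primal(config)
        return v.val
    elseif EnzymeRules.needs_shadow(config)
        return v.dval
    else
        return nothing
    end
end
```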
import Enzyme.EnzymeCore: EnzymeRules
function EnzymeRules.forward(config, ::Const{typeof(MPI.Allreduce!)}, rt, v, op::Const, comm::Const)
For the sake of understanding, is there a reason why this wasn't already handled on the LLVM side?
On Julia 1.10
%14 = call i32 @PMPI_Allreduce(i64 %6, i64 %bitcast_coercion, i32 noundef 1, i32 %9, i32 %11, i32 %13) #12 [ "jl_roots"({} addrspace(10)* %2, {} addrspace(10)* %1, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140676541099488 to {}*) to {} addrspace(10)*), {} addrspace(10)* %0) ], !dbg !25
0x3bb3f008
Unhandled MPI FUNCTION
UNREACHABLE executed at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2211!
[63249] signal (6.-6): Aborted
in expression starting at REPL[15]:1
unknown function (ip: 0x7ff25629894c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm25llvm_unreachable_internalEPKcS1_j at /home/vchuravy/.julia/juliaup/julia-1.10.10+0.x64.linux.gnu/bin/../lib/julia/libLLVM-15jl.so (unknown line)
handleMPI at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2211
handleKnownCallDerivatives at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2249
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6402
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:5062
EnzymeCreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:661
EnzymeCreateForwardDiff at /home/vchuravy/.julia/packages/Enzyme/rsnI8/src/api.jl:342
Okay, we should fix that; it needs EnzymeAD/Enzyme#2530.
We may also need calling-convention handling like
https://github.com/EnzymeAD/Enzyme/blob/2e6f771cf4570d2800a67e9da9298c0d612d5020/enzyme/Enzyme/TypeAnalysis/TypeAnalysis.cpp#L5076 and perhaps elsewhere
@lcw and I also realized that this may not be needed for the application we are looking at.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2745      +/-   ##
==========================================
- Coverage   70.00%   69.95%    -0.06%
==========================================
  Files          58       59        +1
  Lines       19295    19309       +14
==========================================
  Hits        13507    13507
- Misses       5788     5802       +14
else
    # would be nice to use MPI non-blocking collectives
    foreach(v.dval) do dval
        MPI.Allreduce!(dval, op, comm)
Are the dvals really non-contiguous? If so, I'd think about a derived datatype for v.dval to get this into one Allreduce and get the benefits of vector-mode.
Yes, the dvals are all independent memory allocations.
The structure here is an NTuple{N, Vector{Float64}}. I have been wanting a variant that uses contiguous memory, but we can only do that from 1.11 onwards, and we likely can't guarantee it.
I was thinking about derived datatypes as well, but I have not yet managed to convince MPI to understand a Vector{Float64}.
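Short of a derived datatype, one copy-based way to collapse the N collectives into a single call is to pack the independent tangent vectors into one contiguous scratch buffer. This is only a sketch of the idea, not the PR's code; the function name `allreduce_tangents!` is made up here, and a real version would want to reuse the scratch buffer instead of allocating:

```julia
using MPI

# Hypothetical copy-based alternative: pack the independent dvals into one
# contiguous buffer so a single Allreduce! suffices. A derived MPI datatype
# would avoid the copies; this sketch only shows the one-call shape.
function allreduce_tangents!(dvals::NTuple{N,Vector{Float64}}, op, comm) where {N}
    scratch = reduce(vcat, dvals)        # pack (allocates; reuse in practice)
    MPI.Allreduce!(scratch, op, comm)    # one collective instead of N
    offset = 0
    for dval in dvals                    # unpack back into the originals
        n = length(dval)
        copyto!(dval, 1, scratch, offset + 1, n)
        offset += n
    end
    return dvals
end
```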