Add MPI extension with Allreduce! forward rule #2745
Conversation
Co-authored-by: Valentin Churavy <[email protected]>
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

```diff
diff --git a/test/integration/MPI/runtests.jl b/test/integration/MPI/runtests.jl
index 4bce80bf..ee9b8c0f 100644
--- a/test/integration/MPI/runtests.jl
+++ b/test/integration/MPI/runtests.jl
@@ -3,5 +3,5 @@ using Enzyme
 using Test

 @testset "collectives" for np in (1, 2, 4)
-    run(`$(mpiexec()) -n $np $(Base.julia_cmd()) --project=$(@__DIR__) $(joinpath(@__DIR__, "collectives.jl"))`)
+    run(`$(mpiexec()) -n $np $(Base.julia_cmd()) --project=$(@__DIR__) $(joinpath(@__DIR__, "collectives.jl"))`)
 end
```
| error("Forward mode MPI.Allreduce! is only implemented for MPI.SUM.") | ||
| end | ||
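For context (my reading of why the rule is restricted to MPI.SUM; the thread does not spell this out): Allreduce with MPI.SUM computes y = Σ_r x_r over the ranks r, which is linear in the inputs, so the forward-mode tangent is simply ẏ = Σ_r ẋ_r, i.e. another MPI.SUM Allreduce applied to the tangents. Nonlinear reductions such as MPI.MAX or MPI.MIN would need the primal values to decide which rank's tangent survives, so they cannot reuse this shortcut.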
```julia
if EnzymeRules.needs_primal(config)
```
needs_primal is only relevant to whether the result is returned; if something is mutated, the mutation still needs to happen unless the argument is marked noneed.
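A minimal sketch of what this implies for the rule body (my illustration, not the PR's final code; it assumes a Duplicated argument and the MPI.SUM-only restriction above):

```julia
function EnzymeRules.forward(config, ::Const{typeof(MPI.Allreduce!)},
                             rt, v::Duplicated, op::Const, comm::Const)
    # The primal collective mutates v.val, so it must run unconditionally;
    # needs_primal only decides whether the mutated value is also returned.
    MPI.Allreduce!(v.val, op.val, comm.val)
    # By linearity of MPI.SUM, the tangent is the same collective on v.dval.
    MPI.Allreduce!(v.dval, op.val, comm.val)
    if EnzymeRules.needs_primal(config) && EnzymeRules.needs_shadow(config)
        return Duplicated(v.val, v.dval)
    elseif EnzymeRules.needs_shadow(config)
        return v.dval
    elseif EnzymeRules.needs_primal(config)
        return v.val
    else
        return nothing
    end
end
```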
```julia
import Enzyme.EnzymeCore: EnzymeRules
```
```julia
function EnzymeRules.forward(config, ::Const{typeof(MPI.Allreduce!)}, rt, v, op::Const, comm::Const)
```
For the sake of understanding: is there a reason why this wasn't already handled LLVM-side?
On Julia 1.10:

```
%14 = call i32 @PMPI_Allreduce(i64 %6, i64 %bitcast_coercion, i32 noundef 1, i32 %9, i32 %11, i32 %13) #12 [ "jl_roots"({} addrspace(10)* %2, {} addrspace(10)* %1, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140676541099488 to {}*) to {} addrspace(10)*), {} addrspace(10)* %0) ], !dbg !25

0x3bb3f008
Unhandled MPI FUNCTION
UNREACHABLE executed at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2211!

[63249] signal (6.-6): Aborted
in expression starting at REPL[15]:1
unknown function (ip: 0x7ff25629894c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm25llvm_unreachable_internalEPKcS1_j at /home/vchuravy/.julia/juliaup/julia-1.10.10+0.x64.linux.gnu/bin/../lib/julia/libLLVM-15jl.so (unknown line)
handleMPI at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2211
handleKnownCallDerivatives at /workspace/srcdir/Enzyme/enzyme/Enzyme/CallDerivatives.cpp:2249
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6402
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:5062
EnzymeCreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:661
EnzymeCreateForwardDiff at /home/vchuravy/.julia/packages/Enzyme/rsnI8/src/api.jl:342
```
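A hypothetical minimal reproducer for this abort (my reconstruction from the trace; the original session starting at REPL[15]:1 is not shown), run without the custom rule in place:

```julia
using Enzyme, MPI
MPI.Init()

# Differentiating through the raw collective makes Enzyme's LLVM-side
# handler encounter PMPI_Allreduce, which it does not recognize on this path.
f(x) = (MPI.Allreduce!(x, MPI.SUM, MPI.COMM_WORLD); nothing)

x, dx = rand(4), zeros(4)
Enzyme.autodiff(Forward, f, Duplicated(x, dx))  # aborts with "Unhandled MPI FUNCTION"
```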
Okay, we should fix that; it needs EnzymeAD/Enzyme#2530.
We may also need calling-convention handling like
https://github.com/EnzymeAD/Enzyme/blob/2e6f771cf4570d2800a67e9da9298c0d612d5020/enzyme/Enzyme/TypeAnalysis/TypeAnalysis.cpp#L5076, and perhaps elsewhere.
@lcw and I also realized that this may not be needed for the application we are looking at.
Codecov Report

❌ Patch coverage

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2745      +/-   ##
==========================================
- Coverage   70.00%   69.95%   -0.06%
==========================================
  Files          58       59       +1
  Lines       19295    19309      +14
==========================================
  Hits        13507    13507
- Misses       5788     5802      +14
```
```julia
else
    # would be nice to use MPI non-blocking collectives
    foreach(v.dval) do dval
        MPI.Allreduce!(dval, op, comm)
```
Are the dvals really non-contiguous? If so, I'd think about a derived datatype for v.dval to get this into one Allreduce and gain the benefits of vector mode.
Yes, the dvals are all independent memory allocations.
The structure here is an NTuple{N, Vector{Float64}}. I have been wanting a variant where we use contiguous memory, but we can only do that from Julia 1.11 onward, and we likely can't guarantee it.
I was thinking about derived types as well, but I have not yet managed to convince MPI to understand a Vector{Float64}.
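A hedged sketch of the packing alternative (my illustration, not code from the PR; allreduce_tangents! is a hypothetical helper): copy the N independent tangent buffers into one contiguous array, reduce once, and scatter back, trading two extra copies for a single collective.

```julia
using MPI

# Hypothetical helper: one Allreduce over all tangents instead of N of them.
function allreduce_tangents!(dvals::NTuple{N, Vector{Float64}}, comm::MPI.Comm) where {N}
    buf = reduce(vcat, dvals)            # pack into contiguous memory
    MPI.Allreduce!(buf, MPI.SUM, comm)   # single in-place collective
    offset = 0
    for dval in dvals                    # unpack back into each tangent buffer
        copyto!(dval, 1, buf, offset + 1, length(dval))
        offset += length(dval)
    end
    return dvals
end
```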