Commit e9b4638
[multi-gpu] Phase 2: kernel-driven producer/consumer rewrite
Per @mawad-amd's review feedback on PR #1577: replace the host-orchestrated
mgpuMemcpy reference test with a kernel-driven producer/consumer pair.
Cross-rank data movement is now performed by GPU compute units issuing
loads/stores directly into peer HBM over XGMI, not by the HIP copy engine.
Changes:
- air_sym_handwritten.mlir is rewritten as one gpu.module with two
  gpu.func kernels:
    * producer (rank 0): each thread writes 42.0 into rank 1's `data`
      via memref.store on a peer memref produced by air.translate.
      Lane 0 of each warp then signals the per-warp flag with a
      release atomicrmw on rank 1's `flags`.
    * consumer (rank 1): lane 0 of each warp spins on its flag with an
      acquire atomic load until the producer signals; gpu.barrier then
      releases all 64 lanes to read their data slots and copy them
      into a verify buffer. The host reads verify_buf back (D2H) and
      checks for 42.0.
  The host driver (func.func @main) initializes the symmetric heap,
  copies heap_bases into a device-resident buffer (a workaround,
  because mgpuGetHeapBases returns a host pointer), and dispatches
  the producer or consumer kernel based on rank.
- run.sh adds the GPU compilation chain (rocdl-attach-target,
  convert-gpu-to-rocdl, gpu-module-to-binary, gpu-async-region,
  gpu-to-llvm) before mlir-runner.
- run.sh sets HIP_VISIBLE_DEVICES=$i and LOCAL_RANK=0 per process so
  each rank sees only its own GPU, as device 0. This eliminates the
  device-binding ambiguity between airgpu's hipSetDevice and MLIR's
  built-in gpu.launch_func handling, which would otherwise cause any
  rank N>0 to fail with hipErrorInvalidDevice when launching kernels.
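
The producer/consumer handshake described in the first change can be
sketched on the host with C++ atomics. This is only an analog under
stated assumptions, not the MLIR kernels from this commit: one
std::thread stands in for each warp, a single struct stands in for
rank 1's `data` and `flags`, and the names SymmetricBuffers,
producer_warp, consumer_warp, run_handshake, and kWarps are
hypothetical. It shows why the release signal paired with the acquire
spin makes the plain data writes visible to the consumer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kWarps = 4;   // hypothetical warp count for the sketch
constexpr std::size_t kLanes = 64;  // lanes per warp, as in the commit message

// Host stand-in for rank 1's symmetric-heap buffers (`data` and `flags`).
struct SymmetricBuffers {
    std::array<float, kWarps * kLanes> data{};
    std::array<std::atomic<int>, kWarps> flags;
    SymmetricBuffers() {
        for (auto& f : flags) f.store(0, std::memory_order_relaxed);
    }
};

// Producer side: write every lane's slot, then signal the per-warp flag.
// The release ordering publishes the preceding plain stores.
void producer_warp(SymmetricBuffers& peer, std::size_t warp) {
    assert(warp < kWarps);
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        peer.data[warp * kLanes + lane] = 42.0f;
    }
    peer.flags[warp].fetch_add(1, std::memory_order_release);
}

// Consumer side: spin with acquire loads until the flag is set, then copy
// the warp's slots into the verify buffer. On the GPU, lane 0 spins and a
// barrier releases the other lanes; here one thread plays all 64 lanes.
void consumer_warp(const SymmetricBuffers& local, std::size_t warp,
                   std::vector<float>& verify) {
    assert(warp < kWarps);
    while (local.flags[warp].load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        const std::size_t idx = warp * kLanes + lane;
        verify[idx] = local.data[idx];
    }
}

// Runs one producer and one consumer thread per warp and returns the
// verify buffer, which plays the role of verify_buf read back by the host.
std::vector<float> run_handshake() {
    SymmetricBuffers buf;
    std::vector<float> verify(kWarps * kLanes, 0.0f);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < kWarps; ++w) {
        threads.emplace_back([&buf, &verify, w] { consumer_warp(buf, w, verify); });
    }
    for (std::size_t w = 0; w < kWarps; ++w) {
        threads.emplace_back([&buf, w] { producer_warp(buf, w); });
    }
    for (auto& t : threads) t.join();
    return verify;
}
```

Calling run_handshake() and checking that every element equals 42.0f
mirrors the host-side check on verify_buf. Consumers are started first
on purpose, so the spin loop is exercised before any flag is set.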
Validated on rad-mi325x-1 (8x MI325X, ROCm 7.1.1):
  W=2: rank 1 (consumer): cross-rank kernel write PASS (verify[0]=42.0)
  W=4: ALL 4 RANKS PASSED (rank 0/1 active, ranks 2-3 idle)
  W=8: ALL 8 RANKS PASSED (rank 0/1 active, ranks 2-7 idle)
This is the first time GPU compute units (not the HIP copy engine)
have been observed driving cross-rank data movement over XGMI in this
stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 82c7961 commit e9b4638
2 files changed
Lines changed: 303 additions & 132 deletions