Move AllReduce kernel args to Types.h (P1-D5) #515
Open
minsii wants to merge 8 commits into meta-pytorch:main from minsii:export-D91983714
Conversation
Summary: Disable the CE test for now to unblock testing other critical sendrecv changes. The test should be fixed in T253314634.

Differential Revision: D91980230
Summary: Implements native AVG support for the PAT (Parallel All-to-All Transpose) algorithm in ReduceScatter. Instead of falling back to the Ring algorithm or using a separate divide kernel, this applies division at the final write step for each chunk.

**Documentation Added:**
- `meta/collectives/docs/ReduceScatterPat.md` - Comprehensive PAT algorithm documentation, including a 5-phase breakdown and an 8-rank visualization
- `meta/collectives/docs/ReduceScatterPatAvg.md` - PAT AVG design details, multi-chunk handling, and implementation notes

**Key Implementation:**
- Add `isFinalWrite` flag to the `ncclPatStep` struct (set in Phase 4) to correctly apply division for all chunks in multi-chunk transfers (fixes a large-message bug)
- Add `FuncPatAvg<T>` template that uses FuncSum for reduction and applies division as a postOp in the final write step
- Add `ncclDevPatAvg` enum for kernel dispatch
- Update generate.py and def_build.bzl for PatAvg kernel generation
- Enable via `NCCL_ALGO=reducescatter:pat_postdiv`

**Meta overlay pattern used to minimize upstream changes:**
- meta/device/FuncPatAvg.cuh: Full implementation (~120 lines)
- meta/collectives/PatAvgAlgoHelper.h: Helper functions with lazy env detection
- All src/ changes (~15 lines) are marked with `[META:PAT_AVG]` comments for rebase tracking

Differential Revision: D91948601
Summary: Add a Claude Code agent definition for reviewing ctran code changes. The agent:
- Reviews diffs for correctness (thread safety, test coverage, code abstraction)
- Performs performance review (benchmark requirements, roofline analysis)
- References CLAUDE.md as the authoritative source for coding standards
- Outputs structured feedback with clear recommendations (APPROVE/NEEDS_CHANGES/NEEDS_HUMAN_REVIEW)

This agent should be invoked after code changes are made to provide automated review feedback before human review.

Differential Revision: D91963243
Summary: Move CtranKernelSendArgs, CtranKernelRecvArgs, and CtranKernelSendRecvArgs from gpe/CtranGpeDev.h to algos/SendRecv/Types.h as ctran::sendrecv::KernelSendArgs, KernelRecvArgs, and KernelSendRecvArgs respectively. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Keep "Send/Recv/SendRecv" suffix since they're distinct types

Differential Revision: D91983715
Summary: Move CtranKernelAllGatherArgs from gpe/CtranGpeDev.h to algos/AllGather/Types.h as ctran::allgather::KernelArgs. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Omit algorithm name since namespace provides context

Differential Revision: D91983718
Summary: Move CtranKernelBroadcastArgs from gpe/CtranGpeDev.h to algos/Broadcast/Types.h as ctran::broadcast::KernelArgs. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Omit algorithm name since namespace provides context

Differential Revision: D91983717
Summary: Move CtranKernelReduceScatterArgs from gpe/CtranGpeDev.h to algos/ReduceScatter/Types.h as ctran::reducescatter::KernelArgs. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Omit algorithm name since namespace provides context

Differential Revision: D91983712
Summary: Move CtranKernelAllReduceArgs and CtranKernElemRole from gpe/CtranGpeDev.h to algos/AllReduce/Types.h as ctran::allreduce::KernelArgs and KernElemRole. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel"/"KernElem" prefix
- Omit algorithm name since namespace provides context

Differential Revision: D91983714
Summary:
Move CtranKernelAllReduceArgs and CtranKernElemRole from gpe/CtranGpeDev.h to
algos/AllReduce/Types.h as ctran::allreduce::KernelArgs and KernElemRole.
Part of KernelElem cleanup Phase 1.
Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel"/"KernElem" prefix
- Omit algorithm name since namespace provides context
Differential Revision: D91983714