

@minsii (Contributor) commented Feb 1, 2026

Summary:
Move CtranKernelBroadcastArgs from gpe/CtranGpeDev.h to algos/Broadcast/Types.h
as ctran::broadcast::KernelArgs. Part of KernelElem cleanup Phase 1.

Naming follows the convention:

  • Remove "Ctran" prefix
  • Keep "Kernel" prefix
  • Omit algorithm name since namespace provides context
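
Here is a minimal C++17 sketch of what the relocated type could look like under this convention. Only the namespace, the `KernelArgs` name, and the old `CtranKernelBroadcastArgs` name come from this PR. The fields and the trivially-copyable check are illustrative assumptions, not the contents of `algos/Broadcast/Types.h`.

```cpp
// Hypothetical sketch of algos/Broadcast/Types.h after the move.
// Field names are illustrative only; they are not taken from the diff.
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace ctran::broadcast {

// Formerly CtranKernelBroadcastArgs in gpe/CtranGpeDev.h.
// "Ctran" prefix dropped, "Kernel" prefix kept, and the algorithm
// name is supplied by the enclosing namespace.
struct KernelArgs {
  const void* sendbuff;  // hypothetical: source buffer on the root
  void* recvbuff;        // hypothetical: destination buffer
  std::size_t count;     // hypothetical: number of elements
  int root;              // hypothetical: root rank
};

// Kernel arguments are passed to the launch by value, so keeping the
// struct trivially copyable is a reasonable invariant to enforce.
static_assert(std::is_trivially_copyable_v<KernelArgs>,
              "kernel args should be trivially copyable");

}  // namespace ctran::broadcast
```

Call sites then spell the type as `ctran::broadcast::KernelArgs`, so the algorithm name appears once, in the namespace, instead of being repeated in the type name.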

Differential Revision: D91983717

Summary: We have to disable the CE test for now to unblock testing of other critical sendrecv changes. The underlying test failure still needs to be fixed; this is tracked in T253314634.

Differential Revision: D91980230

Summary:
Implements native AVG support for the PAT (Parallel Aggregated Trees)
algorithm in ReduceScatter. Instead of falling back to the Ring algorithm or
launching a separate divide kernel, it applies the division at the final write
step for each chunk.
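
The idea of dividing at the final write can be illustrated with a host-side C++ simulation. This is a simplified sketch that does not model the PAT communication schedule: each output chunk is accumulated from every rank's contribution in a fixed order, and the division by `nRanks` is applied only on the last write. The function name `reduceScatterAvg` and its buffer layout are hypothetical.

```cpp
// Host-side simulation of ReduceScatter with AVG, not the PAT schedule:
// each rank's output chunk is the element-wise sum over all ranks, and
// the division by nRanks happens only when the last contribution lands.
#include <cassert>
#include <cstddef>
#include <vector>

// inputs[r] holds nRanks * chunkElems values contributed by rank r.
// Returns outputs[d] = element-wise average over ranks of chunk d.
std::vector<std::vector<float>> reduceScatterAvg(
    const std::vector<std::vector<float>>& inputs, std::size_t chunkElems) {
  const std::size_t nRanks = inputs.size();
  std::vector<std::vector<float>> outputs(
      nRanks, std::vector<float>(chunkElems, 0.0f));
  for (std::size_t dst = 0; dst < nRanks; ++dst) {
    for (std::size_t step = 0; step < nRanks; ++step) {
      // The last step is the final write for this chunk.
      const bool lastContribution = (step == nRanks - 1);
      const std::vector<float>& src = inputs[step];
      for (std::size_t i = 0; i < chunkElems; ++i) {
        float acc = outputs[dst][i] + src[dst * chunkElems + i];
        if (lastContribution) {
          acc /= static_cast<float>(nRanks);  // post-op division, applied once
        }
        outputs[dst][i] = acc;
      }
    }
  }
  return outputs;
}
```

Dividing once, after the last addition, gives the same result as summing and then running a separate divide pass, without the extra pass over the output buffer. Dividing at every step instead would divide earlier contributions more than once.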

**Documentation Added:**
- `meta/collectives/docs/ReduceScatterPat.md` - Comprehensive PAT algorithm
  documentation, including a 5-phase breakdown and an 8-rank visualization
- `meta/collectives/docs/ReduceScatterPatAvg.md` - PAT AVG design details,
  multi-chunk handling, and implementation notes

**Key Implementation:**
- Add an `isFinalWrite` flag to the `ncclPatStep` struct (set in Phase 4) so that
  division is applied correctly to every chunk of a multi-chunk transfer (fixes a
  large-message bug)
- Add a `FuncPatAvg<T>` template that uses `FuncSum` for the reduction and applies
  the division as a postOp in the final write step
- Add an `ncclDevPatAvg` enum for kernel dispatch
- Update `generate.py` and `def_build.bzl` for PatAvg kernel generation
- Enable via `NCCL_ALGO=reducescatter:pat_postdiv`
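
The functor split described above (sum for the reduction, division as a postOp gated by a per-step `isFinalWrite` flag) can be sketched in plain C++. `FuncSumSketch`, `FuncPatAvgSketch`, `PatStepSketch`, and `applyStep` are hypothetical stand-ins, not NCCL's `FuncSum`, the real `FuncPatAvg<T>` in `meta/device/FuncPatAvg.cuh`, or `ncclPatStep`; they are host code with no device qualifiers.

```cpp
// Simplified host-side stand-ins; NCCL's real FuncSum, FuncPatAvg and
// ncclPatStep are device code with different signatures.
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
struct FuncSumSketch {
  T operator()(T a, T b) const { return a + b; }
};

// Reduction delegates to the sum functor; division is deferred to postOp.
template <typename T>
struct FuncPatAvgSketch {
  int nRanks;
  T reduce(T a, T b) const { return FuncSumSketch<T>{}(a, b); }
  T postOp(T x) const { return x / static_cast<T>(nRanks); }
};

// One step of a schedule. isFinalWrite is carried per step, so each
// chunk of a multi-chunk transfer gets its own final-write marker.
struct PatStepSketch {
  std::size_t chunkOffset;  // first element of the chunk this step touches
  std::size_t nElems;       // number of elements in the chunk
  bool isFinalWrite;        // true on the step that writes the fully reduced chunk
};

// Reduce `incoming` into `accum` for one step, dividing only on the final write.
template <typename T>
void applyStep(const FuncPatAvgSketch<T>& fn, const PatStepSketch& step,
               const T* incoming, T* accum) {
  for (std::size_t i = 0; i < step.nElems; ++i) {
    const std::size_t k = step.chunkOffset + i;
    const T v = fn.reduce(accum[k], incoming[k]);
    accum[k] = step.isFinalWrite ? fn.postOp(v) : v;
  }
}
```

Because the flag lives on each step rather than being decided once per collective, every chunk reaches its own final write and therefore its own division, which matches the stated purpose of the flag. With an integer `T`, `postOp` truncates, so the sketch is only meaningful for floating-point types.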

**Meta overlay pattern used to minimize upstream changes:**
- `meta/device/FuncPatAvg.cuh`: Full implementation (~120 lines)
- `meta/collectives/PatAvgAlgoHelper.h`: Helper functions with lazy env detection
- All `src/` changes (~15 lines) are marked with `[META:PAT_AVG]` comments so they
  can be tracked when rebasing
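
The lazy env detection in `PatAvgAlgoHelper.h` is not shown in this PR. One common C++ way to get lazy, read-once, thread-safe detection is a function-local static, sketched below. `algoSelectsPatPostDiv` and `isPatAvgEnabled` are hypothetical names, and the `coll:algo[,algo];...` parsing is an assumption about the `NCCL_ALGO` format, not NCCL's actual parser.

```cpp
// Hypothetical helper, not the contents of PatAvgAlgoHelper.h.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns true if an NCCL_ALGO-style string selects pat_postdiv for
// reducescatter. Assumes a "coll:algo[,algo];coll:algo" syntax.
inline bool algoSelectsPatPostDiv(const char* value) {
  if (value == nullptr) return false;
  const std::string s(value);
  const std::string key = "reducescatter:";
  std::size_t pos = s.find(key);
  if (pos == std::string::npos) return false;
  pos += key.size();
  // The algorithm list for this collective ends at ';' or end of string.
  const std::size_t end = s.find(';', pos);
  const std::string algos =
      s.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
  // Walk the comma-separated tokens looking for an exact match.
  std::size_t start = 0;
  while (start <= algos.size()) {
    std::size_t comma = algos.find(',', start);
    if (comma == std::string::npos) comma = algos.size();
    if (algos.compare(start, comma - start, "pat_postdiv") == 0) return true;
    start = comma + 1;
  }
  return false;
}

// Lazy detection: the environment is read on the first call only.
// A function-local static is initialized exactly once, even if several
// threads make the first call concurrently (guaranteed since C++11).
inline bool isPatAvgEnabled() {
  static const bool enabled = algoSelectsPatPostDiv(std::getenv("NCCL_ALGO"));
  return enabled;
}
```

Because the result is cached, changing `NCCL_ALGO` after the first call has no effect in this sketch; keeping the parser separate from the cache makes it testable on its own.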

Differential Revision: D91948601

Summary:
Add a Claude Code agent definition for reviewing ctran code changes. The agent:
- Reviews diffs for correctness (thread safety, test coverage, code abstraction)
- Performs performance review (benchmark requirements, roofline analysis)
- References CLAUDE.md as the authoritative source for coding standards
- Outputs structured feedback with clear recommendations (APPROVE/NEEDS_CHANGES/NEEDS_HUMAN_REVIEW)

This agent should be invoked after code changes are made to provide automated review feedback before human review.

Differential Revision: D91963243

Summary:
Move CtranKernelSendArgs, CtranKernelRecvArgs, and CtranKernelSendRecvArgs from
gpe/CtranGpeDev.h to algos/SendRecv/Types.h as ctran::sendrecv::KernelSendArgs,
KernelRecvArgs, and KernelSendRecvArgs respectively.

Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Keep "Send/Recv/SendRecv" suffix since they're distinct types
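
A minimal C++17 sketch of the three relocated SendRecv types follows. The type and namespace names come from this PR; every field, and the choice to build `KernelSendRecvArgs` out of the other two, are illustrative assumptions.

```cpp
// Hypothetical layout of algos/SendRecv/Types.h; fields are illustrative.
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace ctran::sendrecv {

// Formerly CtranKernelSendArgs.
struct KernelSendArgs {
  const void* buff;   // hypothetical: data to send
  std::size_t count;  // hypothetical: element count
  int peer;           // hypothetical: destination rank
};

// Formerly CtranKernelRecvArgs.
struct KernelRecvArgs {
  void* buff;         // hypothetical: receive buffer
  std::size_t count;  // hypothetical: element count
  int peer;           // hypothetical: source rank
};

// Formerly CtranKernelSendRecvArgs. Composing the other two types is an
// assumption made for this sketch.
struct KernelSendRecvArgs {
  KernelSendArgs send;
  KernelRecvArgs recv;
};

}  // namespace ctran::sendrecv
```

Because all three live in `ctran::sendrecv`, they cannot all be named `KernelArgs`; the `Send`/`Recv`/`SendRecv` suffix is what keeps them distinct.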

Differential Revision: D91983715

Summary:
Move CtranKernelAllGatherArgs from gpe/CtranGpeDev.h to algos/AllGather/Types.h
as ctran::allgather::KernelArgs. Part of KernelElem cleanup Phase 1.

Naming follows the convention:
- Remove "Ctran" prefix
- Keep "Kernel" prefix
- Omit algorithm name since namespace provides context
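
The "namespace provides context" rule means several algorithms can each define a type named `KernelArgs` without colliding. This C++17 sketch shows `ctran::allgather::KernelArgs` next to a minimal `ctran::broadcast::KernelArgs`; the fields and the `makeAllGatherArgs` helper are hypothetical and not taken from either `Types.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace ctran::allgather {
// Formerly CtranKernelAllGatherArgs (fields hypothetical).
struct KernelArgs {
  const void* sendbuff;
  void* recvbuff;
  std::size_t sendcount;
};
}  // namespace ctran::allgather

namespace ctran::broadcast {
// Formerly CtranKernelBroadcastArgs (fields hypothetical).
struct KernelArgs {
  const void* sendbuff;
  void* recvbuff;
  std::size_t count;
  int root;
};
}  // namespace ctran::broadcast

// Same unqualified name, distinct types: the namespace names the algorithm.
static_assert(!std::is_same_v<ctran::allgather::KernelArgs,
                              ctran::broadcast::KernelArgs>,
              "namespaces keep the KernelArgs types distinct");

// A call site can shorten the qualifier with a namespace alias.
namespace ag = ctran::allgather;

inline ag::KernelArgs makeAllGatherArgs(const void* src, void* dst,
                                        std::size_t n) {
  return ag::KernelArgs{src, dst, n};
}
```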

Differential Revision: D91983718
@meta-cla bot added the CLA Signed label Feb 1, 2026
@meta-codesync bot commented Feb 1, 2026

@minsii has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91983717.


Labels

CLA Signed, fb-exported, meta-exported
