Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7007      +/-   ##
==========================================
- Coverage   69.27%   69.16%   -0.11%
==========================================
  Files         384      384
  Lines       67207    67276      +69
==========================================
- Hits        46557    46531      -26
- Misses      20650    20745      +95

☔ View full report in Codecov by Sentry.
// For non-Matmul/Linear ops that still have output layout (e.g.
// D2MSubgraphOp), use configs as-is so validation and reference matching
// succeed.
if (!consumerConfigs.empty() && consumerConfigs.front().outputLayout) {
This will break current logic. We intentionally leave layouts NULL. consumerConfigs is provided to pick up OpSpecificAttrs.
Reverted this change and updated the tests. For now we allow D2M_SubgraphOp to fall back to DRAM.
Created an issue for allowing it to be part of an L1 chain.
I described how we could modify the call site and special-case D2M_SubgraphOp so it can be part of an L1 chain, but, as you suggested, that's not included in this PR:
// We can change the call site and special-case D2MSubgraphOp.
// For example, in ShardSolver.cpp:
llvm::SmallVector<OpConfig> testConfigs;
if (llvm::isa<ttnn::D2MSubgraphOp>(consumerOp)) {
  for (const OpConfig &c : consumerConfigs) {
    testConfigs.push_back(c);
  }
} else {
  testConfigs = optimizer_utils::getUniqueTestConfigs(
      consumerConfigs, shouldUseIgnorePhysicalLayout(consumerOp));
}
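A shorter equivalent of the copy loop above would be `testConfigs.append(consumerConfigs.begin(), consumerConfigs.end());`.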
xanderchin left a comment:
High level looks good from my perspective. @rpavlovicTT and @odjuricicTT to sign off on optimizer code changes.
Ticket
#7029
Summary
This PR integrates the TTNN memory-layout optimizer with `ttnn.d2m_subgraph` by implementing the constraint API for the op: D2M subgraph ops are now included in DFSharding-based layout analysis, and the optimizer correctly applies chosen layouts to the dispatch op, its output buffer(s), and the referenced D2M function. For now, D2M is allowed to spill to DRAM: producer→D2M edges use the default path (no L1 chain through D2M), so the optimizer places D2M input/output and its buffer in DRAM and keeps the D2M callee's types in sync with the dispatch after layout/spill decisions.

Changes
1. D2MSubgraphOp OpModel interface (`TTNNOpModelInterface.cpp`)

`getOpConstraints`

The implementation walks the D2M function body in program order and, for each internal op that implements the OpModel interface:

- builds per-op input layouts from the dispatch op's `inputs` (and `opConfig.outputLayout` for any extra args); SSA values produced by earlier ops use the layout stored when that op was processed;
- calls `getOpConstraints(internalOpInputLayouts, opConfig)` for the internal op.

`getOpRuntime`

Similarly walks the body, builds per-op input layouts via the same value-to-layout map, calls `getOpRuntime` for each internal op, and returns the sum of their runtimes (a schematic sketch of this walk follows below).
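A minimal, self-contained sketch of the walk, using hypothetical stand-in types (`Layout`, `OpConfig`, `InternalOp`) and placeholder queries (`queryConstraints`, `queryRuntime`) rather than the actual tt-mlir OpModel API:

```cpp
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the real MLIR/TTNN types used by the op model.
struct Layout {
  int memorySpace = 0; // e.g. DRAM vs. L1
};

struct OpConfig {
  std::optional<Layout> outputLayout;
};

struct InternalOp {
  std::vector<int> operandIds; // SSA values this op consumes
  int resultId = 0;            // SSA value this op produces

  // Placeholder per-op constraint query: returns the op's output layout.
  Layout queryConstraints(const std::vector<Layout> &inputs,
                          const OpConfig &config) const {
    (void)inputs;
    return config.outputLayout.value_or(Layout{});
  }

  // Placeholder per-op runtime query.
  double queryRuntime(const std::vector<Layout> &inputs) const {
    return static_cast<double>(inputs.size());
  }
};

// Walks the body in program order. Each op's input layouts come either from
// the dispatch op's inputs (seeded into valueToLayout) or from the layout
// recorded when an earlier internal op was processed; the op's result layout
// is stored so later consumers can pick it up. The accumulated runtime is
// what a getOpRuntime-style query would return.
double walkSubgraph(const std::vector<InternalOp> &bodyOps,
                    std::map<int, Layout> valueToLayout,
                    const OpConfig &config) {
  double totalRuntime = 0.0;
  for (const InternalOp &op : bodyOps) {
    std::vector<Layout> inputLayouts;
    for (int v : op.operandIds)
      inputLayouts.push_back(valueToLayout.at(v));
    valueToLayout[op.resultId] = op.queryConstraints(inputLayouts, config);
    totalRuntime += op.queryRuntime(inputLayouts);
  }
  return totalRuntime;
}
```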
2. Optimizer support for D2MSubgraphOp (`Optimizer.cpp`)

`applyChosenLayoutToD2MSubgraphOp`

When the optimizer picks a layout for a D2M dispatch op, this function applies it to the dispatch op, its output buffer(s), and the referenced D2M function.
`syncD2MFuncTypesToDispatchInputs`

After layout/spill decisions (e.g. a producer spilled to DRAM), syncs the D2M function's argument and result types to the dispatch op's current input/output types so the callee signature matches the caller.
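A rough sketch of what such a sync can look like (not the PR's actual implementation; it assumes the D2M callee is an ordinary `func.func` and uses only generic MLIR APIs):

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Rewrites the callee's signature to match the dispatch op's current
// operand/result types, then updates the entry-block argument types to match.
static void syncCalleeTypesToDispatch(mlir::Operation *dispatchOp,
                                      mlir::func::FuncOp callee) {
  auto argTypes = llvm::to_vector(dispatchOp->getOperandTypes());
  auto resultTypes = llvm::to_vector(dispatchOp->getResultTypes());

  // Make the callee signature identical to the caller's operand/result types.
  callee.setFunctionType(mlir::FunctionType::get(
      dispatchOp->getContext(), argTypes, resultTypes));

  if (callee.isExternal())
    return; // no body to update
  mlir::Block &entry = callee.getBody().front();
  for (auto [arg, type] : llvm::zip(entry.getArguments(), argTypes))
    arg.setType(type);
}
```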
Integration
- `D2MSubgraphOp` is handled by calling `applyChosenLayoutToD2MSubgraphOp` instead of the default path.
- Calls `syncD2MFuncTypesToDispatchInputs` on every `D2MSubgraphOp` so D2M callee types stay in sync after any spill or layout change.

3. DFShardingPolicy (`DFShardingPolicy.cpp`)

D2M subgraph ops are included in the DFSharding-based layout analysis; producer→D2M edges use the default path (no L1 chain through D2M).

Tests
`d2m_optimizer_linear_chain.mlir`

Linear chain matmul → matmul → d2m_subgraph (add, multiply) → matmul. With D2M spilling to DRAM, the optimizer places the chain in DRAM (matmul results, empty buffer, and D2M in/out in DRAM; function args remain L1). Checks that layouts and the final `to_layout` to DRAM are applied correctly and that the D2M callee's types match the dispatch (DRAM in/out).

`d2m_optimizer_fork_join.mlir`

Fork-join pattern with a D2M subgraph on one branch. Checks that the optimizer applies layouts and spill (`to_layout`) consistently, that the D2M branch uses a DRAM output buffer and result, and that the D2M callee's types (including DRAM when spilled) stay in sync with the dispatch op.

Notes
- Layout application for D2M dispatch ops is handled by a dedicated helper (`applyChosenLayoutToD2MSubgraphOp`).
- The earlier special handling that applied when a `D2M_SubgraphOp` was detected is no longer necessary.
- Created an issue for allowing `D2M_SubgraphOp` in an L1 chain.

Checklist