
Integration of optimizer and D2MSubgraphOp #7007

Open
sgholamiTT wants to merge 4 commits into main from sgholami/optimizer-d2m-integration

Conversation


sgholamiTT commented Feb 11, 2026

Ticket

#7029

Summary

This PR integrates the TTNN memory-layout optimizer with ttnn.d2m_subgraph by implementing the constraint API for the op: D2M subgraph ops are now included in DFSharding-based layout analysis, and the optimizer correctly applies chosen layouts to the dispatch op, its output buffer(s), and the referenced D2M function. For now, D2M is allowed to spill to DRAM: producer→D2M edges use the default path (no L1 chain through D2M), so the optimizer places the D2M inputs/outputs and its buffer in DRAM, and keeps the D2M callee's types in sync with the dispatch op after layout/spill decisions.

Changes

1. D2MSubgraphOp OpModel interface (TTNNOpModelInterface.cpp)

  • getOpConstraints
    The implementation walks the D2M function body in program order and, for each internal op that implements the OpModel interface:

    • Builds the op's input layouts from a value-to-layout map: block arguments map to the D2M op's input layouts (and to opConfig.outputLayout for any extra arguments); SSA values produced by earlier ops use the layout recorded when the producing op was processed.
    • Calls getOpConstraints(internalOpInputLayouts, opConfig) for the internal op.
    • Takes the element-wise max of the L1 constraint fields (cbL1PeakSize, tensorL1PeakSize, peakL1MemorySize, outputL1BufferSize) and keeps the D2M op's output layout.
  • getOpRuntime
    Similarly walks the body, builds each internal op's input layouts via the same value-to-layout map, calls getOpRuntime on each internal op, and returns the sum of their runtimes (a simplified standalone sketch of both aggregations follows this list).
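
To make the aggregation concrete, here is a minimal standalone sketch of the walk described above. SimpleLayout, SimpleConstraints, InternalOp, aggregateConstraints, and aggregateRuntime are illustrative stand-ins, not the actual tt-mlir types or functions; the real implementation operates on MLIR values through the TTNN OpModel interface.

#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative stand-ins for the real tt-mlir types (layout attribute,
// OpModel constraint struct, etc.) -- simplified for this sketch only.
struct SimpleLayout {};
struct SimpleConstraints {
  uint64_t cbL1PeakSize = 0;
  uint64_t tensorL1PeakSize = 0;
  uint64_t peakL1MemorySize = 0;
  uint64_t outputL1BufferSize = 0;
};
struct InternalOp {
  std::vector<int> operandIds;   // ids of the values this op reads
  int resultId = -1;             // id of the value this op defines
  SimpleLayout resultLayout;     // layout recorded for later consumers
  SimpleConstraints constraints; // what the op's getOpConstraints would return
  uint64_t runtimeNs = 0;        // what the op's getOpRuntime would return
};

// Walk the D2M body in program order, resolve each op's input layouts from a
// value-to-layout map, and fold the per-op L1 fields with an element-wise max.
SimpleConstraints aggregateConstraints(const std::vector<InternalOp> &body,
                                       std::map<int, SimpleLayout> valueToLayout) {
  SimpleConstraints acc;
  for (const InternalOp &op : body) {
    // Build this op's input layouts: block args were seeded from the dispatch
    // op's inputs; results of earlier body ops were recorded below.
    std::vector<SimpleLayout> inputLayouts;
    for (int id : op.operandIds)
      inputLayouts.push_back(valueToLayout.at(id));
    // In the real code, inputLayouts is passed to the internal op's
    // getOpConstraints(inputLayouts, opConfig); here the result is canned.
    const SimpleConstraints &c = op.constraints;
    acc.cbL1PeakSize       = std::max(acc.cbL1PeakSize, c.cbL1PeakSize);
    acc.tensorL1PeakSize   = std::max(acc.tensorL1PeakSize, c.tensorL1PeakSize);
    acc.peakL1MemorySize   = std::max(acc.peakL1MemorySize, c.peakL1MemorySize);
    acc.outputL1BufferSize = std::max(acc.outputL1BufferSize, c.outputL1BufferSize);
    // Record this op's output layout so later ops in the body can look it up.
    valueToLayout[op.resultId] = op.resultLayout;
  }
  return acc;
}

// Runtime for the whole subgraph is just the sum of the internal op runtimes.
uint64_t aggregateRuntime(const std::vector<InternalOp> &body) {
  uint64_t total = 0;
  for (const InternalOp &op : body)
    total += op.runtimeNs;
  return total;
}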

2. Optimizer support for D2MSubgraphOp (Optimizer.cpp)

  • applyChosenLayoutToD2MSubgraphOp
    When the optimizer picks a layout for a D2M dispatch op, this function:

    • Sets the dispatch op's result type(s) to the new tensor type.
    • Updates each output buffer (Empty op): type, dtype, layout (tile/row-major), and memory config (including shard spec when applicable).
    • Updates the D2M function: entry block argument types from the dispatch's input types, all body op result types to the new type, and the function's type signature.
  • syncD2MFuncTypesToDispatchInputs
    After layout/spill decisions (e.g. a producer spilled to DRAM), syncs the D2M function's argument and result types to the dispatch op's current input/output types so the callee signature matches the caller.

  • Integration

    • When applying the chosen layout to an op, D2MSubgraphOp is handled by calling applyChosenLayoutToD2MSubgraphOp instead of the default path.
    • A post-processing walk over the function calls syncD2MFuncTypesToDispatchInputs on every D2MSubgraphOp so the D2M callee types stay in sync after any spill or layout change (a simplified sketch of the type-sync step follows).
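
A hedged sketch of the type-sync step described above. The helper name syncCalleeTypes and the way the callee FuncOp is passed in are illustrative, not the PR's actual code; the real applyChosenLayoutToD2MSubgraphOp / syncD2MFuncTypesToDispatchInputs also rewrite the Empty buffer ops, dtype/layout attributes, and memory configs.

#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Illustrative helper: make the D2M callee's entry-block argument types and
// signature match the dispatch op's current operand/result types. In the real
// pass the callee is resolved from the dispatch op's symbol reference.
static void syncCalleeTypes(Operation *dispatchOp, func::FuncOp callee) {
  auto inputTypes = llvm::to_vector(dispatchOp->getOperandTypes());
  auto resultTypes = llvm::to_vector(dispatchOp->getResultTypes());

  // Rewrite the entry block arguments to the dispatch op's input types
  // (zip stops at the shorter range if the argument counts differ).
  Block &entry = callee.getBody().front();
  for (auto [arg, type] : llvm::zip(entry.getArguments(), inputTypes))
    arg.setType(type);

  // Rewrite the function signature; the exact setter name can differ between
  // MLIR versions (setFunctionType vs. setType on FunctionOpInterface).
  callee.setFunctionType(
      FunctionType::get(callee.getContext(), inputTypes, resultTypes));
}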

3. DFShardingPolicy (DFShardingPolicy.cpp)

  • D2MSubgraphOp is included in the set of ops that can participate in L1 sharding chains (alongside matmul, linear, elementwise, etc.), so the policy considers D2M ops when building and resolving chains (an illustrative snippet follows).
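
Illustrative shape of the change; the helper name and the exact op list are assumptions, and only the addition of D2MSubgraphOp to the eligible set is what this PR does:

// Ops eligible to participate in an L1 sharding chain; D2MSubgraphOp is now
// part of this set (the other op names here are illustrative examples).
static bool canParticipateInL1Chain(mlir::Operation *op) {
  return mlir::isa<ttnn::MatmulOp, ttnn::LinearOp, ttnn::AddOp,
                   ttnn::MultiplyOp, ttnn::ReluOp, ttnn::D2MSubgraphOp>(op);
}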

Tests

  • d2m_optimizer_linear_chain.mlir
    Linear chain matmul → matmul → d2m_subgraph (add, multiply) → matmul. With D2M spilling to DRAM, the optimizer places the chain in DRAM (matmul results, empty buffer, and D2M in/out in DRAM; function args remain L1). Checks that layouts and the final to_layout to DRAM are applied correctly and that the D2M callee's types match the dispatch (DRAM in/out).

  • d2m_optimizer_fork_join.mlir
    Fork-join pattern with a D2M subgraph on one branch. Checks that the optimizer applies layouts and spill (to_layout) consistently, that the D2M branch uses a DRAM output buffer and result, and that the D2M callee's types (including DRAM when spilled) stay in sync with the dispatch op.

Notes

  • D2MSubgraphOp with multiple results is not yet supported (assert in applyChosenLayoutToD2MSubgraphOp).
  • Runtime for D2M is the sum of internal op runtimes; constraints use the max of internal op L1 usage so the scheduler sees a conservative peak for the subgraph.
  • The workaround from the previous PR, which forced a spill to DRAM whenever a D2M_SubgraphOp was detected, is no longer necessary.
  • A follow-up issue tracks allowing D2M_SubgraphOp to participate in an L1 chain.

Checklist

  • New/Existing tests provide coverage for changes


codecov-commenter commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 0% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.16%. Comparing base (e24653d) to head (aa63ff6).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines                                 Patch %   Lines
...b/Dialect/TTNN/Interfaces/TTNNOpModelInterface.cpp    0.00%     69 Missing ⚠️
lib/Dialect/TTNN/Analysis/DFShardingPolicy.cpp           0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7007      +/-   ##
==========================================
- Coverage   69.27%   69.16%   -0.11%     
==========================================
  Files         384      384              
  Lines       67207    67276      +69     
==========================================
- Hits        46557    46531      -26     
- Misses      20650    20745      +95     


sgholamiTT self-assigned this Feb 11, 2026
// For non-Matmul/Linear ops that still have output layout (e.g.
// D2MSubgraphOp), use configs as-is so validation and reference matching
// succeed.
if (!consumerConfigs.empty() && consumerConfigs.front().outputLayout) {
Contributor


This will break the current logic. We intentionally leave layouts null; consumerConfigs is provided to pick up OpSpecificAttrs.

Contributor Author

sgholamiTT commented Feb 12, 2026


Reverted this change and updated the tests. For now we allow D2M_SubgraphOp to fall back to DRAM.
Created an issue for allowing it to be part of an L1 chain.

I described how we could modify the call site and special-case D2M_SubgraphOp so it can be part of an L1 chain, but as you suggested, that's not included in this PR:

// We can change the call site and special-case D2MSubgraphOp. For example, in ShardSolver.cpp
llvm::SmallVector<OpConfig> testConfigs;
if (llvm::isa<ttnn::D2MSubgraphOp>(consumerOp)) {
  for (const OpConfig &c : consumerConfigs) {
    testConfigs.push_back(c);
  }
} else {
  testConfigs = optimizer_utils::getUniqueTestConfigs(
      consumerConfigs, shouldUseIgnorePhysicalLayout(consumerOp));
}


xanderchin left a comment


High level looks good from my perspective. @rpavlovicTT and @odjuricicTT to sign off on optimizer code changes.



Development

Successfully merging this pull request may close these issues.

[Optimizer][D2M] Support D2M_SubgraphOp with Optimizer enabled and fall-back to DRAM
