[AIEPlacer] Unify objectfifo + flow placement through Adjacency#3055
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Unifies non-core tile placement for aie.objectfifo and aie.flow/aie.packet_flow by expressing all tile-to-tile connectivity through the existing Adjacency representation, and extends placement-time legality checks to include cross-tile aie.use_lock (lock affinity) using the same mem-affinity rule as shared-L1 buffers.
Changes:
- Add lock-affinity adjacency: cross-tile
aie.use_locknow constrains core placement viatargetModel->isLegalMemAffinity(...). - Replace legacy
ConnectivityGroup+ DFS grouping for fifo/flow withAdjacencybuilders (buildObjectFifoAdjacency,buildFlowAdjacency) and a unified centroid-based placer for non-core tiles. - Add new lit coverage for lock adjacency (positive cases + new error cases).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
lib/Dialect/AIE/Transforms/AIEPlacer.cpp |
Adds lock adjacency checks; replaces ConnectivityGroup-based grouping with adjacency-driven centroid placement for non-core tiles. |
include/aie/Dialect/AIE/Transforms/AIEPlacer.h |
Updates placer API: removes ConnectivityGroup, adds adjacency builders and centroid placement helper declarations. |
test/place-tiles/sequential_placer/test_place_lock_adjacency.mlir |
New tests validating that lock adjacency steers placement and works across targets. |
test/place-tiles/sequential_placer/test_place_errors.mlir |
Adds lock-adjacency negative tests and expected diagnostics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1f2fd9a to
c713081
Compare
erwei-xilinx
added a commit
to erwei-xilinx/mlir-aie
that referenced
this pull request
May 7, 2026
Addresses Copilot review feedback on PR Xilinx#3055: the per-LTO call to `placeNonCoreTileByCentroid` was iterating its equivalence class via `member_begin`...`member_end` to sum placed-core columns, so for N non-core LTOs sharing one large class the total work was O(N x class_size). Pre-compute (sumCols, count) once per class leader before the Phase 4 loop. This is safe because Phase 3 has already placed every CoreTile LTO, so `result`'s core entries are stable for the duration of Phase 4. Per-LTO placement is now O(α(n)) findLeader + O(1) map lookup; no per-LTO walk through the class members. Behavior preserved -- centroid math is identical, just hoisted. All 10 placer lit tests pass with identical placements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-aie
that referenced
this pull request
May 7, 2026
Addresses the Copilot review comment on PR Xilinx#3055: the original per-LTO `placeNonCoreTileByCentroid` was BFS-walking the connectivity adjacencies on every call, so for N non-core LTOs sharing one large component the total work was O(N x (V+E)). Hoist the walk: `precomputeConnectivityCentroidParts` flood-fills every connected component once over the union of fifo + flow adjacencies, records `(sumCols, count)` for each component's placed-`CoreTile` LTOs, and writes that pair into a `DenseMap` keyed by every component member. Phase 4's per-LTO call is then a hash-map lookup -- O(α(n)) effectively. This is safe because Phase 3 has already placed every CoreTile LTO, so `result`'s core entries are stable for the duration of Phase 4. Also restores the `flow_chain_transitive_pinned_core` regression test: shim is connected to a column-pinned core only via an unplaced memtile in the middle, so reaching the core requires walking the connected component (not just direct edges). Without that, the shim would land in column 0 instead of the pinned core's column 2. Not exercised by the existing `flow_chain_shim_mem_core` test (where all cores are placed before Phase 4 starts). Behavior preserved -- centroid math is identical, just hoisted. All 10 placer lit tests pass with identical placements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4b443d1 to
248c802
Compare
248c802 to
f0c5948
Compare
Collaborator
Author
Replaces the parallel `ConnectivityGroup` / DFS-grouping path with the same `Adjacency` representation already used by buffer / lock / cascade constraints (Xilinx#3041, Xilinx#3042, Xilinx#3046, prior commit on this branch). ## Motivation Before this commit, the placer maintained two independent "tile-to-tile relation" representations: 1. `Adjacency` -- introduced for buffer/cascade/lock affinity. Pairwise edges + indexed lookup. Used for legality predicates. 2. `ConnectivityGroup` + bespoke `MapVector<LT, SetVector<LT>>` adjacency + manual DFS in `buildFlowGroups`. Used for centroid placement of mem/shim tiles near their core peers. These do the same job (encode "what tiles are related") in different shapes. With Xilinx#3046 landed there's now exactly one canonical representation; the legacy fifo path is the only thing left using the parallel one. ## What changes - New `buildObjectFifoAdjacency(objectFifos, objectFifoLinks) -> Adjacency`: emits one edge per `(producer, consumer_i)` pair. Linked fifos share an intermediate tile (link tile == consumer of every source fifo and producer of every destination fifo), so the natural edge emission already connects all sibling endpoints transitively through that shared tile -- no separate group-id machinery needed. - New `buildFlowAdjacency(flows, pktFlows) -> Adjacency`: one edge per `aie.flow` `(src, dst)`; per `aie.packet_flow`, cross-product of its `aie.packet_source`s and `aie.packet_dest`s. - New `placeNonCoreTileByCentroid(lt, adjacencies, channelRequirements)`: BFS through every supplied adjacency starting at `lt`, accumulating columns of placed `CoreTile` peers along the way; place at the rounded centroid (or the LTO's pinned column if set), respecting channel-requirement capacity. Walking through `LogicalTileOp` peers preserves the legacy ConnectivityGroup behaviour of seeing cores reachable transitively through intermediate mem/shim tiles. - Phase 4 + Phase 5 in `place()` collapse into a single iteration over unplaced non-core LTOs, each calling the new placement function. ## Removals - `struct ConnectivityGroup` (header) - `buildObjectFifoGroups`, `buildFlowGroups`, `placeNonCoreTilesInGroup` (cpp + header) ## Behavior Identical placements on all existing lit tests (10/10 in `test/place-tiles/`, including the multi-fifo `edge_detect`, linked-fifo, and flow-grouping cases). Channel-requirements path is untouched -- it's a per-tile resource counter, fundamentally not a pairwise relation, and rightfully stays as `DenseMap<Op*, pair<in, out>>` outside Adjacency. Net diff: -25 lines. The unified path is shorter than the parallel one it replaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new BFS in `placeNonCoreTileByCentroid` was unconditionally enqueuing `CoreTile` peers, so it transited through cores via core-to-core objectfifos. In designs with chained core-to-core data flow (e.g. `programming_examples/basic/vector_reduce_max/multi_column_designs/`), every shim/mem LTO ended up with the same centroid (the average of all cores in the chain), exhausting per-column DMA capacity and failing placement with `no ShimNOCTile with sufficient DMA capacity`. This restores the legacy `placeNonCoreTilesInGroup` invariant: cores contribute their column to the centroid but do not relay between non-core peers. Walking transitively through `LogicalTileOp` peers still works for the design point that motivated it -- linked fifos sharing an intermediate mem tile -- because the link tile is non-core and stays enqueued. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hunhoffe
approved these changes
May 8, 2026
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 8, 2026
…inx#1567 Stage C Xilinx#4) Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first: 1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline). 2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement. ## Behavioral impact Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602. ## Why now This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie. Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean. Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 8, 2026
…inx#1567 Stage C Xilinx#4) Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first: 1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline). 2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement. ## Behavioral impact Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602. ## Why now This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie. Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean. Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 8, 2026
…inx#1567 Stage C Xilinx#4) Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first: 1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline). 2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement. ## Behavioral impact Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602. ## Why now This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie. Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean. Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 8, 2026
…inx#1567 Stage C Xilinx#4) Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first: 1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline). 2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement. ## Behavioral impact Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602. ## Why now This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie. Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean. Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx
added a commit
to erwei-xilinx/mlir-air-erwei
that referenced
this pull request
May 8, 2026
…inx#1567 Stage C Xilinx#4) (Xilinx#1603) Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first: 1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline). 2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement. ## Behavioral impact Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602. ## Why now This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie. Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean. Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces the parallel
ConnectivityGroup/ DFS-grouping path in the placer with the sameAdjacencyrepresentation already used by buffer / cascade constraints (#3041, #3042, #3046). After this PR there is one canonical "tile-to-tile relation" representation across all sources:aie.buffer,aie.cascade_flow,aie.objectfifo,aie.flow/aie.packet_flow.What changes
buildObjectFifoAdjacency(objectFifos)->Adjacency: emits one edge per(producer, consumer_i)pair. Linked fifos do not need a special case -- the link tile naturally appears as consumer of every source fifo and producer of every destination fifo, so per-fifo edge emission already connects all sibling endpoints transitively through that shared tile.buildFlowAdjacency(flows, pktFlows)->Adjacency: one edge peraie.flow; peraie.packet_flow, cross-product of itsaie.packet_sources andaie.packet_dests.placeNonCoreTileByCentroid: walks both adjacencies via BFS from the LTO, accumulating columns of placedCoreTilepeers, places at the centroid (or pinned column).place()collapse into a single iteration over unplaced non-core LTOs.Adjacency::addEdgeFromValues(Value, Value)helper so both new builders share thedyn_cast_or_null<TileLike>step.Removals
struct ConnectivityGroupbuildObjectFifoGroups,buildFlowGroups,placeNonCoreTilesInGroupBehavior
Identical placements on all 9 existing
test/place-tiles/lit tests. Channel-requirements path is untouched -- it's a per-tile resource counter, fundamentally not a pairwise relation.Net diff: +131 / -219 (-88 lines).