[AIEPlacer] Unify objectfifo + flow placement through Adjacency by erwei-xilinx · Pull Request #3055 · Xilinx/mlir-aie

erwei-xilinx · 2026-05-07T21:47:09Z

Replaces the parallel ConnectivityGroup / DFS-grouping path in the placer with the same Adjacency representation already used by buffer / cascade constraints (#3041, #3042, #3046). After this PR there is one canonical "tile-to-tile relation" representation across all sources: aie.buffer, aie.cascade_flow, aie.objectfifo, aie.flow / aie.packet_flow.

What changes

- New buildObjectFifoAdjacency(objectFifos) -> Adjacency: emits one edge per (producer, consumer_i) pair. Linked fifos do not need a special case -- the link tile naturally appears as the consumer of every source fifo and the producer of every destination fifo, so per-fifo edge emission already connects all sibling endpoints transitively through that shared tile.
- New buildFlowAdjacency(flows, pktFlows) -> Adjacency: one edge per aie.flow; per aie.packet_flow, the cross-product of its aie.packet_source and aie.packet_dest ops.
- New placeNonCoreTileByCentroid: walks both adjacencies via BFS from the LTO, accumulates the columns of placed CoreTile peers, and places the LTO at their centroid (or at its pinned column, if set).
- Phase 4 + Phase 5 in place() collapse into a single iteration over unplaced non-core LTOs.
- Small Adjacency::addEdgeFromValues(Value, Value) helper, so that both new builders share the dyn_cast_or_null<TileLike> step.
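
For readers unfamiliar with the edge-emission scheme, here is a minimal standalone sketch of it. It is not the code in `AIEPlacer.cpp`: tiles are plain integer ids rather than `LogicalTileOp`s, and the `Adjacency`, `ObjectFifo`, and `PacketFlow` structs, as well as `buildPacketFlowAdjacency`, are hypothetical stand-ins invented for illustration. Only the "one edge per (producer, consumer_i)" and "cross-product of sources and dests" rules come from the description above.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in: a tile is an integer id, not a LogicalTileOp.
using TileId = int;

// Undirected tile-to-tile relation stored as a deduplicated edge set.
struct Adjacency {
  std::set<std::pair<TileId, TileId>> edges;

  // Normalize (a, b) so that the smaller id always comes first.
  static std::pair<TileId, TileId> key(TileId a, TileId b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }

  void addEdge(TileId a, TileId b) {
    if (a == b)
      return; // a self-loop carries no placement information
    edges.insert(key(a, b));
  }

  bool hasEdge(TileId a, TileId b) const { return edges.count(key(a, b)) != 0; }
};

// Hypothetical, simplified views of the ops.
struct ObjectFifo {
  TileId producer;
  std::vector<TileId> consumers;
};

struct PacketFlow {
  std::vector<TileId> sources;
  std::vector<TileId> dests;
};

// One edge per (producer, consumer_i) pair. A link tile needs no special
// case: it is a consumer of one fifo and the producer of the next.
Adjacency buildObjectFifoAdjacency(const std::vector<ObjectFifo> &fifos) {
  Adjacency adj;
  for (const ObjectFifo &fifo : fifos)
    for (TileId consumer : fifo.consumers)
      adj.addEdge(fifo.producer, consumer);
  return adj;
}

// Cross-product of sources and destinations for each packet flow.
Adjacency buildPacketFlowAdjacency(const std::vector<PacketFlow> &pktFlows) {
  Adjacency adj;
  for (const PacketFlow &flow : pktFlows)
    for (TileId src : flow.sources)
      for (TileId dst : flow.dests)
        adj.addEdge(src, dst);
  return adj;
}
```

With two linked fifos `0 -> 1` and `1 -> {2, 3}`, the sketch emits edges `(0,1)`, `(1,2)`, `(1,3)`. Tile 0 has no direct edge to tile 2, but the two are connected through the shared link tile 1, which is all that a component walk needs.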

Removals

- struct ConnectivityGroup
- buildObjectFifoGroups, buildFlowGroups, placeNonCoreTilesInGroup

Behavior

Identical placements on all 9 existing test/place-tiles/ lit tests. Channel-requirements path is untouched -- it's a per-tile resource counter, fundamentally not a pairwise relation.

Net diff: +131 / -219 (-88 lines).

Copilot

Pull request overview

Unifies non-core tile placement for aie.objectfifo and aie.flow/aie.packet_flow by expressing all tile-to-tile connectivity through the existing Adjacency representation, and extends placement-time legality checks to include cross-tile aie.use_lock (lock affinity) using the same mem-affinity rule as shared-L1 buffers.

Changes:

Add lock-affinity adjacency: cross-tile aie.use_lock now constrains core placement via targetModel->isLegalMemAffinity(...).
Replace legacy ConnectivityGroup + DFS grouping for fifo/flow with Adjacency builders (buildObjectFifoAdjacency, buildFlowAdjacency) and a unified centroid-based placer for non-core tiles.
Add new lit coverage for lock adjacency (positive cases + new error cases).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `lib/Dialect/AIE/Transforms/AIEPlacer.cpp` | Adds lock adjacency checks; replaces ConnectivityGroup-based grouping with adjacency-driven centroid placement for non-core tiles. |
| `include/aie/Dialect/AIE/Transforms/AIEPlacer.h` | Updates placer API: removes `ConnectivityGroup`, adds adjacency builders and centroid placement helper declarations. |
| `test/place-tiles/sequential_placer/test_place_lock_adjacency.mlir` | New tests validating that lock adjacency steers placement and works across targets. |
| `test/place-tiles/sequential_placer/test_place_errors.mlir` | Adds lock-adjacency negative tests and expected diagnostics. |


Addresses Copilot review feedback on PR Xilinx#3055: the per-LTO call to `placeNonCoreTileByCentroid` was iterating its equivalence class via `member_begin`...`member_end` to sum placed-core columns, so for N non-core LTOs sharing one large class the total work was O(N x class_size).

Pre-compute (sumCols, count) once per class leader before the Phase 4 loop. This is safe because Phase 3 has already placed every CoreTile LTO, so `result`'s core entries are stable for the duration of Phase 4. Per-LTO placement is now O(α(n)) findLeader + O(1) map lookup; no per-LTO walk through the class members.

Behavior preserved -- centroid math is identical, just hoisted. All 10 placer lit tests pass with identical placements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
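
The per-leader hoisting described in this commit message can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `UnionFind` is a hand-rolled stand-in for whatever equivalence-class structure the placer actually uses, `Tile` is a hypothetical record, and `precomputePerLeader` is an invented name. The sketch only shows the idea of summing placed-core columns once per class leader instead of once per non-core tile.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hand-rolled union-find (path halving); a stand-in, not the placer's type.
struct UnionFind {
  std::vector<int> parent;

  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }

  int findLeader(int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]]; // path halving keeps trees shallow
      x = parent[x];
    }
    return x;
  }

  void unite(int a, int b) {
    int leaderA = findLeader(a);
    int leaderB = findLeader(b);
    parent[leaderA] = leaderB;
  }
};

// Hypothetical tile record: `col` is set only once the tile is placed.
struct Tile {
  bool isCore;
  std::optional<int> col;
};

using CentroidParts = std::pair<long, int>; // (sumCols, count)

// One pass over all tiles: each placed core adds its column to the entry of
// its class leader. Later per-tile queries are findLeader + one map lookup.
std::map<int, CentroidParts>
precomputePerLeader(UnionFind &uf, const std::vector<Tile> &tiles) {
  std::map<int, CentroidParts> parts;
  for (int i = 0; i < static_cast<int>(tiles.size()); ++i) {
    if (!tiles[i].isCore || !tiles[i].col)
      continue;
    CentroidParts &entry = parts[uf.findLeader(i)];
    entry.first += *tiles[i].col;
    entry.second += 1;
  }
  return parts;
}
```

A non-core tile `t` would then read `parts[uf.findLeader(t)]` rather than iterating over its whole class, which is where the O(N x class_size) cost went away.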

Addresses the Copilot review comment on PR Xilinx#3055: the original per-LTO `placeNonCoreTileByCentroid` was BFS-walking the connectivity adjacencies on every call, so for N non-core LTOs sharing one large component the total work was O(N x (V+E)).

Hoist the walk: `precomputeConnectivityCentroidParts` flood-fills every connected component once over the union of fifo + flow adjacencies, records `(sumCols, count)` for each component's placed-`CoreTile` LTOs, and writes that pair into a `DenseMap` keyed by every component member. Phase 4's per-LTO call is then a single hash-map lookup -- expected O(1).

This is safe because Phase 3 has already placed every CoreTile LTO, so `result`'s core entries are stable for the duration of Phase 4.

Also restores the `flow_chain_transitive_pinned_core` regression test: the shim is connected to a column-pinned core only via an unplaced memtile in the middle, so reaching the core requires walking the connected component (not just direct edges). Without that, the shim would land in column 0 instead of the pinned core's column 2. This case is not exercised by the existing `flow_chain_shim_mem_core` test (where all cores are placed before Phase 4 starts).

Behavior preserved -- centroid math is identical, just hoisted. All 10 placer lit tests pass with identical placements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
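
A rough standalone sketch of the flood-fill hoist follows. It is not `precomputeConnectivityCentroidParts` itself: `TileId`, `AdjList`, `Tile`, and `precomputeCentroidParts` are hypothetical, `std::map` stands in for `DenseMap`, and the sketch treats every tile as a relay, so it does not model any restriction on walking through cores. It only demonstrates visiting each component once over the union of several adjacencies and sharing one `(sumCols, count)` pair among all members.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

using TileId = int;
// Undirected adjacency list; each edge is stored in both directions.
using AdjList = std::map<TileId, std::vector<TileId>>;
using CentroidParts = std::pair<long, int>; // (sumCols, count)

// Hypothetical tile record: `col` is set only for already-placed tiles.
struct Tile {
  bool isCore;
  std::optional<int> col;
};

std::map<TileId, CentroidParts>
precomputeCentroidParts(const std::vector<Tile> &tiles,
                        const std::vector<AdjList> &adjacencies) {
  std::map<TileId, CentroidParts> result;
  std::vector<bool> seen(tiles.size(), false);

  for (TileId start = 0; start < static_cast<TileId>(tiles.size()); ++start) {
    if (seen[start])
      continue; // already covered by an earlier component

    std::vector<TileId> members;
    CentroidParts parts{0, 0};
    std::queue<TileId> work;
    work.push(start);
    seen[start] = true;

    while (!work.empty()) {
      TileId tile = work.front();
      work.pop();
      members.push_back(tile);
      if (tiles[tile].isCore && tiles[tile].col) {
        parts.first += *tiles[tile].col;
        parts.second += 1;
      }
      // Follow edges from every supplied adjacency (their union).
      for (const AdjList &adj : adjacencies) {
        auto it = adj.find(tile);
        if (it == adj.end())
          continue;
        for (TileId next : it->second) {
          if (!seen[next]) {
            seen[next] = true;
            work.push(next);
          }
        }
      }
    }

    // Every member of the component shares the same pair.
    for (TileId member : members)
      result[member] = parts;
  }
  return result;
}
```

In a shim (0), memtile (1), core (2, column 2) chain where the shim-memtile edge lives in one adjacency and the memtile-core edge in another, the shim's entry is `(2, 1)`, mirroring the transitive reach that the `flow_chain_transitive_pinned_core` test is described as checking.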

erwei-xilinx · 2026-05-07T23:19:02Z

Cc: @yenjames, @hunhoffe

Replaces the parallel `ConnectivityGroup` / DFS-grouping path with the same `Adjacency` representation already used by buffer / lock / cascade constraints (Xilinx#3041, Xilinx#3042, Xilinx#3046, prior commit on this branch).

## Motivation

Before this commit, the placer maintained two independent "tile-to-tile relation" representations:

1. `Adjacency` -- introduced for buffer/cascade/lock affinity. Pairwise edges + indexed lookup. Used for legality predicates.
2. `ConnectivityGroup` + bespoke `MapVector<LT, SetVector<LT>>` adjacency + manual DFS in `buildFlowGroups`. Used for centroid placement of mem/shim tiles near their core peers.

These do the same job (encode "what tiles are related") in different shapes. With Xilinx#3046 landed there's now exactly one canonical representation; the legacy fifo path is the only thing left using the parallel one.

## What changes

- New `buildObjectFifoAdjacency(objectFifos, objectFifoLinks) -> Adjacency`: emits one edge per `(producer, consumer_i)` pair. Linked fifos share an intermediate tile (link tile == consumer of every source fifo and producer of every destination fifo), so the natural edge emission already connects all sibling endpoints transitively through that shared tile -- no separate group-id machinery needed.
- New `buildFlowAdjacency(flows, pktFlows) -> Adjacency`: one edge per `aie.flow` `(src, dst)`; per `aie.packet_flow`, cross-product of its `aie.packet_source`s and `aie.packet_dest`s.
- New `placeNonCoreTileByCentroid(lt, adjacencies, channelRequirements)`: BFS through every supplied adjacency starting at `lt`, accumulating columns of placed `CoreTile` peers along the way; place at the rounded centroid (or the LTO's pinned column if set), respecting channel-requirement capacity. Walking through `LogicalTileOp` peers preserves the legacy ConnectivityGroup behaviour of seeing cores reachable transitively through intermediate mem/shim tiles.
- Phase 4 + Phase 5 in `place()` collapse into a single iteration over unplaced non-core LTOs, each calling the new placement function.

## Removals

- `struct ConnectivityGroup` (header)
- `buildObjectFifoGroups`, `buildFlowGroups`, `placeNonCoreTilesInGroup` (cpp + header)

## Behavior

Identical placements on all existing lit tests (10/10 in `test/place-tiles/`, including the multi-fifo `edge_detect`, linked-fifo, and flow-grouping cases). Channel-requirements path is untouched -- it's a per-tile resource counter, fundamentally not a pairwise relation, and rightfully stays as `DenseMap<Op*, pair<in, out>>` outside Adjacency.

Net diff: -25 lines. The unified path is shorter than the parallel one it replaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
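
The final column choice ("rounded centroid, or the pinned column, respecting channel-requirement capacity") might look roughly like the sketch below. The commit message does not spell out the rounding mode, the fallback when no core is reachable, or how a full column is handled, so all three are assumptions here: round half up, fall back to column 0, and search outward from the target for the nearest column with enough free channels (lower column first on ties). `roundedCentroid` and `chooseColumn` are hypothetical names, and `freeChannels` is an invented per-column capacity table.

```cpp
#include <initializer_list>
#include <optional>
#include <vector>

// Rounded mean of the peer columns. Assumes non-negative columns and rounds
// half up; returns 0 when no placed core was reachable (count == 0).
int roundedCentroid(long sumCols, int count) {
  if (count == 0)
    return 0;
  return static_cast<int>((2 * sumCols + count) / (2 * count));
}

// Pick a column for a non-core tile, or nullopt if nothing fits.
// freeChannels[c] is the remaining channel budget of column c (assumed).
std::optional<int> chooseColumn(long sumCols, int count,
                                std::optional<int> pinnedCol,
                                const std::vector<int> &freeChannels,
                                int needed) {
  const int numCols = static_cast<int>(freeChannels.size());
  if (numCols == 0)
    return std::nullopt;

  // A pinned column is honored exactly or not at all.
  if (pinnedCol) {
    int col = *pinnedCol;
    if (col >= 0 && col < numCols && freeChannels[col] >= needed)
      return col;
    return std::nullopt;
  }

  // Clamp the centroid into the device, then search outward from it.
  int target = roundedCentroid(sumCols, count);
  if (target < 0)
    target = 0;
  if (target >= numCols)
    target = numCols - 1;

  for (int dist = 0; dist < numCols; ++dist) {
    for (int col : {target - dist, target + dist}) {
      if (col >= 0 && col < numCols && freeChannels[col] >= needed)
        return col;
    }
  }
  return std::nullopt;
}
```

For peers in columns 1 and 3 the centroid is column 2; if column 2 has no free channels, this sketch falls back to column 1. Whether the real placer falls back in the same order is not stated in the PR.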

The new BFS in `placeNonCoreTileByCentroid` was unconditionally enqueuing `CoreTile` peers, so it transited through cores via core-to-core objectfifos. In designs with chained core-to-core data flow (e.g. `programming_examples/basic/vector_reduce_max/multi_column_designs/`), every shim/mem LTO ended up with the same centroid (the average of all cores in the chain), exhausting per-column DMA capacity and failing placement with `no ShimNOCTile with sufficient DMA capacity`.

This restores the legacy `placeNonCoreTilesInGroup` invariant: cores contribute their column to the centroid but do not relay between non-core peers. Walking transitively through `LogicalTileOp` peers still works for the design point that motivated it -- linked fifos sharing an intermediate mem tile -- because the link tile is non-core and stays enqueued.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
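
The "cores contribute but do not relay" invariant can be sketched as a BFS in which a core peer is counted and then dropped instead of being enqueued. This is an illustrative standalone version, not the placer's code: `TileId`, `AdjList`, `Tile`, and `collectCoreColumns` are hypothetical, and a single adjacency list stands in for the union of fifo and flow adjacencies.

```cpp
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using TileId = int;
// Undirected adjacency list; each edge is stored in both directions.
using AdjList = std::map<TileId, std::vector<TileId>>;

// Hypothetical tile record: `col` is set only for already-placed tiles.
struct Tile {
  bool isCore;
  std::optional<int> col;
};

// Returns (sumCols, count) over the placed cores visible from `start`.
// Non-core peers are enqueued and relay the walk; core peers are counted
// but never enqueued, so the walk stops at them.
std::pair<long, int> collectCoreColumns(TileId start,
                                        const std::vector<Tile> &tiles,
                                        const AdjList &adj) {
  long sumCols = 0;
  int count = 0;
  std::set<TileId> seen{start};
  std::queue<TileId> work;
  work.push(start);

  while (!work.empty()) {
    TileId tile = work.front();
    work.pop();
    auto it = adj.find(tile);
    if (it == adj.end())
      continue;
    for (TileId peer : it->second) {
      if (!seen.insert(peer).second)
        continue; // already visited
      if (tiles[peer].isCore) {
        if (tiles[peer].col) {
          sumCols += *tiles[peer].col;
          count += 1;
        }
        continue; // cores contribute their column but do not relay
      }
      work.push(peer); // mem/shim peers keep the walk going
    }
  }
  return {sumCols, count};
}
```

With a chain shimA, coreA (column 0), coreB (column 3), shimB, each shim sees only its own core, so the two shims get different centroids. A shim that reaches a core through an intermediate mem tile still sees that core, since the mem tile is non-core and relays.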

…inx#1567 Stage C Xilinx#4)

Two changes bundled because the proper Path B replacement (defer L2 memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap optimization (~160 lines). Bucket placement now uses round-robin only. The l2_memtile_column_affinity.mlir test is updated to reflect the round-robin output (which matches the test's documented "Without column-affinity swaps" baseline).
2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement through Adjacency), which gives the SequentialPlacer the flow-based memtile-placement mechanism that AIR will use as the proper Path B replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile placement (cross-column DMA routing) until the Path B follow-up lands. The follow-up requires deferring placer invocation until after aie.flow ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism available, the deletion can be paired with proper placer-driven replacement in the follow-up PRs. Keeping the optimization in AIR permanently would mean two competing memtile-placement strategies; the RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Copilot AI review requested due to automatic review settings May 7, 2026 21:47

erwei-xilinx requested review from abisca, fifield, hunhoffe, jackl-xilinx, jgmelber and stephenneuendorffer as code owners May 7, 2026 21:47

Copilot started reviewing on behalf of erwei-xilinx May 7, 2026 21:49 View session

hunhoffe added this to the IRON 1.3.2 milestone May 7, 2026

Copilot AI reviewed May 7, 2026


Comment thread lib/Dialect/AIE/Transforms/AIEPlacer.cpp Outdated

Comment thread lib/Dialect/AIE/Transforms/AIEPlacer.cpp

erwei-xilinx force-pushed the placer-fifo-adjacency branch from 1f2fd9a to c713081 Compare May 7, 2026 21:59

erwei-xilinx force-pushed the placer-fifo-adjacency branch 2 times, most recently from 4b443d1 to 248c802 Compare May 7, 2026 22:58

erwei-xilinx changed the title ~~[AIEPlacer] Unify fifo + flow placement through Adjacency (and lock adjacency)~~ [AIEPlacer] Unify objectfifo + flow placement through Adjacency May 7, 2026

erwei-xilinx force-pushed the placer-fifo-adjacency branch from 248c802 to f0c5948 Compare May 7, 2026 23:14

erwei-xilinx and others added 2 commits May 7, 2026 16:20

hunhoffe approved these changes May 8, 2026


hunhoffe added this pull request to the merge queue May 8, 2026

Merged via the queue into Xilinx:main with commit db1575c May 8, 2026
71 checks passed

erwei-xilinx deleted the placer-fifo-adjacency branch May 8, 2026 16:41

This was referenced May 8, 2026

[RFC #1567] Stage C #4 follow-up: defer L2 memtile placement to placer (path B) Xilinx/mlir-air#1602

Open

Delete memtile column-affinity opt + bump mlir-aie to db1575c (RFC #1567 Stage C #4) Xilinx/mlir-air#1603

Merged

erwei-xilinx mentioned this pull request May 8, 2026

[RFC] Migrate AIR to emit aie.logical_tile and remove placement-aware logic Xilinx/mlir-air#1567

Open

hunhoffe merged 2 commits into Xilinx:main from erwei-xilinx:placer-fifo-adjacency

3 participants