Skip to content

[AIEPlacer] Unify objectfifo + flow placement through Adjacency#3055

Merged
hunhoffe merged 2 commits into
Xilinx:mainfrom
erwei-xilinx:placer-fifo-adjacency
May 8, 2026
Merged

[AIEPlacer] Unify objectfifo + flow placement through Adjacency#3055
hunhoffe merged 2 commits into
Xilinx:mainfrom
erwei-xilinx:placer-fifo-adjacency

Conversation

@erwei-xilinx
Copy link
Copy Markdown
Collaborator

@erwei-xilinx erwei-xilinx commented May 7, 2026

Replaces the parallel ConnectivityGroup / DFS-grouping path in the placer with the same Adjacency representation already used by buffer / cascade constraints (#3041, #3042, #3046). After this PR there is one canonical "tile-to-tile relation" representation across all sources: aie.buffer, aie.cascade_flow, aie.objectfifo, aie.flow / aie.packet_flow.

What changes

  • New buildObjectFifoAdjacency(objectFifos) -> Adjacency: emits one edge per (producer, consumer_i) pair. Linked fifos do not need a special case -- the link tile naturally appears as consumer of every source fifo and producer of every destination fifo, so per-fifo edge emission already connects all sibling endpoints transitively through that shared tile.
  • New buildFlowAdjacency(flows, pktFlows) -> Adjacency: one edge per aie.flow; per aie.packet_flow, cross-product of its aie.packet_sources and aie.packet_dests.
  • New placeNonCoreTileByCentroid: walks both adjacencies via BFS from the LTO, accumulating columns of placed CoreTile peers, places at the centroid (or pinned column).
  • Phase 4 + Phase 5 in place() collapse into a single iteration over unplaced non-core LTOs.
  • Small Adjacency::addEdgeFromValues(Value, Value) helper so both new builders share the dyn_cast_or_null<TileLike> step.

Removals

  • struct ConnectivityGroup
  • buildObjectFifoGroups, buildFlowGroups, placeNonCoreTilesInGroup

Behavior

Identical placements on all 9 existing test/place-tiles/ lit tests. Channel-requirements path is untouched -- it's a per-tile resource counter, fundamentally not a pairwise relation.

Net diff: +131 / -219 (-88 lines).

Copilot AI review requested due to automatic review settings May 7, 2026 21:47
@hunhoffe hunhoffe added this to the IRON 1.3.2 milestone May 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Unifies non-core tile placement for aie.objectfifo and aie.flow/aie.packet_flow by expressing all tile-to-tile connectivity through the existing Adjacency representation, and extends placement-time legality checks to include cross-tile aie.use_lock (lock affinity) using the same mem-affinity rule as shared-L1 buffers.

Changes:

  • Add lock-affinity adjacency: cross-tile aie.use_lock now constrains core placement via targetModel->isLegalMemAffinity(...).
  • Replace legacy ConnectivityGroup + DFS grouping for fifo/flow with Adjacency builders (buildObjectFifoAdjacency, buildFlowAdjacency) and a unified centroid-based placer for non-core tiles.
  • Add new lit coverage for lock adjacency (positive cases + new error cases).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
lib/Dialect/AIE/Transforms/AIEPlacer.cpp Adds lock adjacency checks; replaces ConnectivityGroup-based grouping with adjacency-driven centroid placement for non-core tiles.
include/aie/Dialect/AIE/Transforms/AIEPlacer.h Updates placer API: removes ConnectivityGroup, adds adjacency builders and centroid placement helper declarations.
test/place-tiles/sequential_placer/test_place_lock_adjacency.mlir New tests validating that lock adjacency steers placement and works across targets.
test/place-tiles/sequential_placer/test_place_errors.mlir Adds lock-adjacency negative tests and expected diagnostics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/Dialect/AIE/Transforms/AIEPlacer.cpp Outdated
Comment thread lib/Dialect/AIE/Transforms/AIEPlacer.cpp
@erwei-xilinx erwei-xilinx force-pushed the placer-fifo-adjacency branch from 1f2fd9a to c713081 Compare May 7, 2026 21:59
erwei-xilinx added a commit to erwei-xilinx/mlir-aie that referenced this pull request May 7, 2026
Addresses Copilot review feedback on PR Xilinx#3055: the per-LTO call to
`placeNonCoreTileByCentroid` was iterating its equivalence class via
`member_begin`...`member_end` to sum placed-core columns, so for N
non-core LTOs sharing one large class the total work was O(N x
class_size).

Pre-compute (sumCols, count) once per class leader before the Phase 4
loop. This is safe because Phase 3 has already placed every CoreTile
LTO, so `result`'s core entries are stable for the duration of Phase 4.
Per-LTO placement is now O(α(n)) findLeader + O(1) map lookup; no
per-LTO walk through the class members.

Behavior preserved -- centroid math is identical, just hoisted. All 10
placer lit tests pass with identical placements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx added a commit to erwei-xilinx/mlir-aie that referenced this pull request May 7, 2026
Addresses the Copilot review comment on PR Xilinx#3055: the original per-LTO
`placeNonCoreTileByCentroid` was BFS-walking the connectivity adjacencies
on every call, so for N non-core LTOs sharing one large component the
total work was O(N x (V+E)).

Hoist the walk: `precomputeConnectivityCentroidParts` flood-fills every
connected component once over the union of fifo + flow adjacencies,
records `(sumCols, count)` for each component's placed-`CoreTile` LTOs,
and writes that pair into a `DenseMap` keyed by every component member.
Phase 4's per-LTO call is then a hash-map lookup -- O(α(n)) effectively.

This is safe because Phase 3 has already placed every CoreTile LTO, so
`result`'s core entries are stable for the duration of Phase 4.

Also restores the `flow_chain_transitive_pinned_core` regression test:
shim is connected to a column-pinned core only via an unplaced memtile
in the middle, so reaching the core requires walking the connected
component (not just direct edges). Without that, the shim would land
in column 0 instead of the pinned core's column 2. Not exercised by the
existing `flow_chain_shim_mem_core` test (where all cores are placed
before Phase 4 starts).

Behavior preserved -- centroid math is identical, just hoisted. All 10
placer lit tests pass with identical placements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erwei-xilinx erwei-xilinx force-pushed the placer-fifo-adjacency branch 2 times, most recently from 4b443d1 to 248c802 Compare May 7, 2026 22:58
@erwei-xilinx erwei-xilinx changed the title [AIEPlacer] Unify fifo + flow placement through Adjacency (and lock adjacency) [AIEPlacer] Unify objectfifo + flow placement through Adjacency May 7, 2026
@erwei-xilinx erwei-xilinx force-pushed the placer-fifo-adjacency branch from 248c802 to f0c5948 Compare May 7, 2026 23:14
@erwei-xilinx
Copy link
Copy Markdown
Collaborator Author

Cc: @yenjames, @hunhoffe

erwei-xilinx and others added 2 commits May 7, 2026 16:20
Replaces the parallel `ConnectivityGroup` / DFS-grouping path with the
same `Adjacency` representation already used by buffer / lock / cascade
constraints (Xilinx#3041, Xilinx#3042, Xilinx#3046, prior commit on this branch).

## Motivation

Before this commit, the placer maintained two independent
"tile-to-tile relation" representations:

1. `Adjacency` -- introduced for buffer/cascade/lock affinity. Pairwise
   edges + indexed lookup. Used for legality predicates.
2. `ConnectivityGroup` + bespoke `MapVector<LT, SetVector<LT>>` adjacency
   + manual DFS in `buildFlowGroups`. Used for centroid placement of
   mem/shim tiles near their core peers.

These do the same job (encode "what tiles are related") in different
shapes. With Xilinx#3046 landed there's now exactly one canonical
representation; the legacy fifo path is the only thing left using the
parallel one.

## What changes

- New `buildObjectFifoAdjacency(objectFifos, objectFifoLinks) -> Adjacency`:
  emits one edge per `(producer, consumer_i)` pair. Linked fifos share
  an intermediate tile (link tile == consumer of every source fifo and
  producer of every destination fifo), so the natural edge emission
  already connects all sibling endpoints transitively through that
  shared tile -- no separate group-id machinery needed.
- New `buildFlowAdjacency(flows, pktFlows) -> Adjacency`: one edge per
  `aie.flow` `(src, dst)`; per `aie.packet_flow`, cross-product of its
  `aie.packet_source`s and `aie.packet_dest`s.
- New `placeNonCoreTileByCentroid(lt, adjacencies, channelRequirements)`:
  BFS through every supplied adjacency starting at `lt`, accumulating
  columns of placed `CoreTile` peers along the way; place at the
  rounded centroid (or the LTO's pinned column if set), respecting
  channel-requirement capacity. Walking through `LogicalTileOp` peers
  preserves the legacy ConnectivityGroup behaviour of seeing cores
  reachable transitively through intermediate mem/shim tiles.
- Phase 4 + Phase 5 in `place()` collapse into a single iteration over
  unplaced non-core LTOs, each calling the new placement function.

## Removals

- `struct ConnectivityGroup` (header)
- `buildObjectFifoGroups`, `buildFlowGroups`, `placeNonCoreTilesInGroup`
  (cpp + header)

## Behavior

Identical placements on all existing lit tests (10/10 in
`test/place-tiles/`, including the multi-fifo `edge_detect`,
linked-fifo, and flow-grouping cases). Channel-requirements path is
untouched -- it's a per-tile resource counter, fundamentally not a
pairwise relation, and rightfully stays as `DenseMap<Op*, pair<in, out>>`
outside Adjacency.

Net diff: -25 lines. The unified path is shorter than the parallel one
it replaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new BFS in `placeNonCoreTileByCentroid` was unconditionally enqueuing
`CoreTile` peers, so it transited through cores via core-to-core
objectfifos. In designs with chained core-to-core data flow (e.g.
`programming_examples/basic/vector_reduce_max/multi_column_designs/`),
every shim/mem LTO ended up with the same centroid (the average of all
cores in the chain), exhausting per-column DMA capacity and failing
placement with `no ShimNOCTile with sufficient DMA capacity`.

This restores the legacy `placeNonCoreTilesInGroup` invariant: cores
contribute their column to the centroid but do not relay between
non-core peers. Walking transitively through `LogicalTileOp` peers
still works for the design point that motivated it -- linked fifos
sharing an intermediate mem tile -- because the link tile is non-core
and stays enqueued.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hunhoffe hunhoffe added this pull request to the merge queue May 8, 2026
Merged via the queue into Xilinx:main with commit db1575c May 8, 2026
71 checks passed
@erwei-xilinx erwei-xilinx deleted the placer-fifo-adjacency branch May 8, 2026 16:41
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 8, 2026
…inx#1567 Stage C Xilinx#4)

Two changes bundled because the proper Path B replacement (defer L2
memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the
mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap
   optimization (~160 lines). Bucket placement now uses round-robin
   only. The l2_memtile_column_affinity.mlir test is updated to reflect
   the round-robin output (which matches the test's documented
   "Without column-affinity swaps" baseline).

2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes
   Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement
   through Adjacency), which gives the SequentialPlacer the flow-based
   memtile-placement mechanism that AIR will use as the proper Path B
   replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile
placement (cross-column DMA routing) until the Path B follow-up lands.
The follow-up requires deferring placer invocation until after aie.flow
ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism
available, the deletion can be paired with proper placer-driven
replacement in the follow-up PRs. Keeping the optimization in AIR
permanently would mean two competing memtile-placement strategies; the
RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures
unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 8, 2026
…inx#1567 Stage C Xilinx#4)

Two changes bundled because the proper Path B replacement (defer L2
memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the
mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap
   optimization (~160 lines). Bucket placement now uses round-robin
   only. The l2_memtile_column_affinity.mlir test is updated to reflect
   the round-robin output (which matches the test's documented
   "Without column-affinity swaps" baseline).

2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes
   Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement
   through Adjacency), which gives the SequentialPlacer the flow-based
   memtile-placement mechanism that AIR will use as the proper Path B
   replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile
placement (cross-column DMA routing) until the Path B follow-up lands.
The follow-up requires deferring placer invocation until after aie.flow
ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism
available, the deletion can be paired with proper placer-driven
replacement in the follow-up PRs. Keeping the optimization in AIR
permanently would mean two competing memtile-placement strategies; the
RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures
unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 8, 2026
…inx#1567 Stage C Xilinx#4)

Two changes bundled because the proper Path B replacement (defer L2
memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the
mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap
   optimization (~160 lines). Bucket placement now uses round-robin
   only. The l2_memtile_column_affinity.mlir test is updated to reflect
   the round-robin output (which matches the test's documented
   "Without column-affinity swaps" baseline).

2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes
   Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement
   through Adjacency), which gives the SequentialPlacer the flow-based
   memtile-placement mechanism that AIR will use as the proper Path B
   replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile
placement (cross-column DMA routing) until the Path B follow-up lands.
The follow-up requires deferring placer invocation until after aie.flow
ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism
available, the deletion can be paired with proper placer-driven
replacement in the follow-up PRs. Keeping the optimization in AIR
permanently would mean two competing memtile-placement strategies; the
RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures
unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 8, 2026
…inx#1567 Stage C Xilinx#4)

Two changes bundled because the proper Path B replacement (defer L2
memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the
mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap
   optimization (~160 lines). Bucket placement now uses round-robin
   only. The l2_memtile_column_affinity.mlir test is updated to reflect
   the round-robin output (which matches the test's documented
   "Without column-affinity swaps" baseline).

2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes
   Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement
   through Adjacency), which gives the SequentialPlacer the flow-based
   memtile-placement mechanism that AIR will use as the proper Path B
   replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile
placement (cross-column DMA routing) until the Path B follow-up lands.
The follow-up requires deferring placer invocation until after aie.flow
ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism
available, the deletion can be paired with proper placer-driven
replacement in the follow-up PRs. Keeping the optimization in AIR
permanently would mean two competing memtile-placement strategies; the
RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures
unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erwei-xilinx added a commit to erwei-xilinx/mlir-air-erwei that referenced this pull request May 8, 2026
…inx#1567 Stage C Xilinx#4) (Xilinx#1603)

Two changes bundled because the proper Path B replacement (defer L2
memtile placement to mlir-aie's flow-aware SequentialPlacer) needs the
mlir-aie bump first:

1. Delete L2MemrefToMemTileMap's column-affinity 3rd-stage swap
   optimization (~160 lines). Bucket placement now uses round-robin
   only. The l2_memtile_column_affinity.mlir test is updated to reflect
   the round-robin output (which matches the test's documented
   "Without column-affinity swaps" baseline).

2. Bump mlir-aie pin from 52dd9bc to db1575c. The new mlir-aie includes
   Xilinx/mlir-aie#3055 ([AIEPlacer] Unify objectfifo + flow placement
   through Adjacency), which gives the SequentialPlacer the flow-based
   memtile-placement mechanism that AIR will use as the proper Path B
   replacement.

## Behavioral impact

Workloads with strong column-affinity patterns may see worse memtile
placement (cross-column DMA routing) until the Path B follow-up lands.
The follow-up requires deferring placer invocation until after aie.flow
ops materialize and is tracked in Xilinx#1602.

## Why now

This unblocks Path B: with the placer's new flow-adjacency mechanism
available, the deletion can be paired with proper placer-driven
replacement in the follow-up PRs. Keeping the optimization in AIR
permanently would mean two competing memtile-placement strategies; the
RFC's direction is to centralize placement in mlir-aie.

Tests: 384/384 check-air-mlir (2 pre-existing AIRToROCDL failures
unrelated, unchanged). clang-format-17 clean.

Stacks on Xilinx#1597, Xilinx#1598, Xilinx#1599, Xilinx#1601.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants