Skip to content

Commit 1e6a826

Browse files
erwei-xilinxclaude
andcommitted
[Path B 4/7] AIR emits logical shim/memtiles end-to-end
Central refactor: AIR no longer calls a placer anywhere in its lowering. Memtiles and shim DMA tiles are emitted as aie.logical_tile<...>(...) and survive into the downstream pipeline; mlir-aie's `aie-place-tiles` pass (invoked from aircc in the next commit) picks physical coordinates from the full IR (flow adjacency, buffer adjacency, cascade adjacency, channel-budget capacity) rather than from per-allocation hints chosen by AIR. Fixes the failure mode that broke Xilinx#1605 in isolation: that PR removed the same-column heuristic from ShimDMAAllocator but kept calling createTileViaPlacer per-allocation, which placed each shim tile in isolation against an empty IR (no flows yet) and uniformly fell through to col 0. The placer's flow-aware logic never had a chance to fire. This commit deletes the per-allocation placer call entirely. Changes: - outlineAIEMemtiles: emit aie.logical_tile<MemTile>(col_hint, ?) directly. - ShimDMAAllocator::allocNewDmaChannel: emit aie.logical_tile<ShimNOCTile> with no col/row hint. Round-robin channel-index assignment. Subsumes the deletion of `colAllocConstraint == "same_column"` (Xilinx#1605); the parameter is gone from the API. - ShimDMAAllocator: drop dma_columns vector. - outlineAIECores: switch to direct getPhysTileOp (cores stay physical). - Delete createTileViaPlacer / createTilesViaPlacer entirely. - generateDmaBdProgram, generateDmaBd, getShimDMAOp, getMemTileDMAOp, labelMemcpyOpsWithPacketFlow: switch tile parameters to AIE::TileLike (or mlir::Value where the downstream API requires). - allocateAirRtMetadata: writes -1 for the shim "location" field when the shim tile is unplaced; commit 6 adds a fixup after aie-place-tiles. Lit failures expected (31 tests - all CHECK on old physical shim/memtile shape; migrated in commit 7). Build green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5dc1c18 commit 1e6a826

3 files changed

Lines changed: 153 additions & 231 deletions

File tree

mlir/include/air/Conversion/AIRToAIESchedulingUtils.h

Lines changed: 16 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -28,30 +28,6 @@ AIE::TileOp getPhysTileOpOrNull(AIE::DeviceOp aie_device, int col, int row);
2828
// get tileop using physical coordinates
2929
AIE::TileOp getPhysTileOp(AIE::DeviceOp aie_device, int col, int row);
3030

31-
// Materialize a physical aie.tile by emitting an aie.logical_tile<tileType>
32-
// with the given hints (use std::nullopt for "?"), running mlir-aie's
33-
// SequentialPlacer, and resolving the result through getPhysTileOp. On
34-
// placement failure, emits a diagnostic on `aie_device` and returns failure.
35-
//
36-
// Caller must NOT be inside a greedy PatternRewriter callback; this helper
37-
// uses plain OpBuilder + replaceAllUsesWith/erase, which would invalidate
38-
// a greedy worklist's cached use-def edges (see RFC #1567 milestone 2).
39-
mlir::FailureOr<AIE::TileOp> createTileViaPlacer(AIE::DeviceOp aie_device,
40-
AIE::AIETileType tileType,
41-
std::optional<int> col_hint,
42-
std::optional<int> row_hint);
43-
44-
// Batched variant: emits N aie.logical_tile<tileType> ops (one per hint),
45-
// runs the placer ONCE, and resolves each into a physical aie.tile. The
46-
// returned vector parallels `hints`. Use this when multiple unconstrained
47-
// or partially-constrained logical tiles must be placed together — e.g.,
48-
// a herd of cores all asking (col, ?), which a per-tile placer would all
49-
// map to the same row because state doesn't persist across place() calls.
50-
mlir::LogicalResult createTilesViaPlacer(
51-
AIE::DeviceOp aie_device, AIE::AIETileType tileType,
52-
llvm::ArrayRef<std::pair<std::optional<int>, std::optional<int>>> hints,
53-
llvm::SmallVectorImpl<AIE::TileOp> &outTiles);
54-
5531
AIE::LockOp allocateLockOp(AIE::DeviceOp aie_device, AIE::TileLike tile,
5632
int init = 0, int id = -1,
5733
StringAttr name = nullptr);
@@ -207,15 +183,28 @@ class TileDMAAllocator : public DMAAllocator {
207183
class ShimDMAAllocator : public DMAAllocator {
208184

209185
public:
210-
std::vector<int> dma_columns;
186+
// Per-shim DMA channel count (2 MM2S + 2 S2MM on all current targets).
187+
// Used by allocNewDmaChannel for round-robin channel-index assignment;
188+
// the placer's per-tile DMA channel budget then spreads logical shim
189+
// tiles across physical shim columns so channel demand per column is
190+
// honored.
211191
int shim_dma_channels;
212192

213193
ShimDMAAllocator(AIE::DeviceOp device);
214194

195+
// Allocate a new shim DMA channel. The shim tile is emitted as an
196+
// unconstrained aie.logical_tile<ShimNOCTile>(?, ?); mlir-aie's
197+
// aie-place-tiles pass picks the physical column from flow adjacency to
198+
// placed core peers and respects per-shim DMA channel capacity. The col
199+
// and row int args record the OTHER side (compute side) of the flow
200+
// for airrt metadata; they have nothing to do with the shim's eventual
201+
// physical placement. (RFC #1567: subsumes the deletion of the
202+
// `colAllocConstraint == "same_column"` heuristic, formerly attempted
203+
// standalone in #1605 — that PR couldn't compile multi-column workloads
204+
// because shim tiles were still pre-pinned via createTileViaPlacer.)
215205
FailureOr<allocation_info_t>
216206
allocNewDmaChannel(air::MemcpyInterface &memcpyOp, int col, int row,
217-
std::vector<Operation *> &dma_ops,
218-
std::string colAllocConstraint = "same_column");
207+
std::vector<Operation *> &dma_ops);
219208

220209
FailureOr<allocation_info_t>
221210
allocNewDmaChannel(air::MemcpyInterface &memcpyOp,

mlir/lib/Conversion/AIRToAIEPass.cpp

Lines changed: 77 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -267,19 +267,12 @@ LogicalResult outlineAIECores(OpBuilder &builder, AIE::DeviceOp aie_device,
267267
// Emit aie.logical_tile<CoreTile>(phys_x, phys_y) and resolve via
268268
// mlir-aie's SequentialPlacer (RFC #1567 Stage A milestone 4). For
269269
// this milestone we keep both coordinates fully constrained, so the
270-
// placer is a pass-through and physical placement is identical to
271-
// before. Future milestones can relax to (col, ?) or (?, ?) for
272-
// herds whose communication patterns don't require strict adjacency.
273-
//
274-
// TODO(rfc-1567): Once constraints are relaxed, switch to a single
275-
// air::createTilesViaPlacer call up-front so the placer sees all
276-
// unconstrained tiles together. With fully-constrained hints the
277-
// per-tile invocation here is deterministic and preserves IR order.
278-
auto tileRes = air::createTileViaPlacer(
279-
aie_device, AIE::AIETileType::CoreTile, phys_x, phys_y);
280-
if (failed(tileRes))
281-
return failure();
282-
auto tile = *tileRes;
270+
// Compute tiles here are fully constrained to (phys_x, phys_y) by the
271+
// AIR herd; we can resolve directly to a physical aie.tile without any
272+
// placer involvement. (Memtiles and shim tiles take the LTO route — see
273+
// outlineAIEMemtiles and ShimDMAAllocator::allocNewDmaChannel — and let
274+
// the downstream `aie-place-tiles` pass pick rows/columns.)
275+
auto tile = air::getPhysTileOp(aie_device, phys_x, phys_y);
283276

284277
Operation *t = tile.getOperation();
285278
while (isa_and_present<AIE::TileLike>(t->getNextNode()))
@@ -827,16 +820,14 @@ LogicalResult outlineAIEMemtiles(OpBuilder &builder, AIE::DeviceOp aie_device,
827820
// use the command line offsets unless the attribute is present
828821
int64_t col_offset = options.col_offset;
829822

830-
// Emit each memtile as an unplaced aie.logical_tile<MemTile>(col, ?). The
831-
// column is constrained because the segment owns that column; the row is
832-
// left to mlir-aie's SequentialPlacer to determine. This removes the
833-
// hardcoded `phys_y = 1` and is the first step of the migration to logical
834-
// tiles (see RFC #1567).
823+
// Emit each memtile as an unplaced aie.logical_tile<MemTile>(col, ?) and
824+
// leave it logical. The downstream `aie-place-tiles` pass picks the row
825+
// (and may merge multiple LTOs onto one physical memtile when DMA capacity
826+
// permits). The column is constrained because the segment owns that column.
835827
//
836828
// Skip columns that have no memtile in this device (e.g., out-of-range
837-
// columns due to a too-large segment x_size + col_offset). Previously
838-
// getPhysTileOp would silently fabricate an invalid aie.tile; the placer is
839-
// strict so we filter here.
829+
// columns due to a too-large segment x_size + col_offset). The placer is
830+
// strict on out-of-range hints, so we filter here.
840831
const auto &targetModel = aie_device.getTargetModel();
841832
auto colHasMemTile = [&](int col) {
842833
if (col < 0 || col >= targetModel.columns())
@@ -846,24 +837,25 @@ LogicalResult outlineAIEMemtiles(OpBuilder &builder, AIE::DeviceOp aie_device,
846837
return true;
847838
return false;
848839
};
849-
SmallVector<std::pair<std::optional<int>, std::optional<int>>> hints;
840+
841+
SmallVector<AIE::LogicalTileOp> logicalMemTiles;
842+
auto *ctx = builder.getContext();
850843
for (auto x = 0; x < seg_size_x; x++) {
851844
auto phys_x = x + col_offset;
852845
if (!colHasMemTile(phys_x))
853846
continue;
854-
hints.push_back({phys_x, std::nullopt});
847+
auto colAttr = IntegerAttr::get(IntegerType::get(ctx, 32), phys_x);
848+
logicalMemTiles.push_back(AIE::LogicalTileOp::create(
849+
builder, aie_device.getLoc(), AIE::AIETileType::MemTile, colAttr,
850+
/*row=*/IntegerAttr(),
851+
/*allocation_scheme=*/StringAttr()));
855852
}
856853

857-
SmallVector<AIE::TileOp> placedMemTiles;
858-
if (failed(air::createTilesViaPlacer(aie_device, AIE::AIETileType::MemTile,
859-
hints, placedMemTiles)))
860-
return failure();
861-
862-
// Anchor each placed memtile with a tiny L2 buffer so it isn't folded away
863-
// before L2 allocation runs.
854+
// Anchor each emitted memtile with a tiny L2 buffer so it isn't folded
855+
// away before L2 allocation runs.
864856
auto memrefTy = MemRefType::get(SmallVector<int64_t>{1}, builder.getI8Type());
865857
static uint64_t BufferId = 0;
866-
for (auto tile : placedMemTiles) {
858+
for (auto tile : logicalMemTiles) {
867859
allocateBufferOp(BufferId, memrefTy, tile,
868860
builder.getStringAttr("__L2_tmp"));
869861
}
@@ -4032,7 +4024,15 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
40324024
}
40334025

40344026
for (auto &t : allocs) {
4035-
AIE::TileOp tileOp = cast<AIE::TileOp>(t.getDmaTile().getOperation());
4027+
// Shim DMA tiles are emitted as logical tiles by ShimDMAAllocator and
4028+
// resolved to physical TileOps by mlir-aie's `aie-place-tiles` pass,
4029+
// which runs (in aircc) BEFORE this metadata is consumed. At AIR-to-AIE
4030+
// time the col is therefore not yet known; write tryGetCol() and
4031+
// accept -1 when unplaced. The downstream metadata-fixup pass (run
4032+
// after aie-place-tiles) patches the "location" field for entries
4033+
// whose shim tile got a physical column from the placer.
4034+
AIE::TileLike tileLike = t.getDmaTile();
4035+
int64_t shimCol = tileLike ? tileLike.tryGetCol().value_or(-1) : -1;
40364036
int64_t col = t.col - col_offset;
40374037
int64_t row = t.row - row_offset;
40384038
int64_t chan = dir == AIE::DMAChannelDir::MM2S ? t.dma_channel.channel + 2
@@ -4053,9 +4053,8 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
40534053
builder.getI64IntegerAttr(col)));
40544054
attrs.push_back(NamedAttribute(StringAttr::get(ctx, "channel"),
40554055
builder.getI64IntegerAttr(chan)));
4056-
attrs.push_back(
4057-
NamedAttribute(StringAttr::get(ctx, "location"),
4058-
builder.getI64IntegerAttr(tileOp.getCol())));
4056+
attrs.push_back(NamedAttribute(StringAttr::get(ctx, "location"),
4057+
builder.getI64IntegerAttr(shimCol)));
40594058
push_back_if_unique<Attribute>(dma_allocations,
40604059
DictionaryAttr::get(ctx, attrs));
40614060
}
@@ -4199,21 +4198,23 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
41994198
}
42004199

42014200
// Annotate AIR DMA ops that correspond to a SHIM DMA allocation with packet
4202-
// information, specifically for MM2S (host-to-AIE) directions.
4201+
// information, specifically for MM2S (host-to-AIE) directions. The tile
4202+
// operand is passed as a Value so it works for both physical aie.tile and
4203+
// unplaced aie.logical_tile.
42034204
LogicalResult labelMemcpyOpsWithPacketFlow(air::MemcpyInterface memcpyOpIf,
42044205
StringAttr dmaNameAttr,
4205-
AIE::TileOp tileOp, int channel,
4206+
mlir::Value tileVal, int channel,
42064207
int packetFlowId = -1) {
42074208
// When a packet flow ID is available (from flow creation phase), use
42084209
// exact flow ID matching to disambiguate multiple flows sharing the
42094210
// same shim DMA channel. Otherwise fall back to source-only lookup.
42104211
AIE::PacketFlowOp pktFlowOp;
42114212
if (packetFlowId >= 0)
4212-
pktFlowOp = findPacketFlowOp(tileOp, AIE::WireBundle::DMA, channel,
4213+
pktFlowOp = findPacketFlowOp(tileVal, AIE::WireBundle::DMA, channel,
42134214
/*checkFlowID=*/true, packetFlowId);
42144215
if (!pktFlowOp)
42154216
pktFlowOp = getExistingPacketFlowOpFromRuntime(
4216-
tileOp, AIE::WireBundle::DMA, channel);
4217+
tileVal, AIE::WireBundle::DMA, channel);
42174218
if (!pktFlowOp)
42184219
return success();
42194220

@@ -4488,8 +4489,8 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
44884489
if (dir == AIE::DMAChannelDir::MM2S)
44894490
if (failed(labelMemcpyOpsWithPacketFlow(
44904491
memcpyIfOp, shim_name_attr,
4491-
cast<AIE::TileOp>(t.getDmaTile().getOperation()),
4492-
t.dma_channel.channel, t.packet_flow_id)))
4492+
t.getDmaTile()->getResult(0), t.dma_channel.channel,
4493+
t.packet_flow_id)))
44934494
return failure();
44944495
}
44954496

@@ -4937,7 +4938,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
49374938
std::vector<Operation *>>
49384939
dma_memcpys,
49394940
dmaAllocatorTy dmaAlloc, mlir::Location loc, memOpTy mem,
4940-
AIE::TileOp tile, bool lockRaceConditionFix = false) {
4941+
AIE::TileLike tile, bool lockRaceConditionFix = false) {
49414942

49424943
llvm::MapVector<std::pair<AIE::DMAChannelDir, int>,
49434944
std::vector<Operation *>>
@@ -5029,7 +5030,20 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
50295030
next_bd->insertBefore(end_bb);
50305031
AIE::NextBDOp::create(b, loc, next_bd);
50315032
}
5032-
auto bufferOp = dmaAlloc.getBuffer(BufferId, tile, memcpyOp);
5033+
// ShimDMA/MemTileDMA/TileDMA getBuffer subclass APIs still take
5034+
// AIE::TileOp; the tile parameter is unused by Shim/MemTile (which
5035+
// derive the buffer from the memcpy op) and used only as the owner
5036+
// tile by TileDMAAllocator. For TileDMA, `tile` here is always
5037+
// physical (compute tiles use getPhysTileOp), so cast<TileOp> is
5038+
// safe. Shim/MemTile may pass an LTO; the cast is unsafe in that
5039+
// case but the body never dereferences the tile value, so the
5040+
// cast<>'s null cast (to nullptr_t) does not blow up.
5041+
auto bufferOp = dmaAlloc.getBuffer(
5042+
BufferId,
5043+
dyn_cast<AIE::TileOp>(tile.getOperation()) ? cast<AIE::TileOp>(
5044+
tile.getOperation())
5045+
: nullptr,
5046+
memcpyOp);
50335047
if (failed(bufferOp)) {
50345048
memcpyOp->emitOpError("failed to get buffer.");
50355049
return failure();
@@ -5077,7 +5091,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
50775091
template <typename bufferOpTy>
50785092
FailureOr<AIE::DMABDOp>
50795093
generateDmaBd(mlir::Location loc, AIE::DMAChannelDir dir,
5080-
std::pair<AIE::LockOp, AIE::LockOp> locks, AIE::TileOp tile,
5094+
std::pair<AIE::LockOp, AIE::LockOp> locks, AIE::TileLike tile,
50815095
const AIE::AIETargetModel &targetModel, Block *bd,
50825096
air::MemcpyInterface memcpyOp, bufferOpTy bufferOp, int chan) {
50835097
bool UsesSemaphoreLocks =
@@ -5143,7 +5157,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
51435157

51445158
// Packet flow routing: get packet flow id.
51455159
auto pktFlowOp = getExistingPacketFlowOpFromDevice(
5146-
tile, AIE::WireBundle::DMA, chan, memcpyOp);
5160+
tile->getResult(0), AIE::WireBundle::DMA, chan, memcpyOp);
51475161
AIE::PacketInfoAttr pktInfoAttr = nullptr;
51485162
if (isMM2S && pktFlowOp) {
51495163
auto packetID = pktFlowOp.getID();
@@ -5515,16 +5529,16 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
55155529
return failure();
55165530
}
55175531

5518-
AIE::ShimDMAOp getShimDMAOp(AIE::TileOp tile) {
5519-
auto users = tile.getResult().getUsers();
5532+
AIE::ShimDMAOp getShimDMAOp(AIE::TileLike tile) {
5533+
auto users = tile->getResult(0).getUsers();
55205534
for (auto user : users)
55215535
if (auto shimDMAOp = dyn_cast_if_present<AIE::ShimDMAOp>(*user))
55225536
return shimDMAOp;
55235537
return nullptr;
55245538
}
55255539

5526-
AIE::MemTileDMAOp getMemTileDMAOp(AIE::TileOp tile) {
5527-
auto users = tile.getResult().getUsers();
5540+
AIE::MemTileDMAOp getMemTileDMAOp(AIE::TileLike tile) {
5541+
auto users = tile->getResult(0).getUsers();
55285542
for (auto user : users)
55295543
if (auto memTileDMAOp = dyn_cast_if_present<AIE::MemTileDMAOp>(*user))
55305544
return memTileDMAOp;
@@ -6019,14 +6033,16 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
60196033

60206034
// Generate L3 DMA program
60216035

6022-
// Gather all shim tiles and memtiles used in design
6023-
std::vector<AIE::TileOp> shimtiles;
6024-
std::vector<AIE::TileOp> memTileTiles;
6036+
// Gather all shim tiles and memtiles used in design. Both physical
6037+
// (AIE::TileOp) and unplaced (AIE::LogicalTileOp) entries flow through
6038+
// here uniformly via TileLike; the downstream aie.shim_dma /
6039+
// aie.memtile_dma ops accept any Index-typed tile operand.
6040+
std::vector<AIE::TileLike> shimtiles;
6041+
std::vector<AIE::TileLike> memTileTiles;
60256042
for (auto &alloc : shimDmaAlloc.mm2s_allocs) {
60266043
auto tile = alloc.getDmaTile();
60276044
if (tile.isShimTile())
6028-
push_back_if_unique<AIE::TileOp>(
6029-
shimtiles, cast<AIE::TileOp>(tile.getOperation()));
6045+
push_back_if_unique<AIE::TileLike>(shimtiles, tile);
60306046
else {
60316047
tile->emitOpError(
60326048
"tile is logged for shim DMA allocation, but is not shim tile.");
@@ -6036,8 +6052,7 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
60366052
for (auto &alloc : memTileDmaAlloc.mm2s_allocs) {
60376053
auto tile = alloc.getDmaTile();
60386054
if (tile.isMemTile())
6039-
push_back_if_unique<AIE::TileOp>(
6040-
memTileTiles, cast<AIE::TileOp>(tile.getOperation()));
6055+
push_back_if_unique<AIE::TileLike>(memTileTiles, tile);
60416056
else {
60426057
tile->emitOpError(
60436058
"tile is logged for memtile DMA allocation, but is not memtile.");
@@ -6079,7 +6094,8 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
60796094
if (!shimDMA) {
60806095
rewriter.setInsertionPoint(device.getBody()->getTerminator());
60816096
shimDMA = AIE::ShimDMAOp::create(rewriter, rewriter.getUnknownLoc(),
6082-
rewriter.getIndexType(), tile);
6097+
rewriter.getIndexType(),
6098+
tile->getResult(0));
60836099
}
60846100

60856101
auto loc = rewriter.getUnknownLoc();
@@ -6126,8 +6142,10 @@ class AIRToAIEPass : public air::impl::AIRToAIEBase<AIRToAIEPass> {
61266142
AIE::MemTileDMAOp memTileDMA = getMemTileDMAOp(tile);
61276143
if (!memTileDMA) {
61286144
rewriter.setInsertionPoint(device.getBody()->getTerminator());
6129-
memTileDMA = AIE::MemTileDMAOp::create(
6130-
rewriter, rewriter.getUnknownLoc(), rewriter.getIndexType(), tile);
6145+
memTileDMA = AIE::MemTileDMAOp::create(rewriter,
6146+
rewriter.getUnknownLoc(),
6147+
rewriter.getIndexType(),
6148+
tile->getResult(0));
61316149
}
61326150

61336151
auto loc = rewriter.getUnknownLoc();

0 commit comments

Comments
 (0)