Skip to content

Align buffer allocate#3037

Merged
hunhoffe merged 25 commits into
Xilinx:mainfrom
joeldushouyu:align-buffer-allocate
May 9, 2026
Merged

Align buffer allocate#3037
hunhoffe merged 25 commits into
Xilinx:mainfrom
joeldushouyu:align-buffer-allocate

Conversation

@joeldushouyu
Copy link
Copy Markdown
Contributor

Address issue 3019 and issue 2579

Commit up to d3684e2 is simply a code refactor to ensure bank-aware and basic-sequential allocation have similar code organization.
Commit 0c32449 addresses a bug in bank-aware allocation from pr #3003.
Commit b08a9df. Introduce the alignment allocation on all buffers, which is set to true by default.

Note: Someone need to double check my understand here

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks AIE buffer address assignment to support default alignment of buffer allocations to the tile load/store bus width, while also refactoring/adjusting the bank-aware vs basic-sequential allocator behavior and updating tests to match.

Changes:

  • Add an aligned attribute to aie.buffer (defaulting to true) and align allocated buffer start addresses to the tile’s load/store bus width.
  • Extend AIETargetModel with APIs to query compute-tile vs mem-tile load/store bus widths and use these in allocation/alignment checks.
  • Update/expand MLIR tests for aligned vs unaligned pre-allocated addresses and for fallback/error behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
lib/Dialect/AIE/Transforms/AIEAssignBuffers.cpp Implements alignment-aware allocation logic and additional allocation checks.
include/aie/Dialect/AIE/IR/AIEOps.td Adds aligned BoolAttr (default true) to aie.buffer.
include/aie/Dialect/AIE/IR/AIETargetModel.h Adds bus-width query APIs used to determine alignment requirements.
lib/Dialect/AIEX/Transforms/AIECreateCores.cpp Updates BufferOp builder call to include the new aligned parameter.
lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp Updates BufferOp builder calls to include the new aligned parameter.
test/assign-buffer-addresses/basic_alloc_simple.mlir Adds split-input-file coverage and new alignment test modules/expectations.
test/assign-buffer-addresses/basic_alloc_memtile_simple.mlir Adds split-input-file coverage and memtile alignment expectations.
test/assign-buffer-addresses/basic_alloc_memtile_error.mlir Adds coverage for failing unaligned pre-allocated addresses under default alignment.
test/assign-buffer-addresses/basic_prealloc.mlir Updates checks to account for aligned=false when using intentionally unaligned addresses.
test/assign-buffer-addresses/bank_aware_alloc_simple.mlir Adds split-input-file coverage and bank-aware alignment expectations.
test/assign-buffer-addresses/fallback_alloc_error.mlir Expands fallback scenarios and validates expected diagnostics with aligned=false.
test/assign-buffer-addresses/per_tile_alloc_respects_buffer_alloc.mlir Adjusts expected addresses to account for stack size changes.
test/assign-buffer-addresses/basic_alloc_error.mlir Minor file formatting change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/Dialect/AIE/Transforms/AIEAssignBuffers.cpp Outdated
Comment on lines +517 to +532
// First, allocate the buffer with pre-allocated address
// Then, allocated the buffer with pre-allocated mem_bank t
// Do it by doing a sort preAllocatedBuffers to have buffer with address first, then buffer with only mem_bank
std::sort(preAllocatedBuffers.begin(), preAllocatedBuffers.end(),
[](BufferOp a, BufferOp b) -> bool {
auto a_addr = a.getAddress();
auto b_addr = b.getAddress();
if(a_addr.has_value() && !b_addr.has_value()){
return true;// a buffer before b_buffer
}else if(!a_addr.has_value() && b_addr.has_value()){
return false;// b buffer before a_buffer
}else{
return false;
}
}
// Shouldn't reach here since isBufferPreAllocated was true.
);
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preAllocatedBuffers sort comparator only distinguishes “has address” vs “no address” and otherwise returns false, so the relative order of address-specified buffers is effectively arbitrary/IR-order dependent. Since checkAndAddBufferWithAddress uses nextAddrInBanks as a monotonic cursor, processing a higher-address buffer before a lower-address one can incorrectly trigger "would override allocated address" even when the allocations don’t overlap. Consider sorting address-specified buffers by (mem_bank, address) (or by inferred bank + address) before calling checkAndAddBufferWithAddress, or changing the bookkeeping to be order-independent.

Copilot uses AI. Check for mistakes.
Comment on lines 297 to +305
@@ -226,6 +300,10 @@ checkAndAddBufferWithMemBank(BufferOp buffer, int numBanks,

int mem_bank = memBankAttr.getInt();
int64_t startAddr = nextAddrInBanks[mem_bank];
if(buffer.getAligned()){
startAddr = getAlignedAddress(startAddr, tileAlignBitWidth);
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkAndAddBufferWithMemBank indexes nextAddrInBanks[mem_bank] without validating that mem_bank is in-range [0, numBanks). An invalid mem_bank attribute can cause out-of-bounds access. Add a bounds check and emit a clear error if mem_bank is invalid.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bounds check seems reasonable :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread include/aie/Dialect/AIE/IR/AIETargetModel.h Outdated
Comment on lines +30 to +39
bool isBufferForTile(BufferOp buffer, TileOp tile) {
return buffer.getTile() == tile;
}

bool isBufferPreAllocated(BufferOp buffer ){
auto addr = buffer.getAddress();
auto memBank = buffer.getMemBank();
return (addr != std::nullopt || memBank != std::nullopt);

}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helper functions have external linkage in this .cpp (not in an anonymous namespace / not static). Also isBufferForTile is currently unused. Consider removing the unused helper and giving the remaining helpers internal linkage (anonymous namespace or static) to avoid accidental symbol export/ODR issues.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeldushouyu This sounds like it might be worth addressing? Can you take a quick lock?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +42 to +51
// Return an address that is aligned to tile's load/store bus
// NOTE: assume address are byte address
uint32_t getAlignedAddress( uint32_t address, uint32_t alignBitWidth){
assert(alignBitWidth != 0 && alignBitWidth % 8 == 0 && "alignBitWidth must be a non-zero multiple of 8");
uint32_t alignByteWidth = alignBitWidth / 8;
if (address % alignByteWidth == 0) {
return address;
} else {
return ((address / alignByteWidth) + 1) * alignByteWidth;
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAlignedAddress takes/returns uint32_t, but callers frequently pass int64_t addresses (e.g., startAddr in bank-aware allocation). This causes implicit narrowing/truncation and can wrap if address values ever exceed 32-bit. Consider changing this helper to use int64_t/uint64_t consistently and avoiding mixed signed/unsigned arithmetic.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeldushouyu Can you double check this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread lib/Dialect/AIE/Transforms/AIEAssignBuffers.cpp Outdated
Comment on lines +129 to +134
if (buffer.getAddress()){
allocated_buffers.push_back(buffer); //TODO: Right now, this ignore all buffer with only mem_bank attribute.
}
else{
buffers.push_back(buffer);
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In basicAllocation, buffers that only specify mem_bank are treated as if they were not pre-allocated (only address is considered). This can leave a user-specified mem_bank attribute on the op while assigning an address that doesn’t correspond to that bank. Either honor mem_bank in the basic allocator, or reject/clear mem_bank when alloc-scheme=basic-sequential to avoid producing inconsistent IR.

Copilot uses AI. Check for mistakes.
@joeldushouyu joeldushouyu marked this pull request as draft April 24, 2026 20:48
@joeldushouyu joeldushouyu marked this pull request as draft April 24, 2026 20:48
@joeldushouyu joeldushouyu marked this pull request as ready for review April 24, 2026 20:56
joeldushouyu and others added 3 commits April 24, 2026 17:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Thank you for putting this together! There are a few other AI review suggestions that warrant consideration, but otherwise looks good to me!

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@hunhoffe hunhoffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few AI review comments that warrant review, but otherwise looks good to me! Let's merge it once the CI is happy. Thanks for your contribution!

Comment on lines +30 to +39
bool isBufferForTile(BufferOp buffer, TileOp tile) {
return buffer.getTile() == tile;
}

bool isBufferPreAllocated(BufferOp buffer ){
auto addr = buffer.getAddress();
auto memBank = buffer.getMemBank();
return (addr != std::nullopt || memBank != std::nullopt);

}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeldushouyu This sounds like it might be worth addressing? Can you take a quick lock?

Comment on lines +42 to +51
// Return an address that is aligned to tile's load/store bus
// NOTE: assume address are byte address
uint32_t getAlignedAddress( uint32_t address, uint32_t alignBitWidth){
assert(alignBitWidth != 0 && alignBitWidth % 8 == 0 && "alignBitWidth must be a non-zero multiple of 8");
uint32_t alignByteWidth = alignBitWidth / 8;
if (address % alignByteWidth == 0) {
return address;
} else {
return ((address / alignByteWidth) + 1) * alignByteWidth;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeldushouyu Can you double check this?

void setAndUpdateAddressInBank(BufferOp buffer, int64_t start_addr,
int64_t end_addr,
std::vector<int64_t> &nextAddrInBanks) {
// Fixme: alignment
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woohoo! Nice to see a fixme addressed!

Comment on lines 297 to +305
@@ -226,6 +300,10 @@ checkAndAddBufferWithMemBank(BufferOp buffer, int numBanks,

int mem_bank = memBankAttr.getInt();
int64_t startAddr = nextAddrInBanks[mem_bank];
if(buffer.getAligned()){
startAddr = getAlignedAddress(startAddr, tileAlignBitWidth);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bounds check seems reasonable :)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 165 to 169
while (current_alloc != allocated_buffers.end() &&
address + buffer.getAllocationSize() >
current_alloc->getAddress().value()) {
address = current_alloc->getAddress().value() +
current_alloc->getAllocationSize();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +511 to +525
// First, allocate the buffer with pre-allocated address
// Then, allocated the buffer with pre-allocated mem_bank t
// Do it by doing a sort preAllocatedBuffers to have buffer with address
// first, then buffer with only mem_bank
std::sort(preAllocatedBuffers.begin(), preAllocatedBuffers.end(),
[](BufferOp a, BufferOp b) -> bool {
auto a_addr = a.getAddress();
auto b_addr = b.getAddress();
if (a_addr.has_value() && !b_addr.has_value()) {
return true; // a buffer before b_buffer
} else if (!a_addr.has_value() && b_addr.has_value()) {
return false; // b buffer before a_buffer
} else {
return false;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32_t getLocalMemorySize() const override { return 0x00010000; }
uint32_t getAccumulatorCascadeSize() const override { return 512; }
uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; }
uint32_t getMemTileLoadStoreBusWidth() const override { return 256; }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. From AMD 027
image

It seems the memory bank is native 256-bit width for npu2's memtile, which means the buffer should naturally be padded to 256 bits for alignment allocation. However, from my experience, I notice the buffers on Memtile work fine as long it is allocated and used in a 4-byte alignment manner. In other words, I suspect the DMA already handles the align/unalign access, which is why I left this comment
Any ideas or suggestions? @hunhoffe

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeldushouyu -- sorry for the delay. I think vector alignment (16B) is always sufficient, so I'd trust your experience here. However, it is good to document this uncertainty in a comment. Sorry it took me a little to get back to you, I needed to look into this one!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, should I set the buffers to 4-byte alignment on the MT, or keep everything 16-byte aligned on the MT for both npu1 and npu2?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 4-byte alignment is sufficient, which -- if I read your commend correctly -- aligns with your experience as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +192 to +194
return (checkAndPrintOverlapStackframe(stacksize, allBuffers_on_tile) &&
checkAndPrintOverflow(tile, address, maxDataMemorySize, stacksize,
allBuffers_on_tile));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

uint32_t getLocalMemorySize() const override { return 0x00010000; }
uint32_t getAccumulatorCascadeSize() const override { return 512; }
uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; }
uint32_t getMemTileLoadStoreBusWidth() const override { return 256; }
Comment thread test/assign-buffer-addresses/basic_alloc_simple.mlir Outdated
OptionalAttr<ElementsAttr>:$initial_value,
OptionalAttr<AIEI32Attr>:$mem_bank
OptionalAttr<AIEI32Attr>:$mem_bank,
DefaultValuedAttr<BoolAttr, "true">:$aligned
uint32_t getMemEastBaseAddress() const override { return 0x00070000; }
uint32_t getLocalMemorySize() const override { return 0x00010000; }
uint32_t getAccumulatorCascadeSize() const override { return 512; }
uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; }
joeldushouyu and others added 3 commits May 2, 2026 10:58
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@hunhoffe hunhoffe added this to the IRON 1.3.2 milestone May 6, 2026
@hunhoffe hunhoffe added this pull request to the merge queue May 9, 2026
Merged via the queue into Xilinx:main with commit fe9db53 May 9, 2026
71 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants