Align buffer allocate#3037
Conversation
…cture before adding align-alloc feature
…th bank_id only in bank-aware mode
There was a problem hiding this comment.
Pull request overview
This PR reworks AIE buffer address assignment to support default alignment of buffer allocations to the tile load/store bus width, while also refactoring/adjusting the bank-aware vs basic-sequential allocator behavior and updating tests to match.
Changes:
- Add an
alignedattribute toaie.buffer(defaulting to true) and align allocated buffer start addresses to the tile’s load/store bus width. - Extend
AIETargetModelwith APIs to query compute-tile vs mem-tile load/store bus widths and use these in allocation/alignment checks. - Update/expand MLIR tests for aligned vs unaligned pre-allocated addresses and for fallback/error behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/Dialect/AIE/Transforms/AIEAssignBuffers.cpp | Implements alignment-aware allocation logic and additional allocation checks. |
| include/aie/Dialect/AIE/IR/AIEOps.td | Adds aligned BoolAttr (default true) to aie.buffer. |
| include/aie/Dialect/AIE/IR/AIETargetModel.h | Adds bus-width query APIs used to determine alignment requirements. |
| lib/Dialect/AIEX/Transforms/AIECreateCores.cpp | Updates BufferOp builder call to include the new aligned parameter. |
| lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp | Updates BufferOp builder calls to include the new aligned parameter. |
| test/assign-buffer-addresses/basic_alloc_simple.mlir | Adds split-input-file coverage and new alignment test modules/expectations. |
| test/assign-buffer-addresses/basic_alloc_memtile_simple.mlir | Adds split-input-file coverage and memtile alignment expectations. |
| test/assign-buffer-addresses/basic_alloc_memtile_error.mlir | Adds coverage for failing unaligned pre-allocated addresses under default alignment. |
| test/assign-buffer-addresses/basic_prealloc.mlir | Updates checks to account for aligned=false when using intentionally unaligned addresses. |
| test/assign-buffer-addresses/bank_aware_alloc_simple.mlir | Adds split-input-file coverage and bank-aware alignment expectations. |
| test/assign-buffer-addresses/fallback_alloc_error.mlir | Expands fallback scenarios and validates expected diagnostics with aligned=false. |
| test/assign-buffer-addresses/per_tile_alloc_respects_buffer_alloc.mlir | Adjusts expected addresses to account for stack size changes. |
| test/assign-buffer-addresses/basic_alloc_error.mlir | Minor file formatting change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // First, allocate the buffer with pre-allocated address | ||
| // Then, allocated the buffer with pre-allocated mem_bank t | ||
| // Do it by doing a sort preAllocatedBuffers to have buffer with address first, then buffer with only mem_bank | ||
| std::sort(preAllocatedBuffers.begin(), preAllocatedBuffers.end(), | ||
| [](BufferOp a, BufferOp b) -> bool { | ||
| auto a_addr = a.getAddress(); | ||
| auto b_addr = b.getAddress(); | ||
| if(a_addr.has_value() && !b_addr.has_value()){ | ||
| return true;// a buffer before b_buffer | ||
| }else if(!a_addr.has_value() && b_addr.has_value()){ | ||
| return false;// b buffer before a_buffer | ||
| }else{ | ||
| return false; | ||
| } | ||
| } | ||
| // Shouldn't reach here since isBufferPreAllocated was true. | ||
| ); |
There was a problem hiding this comment.
The preAllocatedBuffers sort comparator only distinguishes “has address” vs “no address” and otherwise returns false, so the relative order of address-specified buffers is effectively arbitrary/IR-order dependent. Since checkAndAddBufferWithAddress uses nextAddrInBanks as a monotonic cursor, processing a higher-address buffer before a lower-address one can incorrectly trigger "would override allocated address" even when the allocations don’t overlap. Consider sorting address-specified buffers by (mem_bank, address) (or by inferred bank + address) before calling checkAndAddBufferWithAddress, or changing the bookkeeping to be order-independent.
| @@ -226,6 +300,10 @@ checkAndAddBufferWithMemBank(BufferOp buffer, int numBanks, | |||
|
|
|||
| int mem_bank = memBankAttr.getInt(); | |||
| int64_t startAddr = nextAddrInBanks[mem_bank]; | |||
| if(buffer.getAligned()){ | |||
| startAddr = getAlignedAddress(startAddr, tileAlignBitWidth); | |||
| } | |||
There was a problem hiding this comment.
checkAndAddBufferWithMemBank indexes nextAddrInBanks[mem_bank] without validating that mem_bank is in-range [0, numBanks). An invalid mem_bank attribute can cause out-of-bounds access. Add a bounds check and emit a clear error if mem_bank is invalid.
There was a problem hiding this comment.
A bounds check seems reasonable :)
| bool isBufferForTile(BufferOp buffer, TileOp tile) { | ||
| return buffer.getTile() == tile; | ||
| } | ||
|
|
||
| bool isBufferPreAllocated(BufferOp buffer ){ | ||
| auto addr = buffer.getAddress(); | ||
| auto memBank = buffer.getMemBank(); | ||
| return (addr != std::nullopt || memBank != std::nullopt); | ||
|
|
||
| } |
There was a problem hiding this comment.
These helper functions have external linkage in this .cpp (not in an anonymous namespace / not static). Also isBufferForTile is currently unused. Consider removing the unused helper and giving the remaining helpers internal linkage (anonymous namespace or static) to avoid accidental symbol export/ODR issues.
There was a problem hiding this comment.
@joeldushouyu This sounds like it might be worth addressing? Can you take a quick lock?
| // Return an address that is aligned to tile's load/store bus | ||
| // NOTE: assume address are byte address | ||
| uint32_t getAlignedAddress( uint32_t address, uint32_t alignBitWidth){ | ||
| assert(alignBitWidth != 0 && alignBitWidth % 8 == 0 && "alignBitWidth must be a non-zero multiple of 8"); | ||
| uint32_t alignByteWidth = alignBitWidth / 8; | ||
| if (address % alignByteWidth == 0) { | ||
| return address; | ||
| } else { | ||
| return ((address / alignByteWidth) + 1) * alignByteWidth; | ||
| } |
There was a problem hiding this comment.
getAlignedAddress takes/returns uint32_t, but callers frequently pass int64_t addresses (e.g., startAddr in bank-aware allocation). This causes implicit narrowing/truncation and can wrap if address values ever exceed 32-bit. Consider changing this helper to use int64_t/uint64_t consistently and avoiding mixed signed/unsigned arithmetic.
| if (buffer.getAddress()){ | ||
| allocated_buffers.push_back(buffer); //TODO: Right now, this ignore all buffer with only mem_bank attribute. | ||
| } | ||
| else{ | ||
| buffers.push_back(buffer); | ||
| } |
There was a problem hiding this comment.
In basicAllocation, buffers that only specify mem_bank are treated as if they were not pre-allocated (only address is considered). This can leave a user-specified mem_bank attribute on the op while assigning an address that doesn’t correspond to that bank. Either honor mem_bank in the basic allocator, or reject/clear mem_bank when alloc-scheme=basic-sequential to avoid producing inconsistent IR.
…nx#3028) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Thank you for putting this together! There are a few other AI review suggestions that warrant consideration, but otherwise looks good to me! Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
hunhoffe
left a comment
There was a problem hiding this comment.
There are a few AI review comments that warrant review, but otherwise looks good to me! Let's merge it once the CI is happy. Thanks for your contribution!
| bool isBufferForTile(BufferOp buffer, TileOp tile) { | ||
| return buffer.getTile() == tile; | ||
| } | ||
|
|
||
| bool isBufferPreAllocated(BufferOp buffer ){ | ||
| auto addr = buffer.getAddress(); | ||
| auto memBank = buffer.getMemBank(); | ||
| return (addr != std::nullopt || memBank != std::nullopt); | ||
|
|
||
| } |
There was a problem hiding this comment.
@joeldushouyu This sounds like it might be worth addressing? Can you take a quick lock?
| // Return an address that is aligned to tile's load/store bus | ||
| // NOTE: assume address are byte address | ||
| uint32_t getAlignedAddress( uint32_t address, uint32_t alignBitWidth){ | ||
| assert(alignBitWidth != 0 && alignBitWidth % 8 == 0 && "alignBitWidth must be a non-zero multiple of 8"); | ||
| uint32_t alignByteWidth = alignBitWidth / 8; | ||
| if (address % alignByteWidth == 0) { | ||
| return address; | ||
| } else { | ||
| return ((address / alignByteWidth) + 1) * alignByteWidth; | ||
| } |
| void setAndUpdateAddressInBank(BufferOp buffer, int64_t start_addr, | ||
| int64_t end_addr, | ||
| std::vector<int64_t> &nextAddrInBanks) { | ||
| // Fixme: alignment |
There was a problem hiding this comment.
woohoo! Nice to see a fixme addressed!
| @@ -226,6 +300,10 @@ checkAndAddBufferWithMemBank(BufferOp buffer, int numBanks, | |||
|
|
|||
| int mem_bank = memBankAttr.getInt(); | |||
| int64_t startAddr = nextAddrInBanks[mem_bank]; | |||
| if(buffer.getAligned()){ | |||
| startAddr = getAlignedAddress(startAddr, tileAlignBitWidth); | |||
| } | |||
There was a problem hiding this comment.
A bounds check seems reasonable :)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| while (current_alloc != allocated_buffers.end() && | ||
| address + buffer.getAllocationSize() > | ||
| current_alloc->getAddress().value()) { | ||
| address = current_alloc->getAddress().value() + | ||
| current_alloc->getAllocationSize(); |
| // First, allocate the buffer with pre-allocated address | ||
| // Then, allocated the buffer with pre-allocated mem_bank t | ||
| // Do it by doing a sort preAllocatedBuffers to have buffer with address | ||
| // first, then buffer with only mem_bank | ||
| std::sort(preAllocatedBuffers.begin(), preAllocatedBuffers.end(), | ||
| [](BufferOp a, BufferOp b) -> bool { | ||
| auto a_addr = a.getAddress(); | ||
| auto b_addr = b.getAddress(); | ||
| if (a_addr.has_value() && !b_addr.has_value()) { | ||
| return true; // a buffer before b_buffer | ||
| } else if (!a_addr.has_value() && b_addr.has_value()) { | ||
| return false; // b buffer before a_buffer | ||
| } else { | ||
| return false; | ||
| } |
| uint32_t getLocalMemorySize() const override { return 0x00010000; } | ||
| uint32_t getAccumulatorCascadeSize() const override { return 512; } | ||
| uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; } | ||
| uint32_t getMemTileLoadStoreBusWidth() const override { return 256; } |
There was a problem hiding this comment.
Not sure about this. From AMD 027

It seems the memory bank is native 256-bit width for npu2's memtile, which means the buffer should naturally be padded to 256 bits for alignment allocation. However, from my experience, I notice the buffers on Memtile work fine as long it is allocated and used in a 4-byte alignment manner. In other words, I suspect the DMA already handles the align/unalign access, which is why I left this comment
Any ideas or suggestions? @hunhoffe
There was a problem hiding this comment.
@joeldushouyu -- sorry for the delay. I think vector alignment (16B) is always sufficient, so I'd trust your experience here. However, it is good to document this uncertainty in a comment. Sorry it took me a little to get back to you, I needed to look into this one!
There was a problem hiding this comment.
Just to clarify, should I set the buffers to 4-byte alignment on the MT, or keep everything 16-byte aligned on the MT for both npu1 and npu2?
There was a problem hiding this comment.
I believe 4-byte alignment is sufficient, which -- if I read your commend correctly -- aligns with your experience as well?
| return (checkAndPrintOverlapStackframe(stacksize, allBuffers_on_tile) && | ||
| checkAndPrintOverflow(tile, address, maxDataMemorySize, stacksize, | ||
| allBuffers_on_tile)); |
…ated with address are sorted in correct ascending order
…trying to find space for buffer allocation
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| uint32_t getLocalMemorySize() const override { return 0x00010000; } | ||
| uint32_t getAccumulatorCascadeSize() const override { return 512; } | ||
| uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; } | ||
| uint32_t getMemTileLoadStoreBusWidth() const override { return 256; } |
| OptionalAttr<ElementsAttr>:$initial_value, | ||
| OptionalAttr<AIEI32Attr>:$mem_bank | ||
| OptionalAttr<AIEI32Attr>:$mem_bank, | ||
| DefaultValuedAttr<BoolAttr, "true">:$aligned |
| uint32_t getMemEastBaseAddress() const override { return 0x00070000; } | ||
| uint32_t getLocalMemorySize() const override { return 0x00010000; } | ||
| uint32_t getAccumulatorCascadeSize() const override { return 512; } | ||
| uint32_t getComputeTileLoadStoreBusWidth() const override { return 256; } |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Address issue 3019 and issue 2579
Commit up to d3684e2 is simply a code refactor to ensure bank-aware and basic-sequential allocation have similar code organization.
Commit 0c32449 addresses a bug in bank-aware allocation from pr #3003.
Commit b08a9df. Introduce the alignment allocation on all buffers, which is set to true by default.
Note: Someone need to double check my understand here