All five tests under test/npu-xrt/memtile_dmas/ reproducibly TDR on Phoenix
(NPU1) hardware running the latest mlir-aie HEAD. The same tests pass under
the in-emulator run path (xdna-emu), so the test logic itself is correct;
the hang is at the firmware/DMA-execution layer.
I'm filing this to ask whether these tests actually pass in upstream CI on
the amd7940hs runner today, since the upstream lit summary in CI logs
only enumerates failures/skips, not passes — so I can't tell from the
public log whether they ran successfully or were silently Unsupported.
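One way to settle this from a raw log is to scan for lit's per-test result lines, which normally look like `STATUS: SUITE :: path (i of n)`. The following is a minimal sketch, assuming the CI run was verbose enough for those lines to appear; the suite name `DEMO`, the `aie.mlir` file names, and the `[ci]` prefix in the sample are hypothetical.

```python
import re

# Matches lit result lines such as
#   PASS: SUITE :: path/to/test.mlir (3 of 120)
# and tolerates an arbitrary prefix (e.g. a CI timestamp) before the status.
RESULT_LINE = re.compile(
    r"\b(?P<status>PASS|FAIL|XFAIL|XPASS|UNSUPPORTED|UNRESOLVED|TIMEOUT): "
    r"(?P<suite>.+?) :: (?P<path>\S+) \(\d+ of \d+\)"
)

# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
[ci] PASS: DEMO :: npu-xrt/add_one_using_dma/aie.mlir (1 of 2)
[ci] UNSUPPORTED: DEMO :: npu-xrt/memtile_dmas/writebd/aie.mlir (2 of 2)
[ci] Testing Time: 1.00s
"""


def statuses_under(log_text, prefix):
    """Map each test path starting with `prefix` to the statuses lit reported."""
    found = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group("path").startswith(prefix):
            found.setdefault(match.group("path"), []).append(match.group("status"))
    return found


if __name__ == "__main__":
    # An empty result means the log never reported these tests individually.
    print(statuses_under(SAMPLE_LOG, "npu-xrt/memtile_dmas/"))
```

An empty dictionary for the `memtile_dmas` prefix would mean the log carries no per-test evidence either way, which is the situation described above.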
test.exe blocks indefinitely in drm_syncobj_array_wait_timeout. The
runtime sequence completes (every aiex.npu.writebd and aiex.npu.write32
dispatches successfully), but the final aiex.npu.sync never receives its
TCT from shim S2MM. dmesg confirms the kernel context never makes
forward progress.
Removing the runtime-sequence sync lets the kernel return cleanly (test
fails on data check rather than TDR), confirming the hang is exclusively
at sync time waiting for shim S2MM completion.
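To tell a hang apart from a data-check failure when running variants in bulk, a small wrapper with a timeout is enough. This is a sketch under stated assumptions: that the test binary exits with status 0 and prints `PASS!` on success, and that a 30-second budget is generous for these tests. Neither is confirmed by mlir-aie documentation. Note that a process stuck in an uninterruptible kernel wait may not exit when killed, in which case this wrapper would block as well.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s=30.0):
    """Return 'PASS', 'FAIL', or 'HANG' for one invocation of a test binary."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the time budget.
        return "HANG"
    # Assumption: success means exit status 0 and a "PASS!" line on stdout.
    if proc.returncode == 0 and "PASS!" in proc.stdout:
        return "PASS"
    return "FAIL"


if __name__ == "__main__":
    # Stand-in commands; replace with the real test.exe invocation.
    print(classify_run([sys.executable, "-c", "print('PASS!')"]))
    print(classify_run([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

With the sync in place the writebd test would be classified as `HANG`; with the sync removed it would be `FAIL`, matching the behaviour described above.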
What we ruled out empirically
We constructed a series of minimal MLIR variants to isolate the trigger.
All were run on the same Phoenix hardware.
| Variant | Result |
| --- | --- |
| Single aiex.npu.write32 to memtile (col 0, row 1) LOCK0_VALUE | PASS |
| Single aiex.npu.writebd to memtile (program BD, no exec) | PASS |
| writebd + push to memtile S2MM TASK_QUEUE (channel start) | PASS |
| Full writebd test with locks and chains | TDR |
| Full writebd with use_next_bd=0 (no self-loop) | TDR |
| Full writebd with all lock_acq_enable=0 (no locks) | TDR |
| add_one_using_dma (static aie.memtile_dma block) | PASS |
So:
- Runtime-sequence writes to memtile registers work fine on Phoenix.
- The multi-channel runtime-programmed memtile DMA flow (shim → memtile → shim) never delivers data to the shim S2MM receiver.
- The bug is independent of self-looping next_bd and independent of locks.
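The elimination reasoning can be written down as a small set computation: anything present in every failing variant and absent from every passing one remains a suspect. This is only an illustrative sketch. The feature labels are shorthand invented here for the table rows (they are not mlir-aie terms), and the logic treats features as independent, so it cannot capture interactions between them.

```python
# Hypothetical feature labels summarising what each variant exercises.
FULL = frozenset({
    "writebd", "queue_push", "runtime_multi_channel_flow",
    "locks", "next_bd_chain",
})

VARIANTS = [
    ("single write32 to LOCK0_VALUE", frozenset({"write32"}), "PASS"),
    ("single writebd, no exec", frozenset({"writebd"}), "PASS"),
    ("writebd + TASK_QUEUE push", frozenset({"writebd", "queue_push"}), "PASS"),
    ("full writebd", FULL, "TDR"),
    ("full writebd, use_next_bd=0", FULL - {"next_bd_chain"}, "TDR"),
    ("full writebd, lock_acq_enable=0", FULL - {"locks"}, "TDR"),
    ("add_one_using_dma (static)", frozenset({"static_memtile_dma"}), "PASS"),
]


def suspects(variants):
    """Features common to all failing variants that never appear in a passing one."""
    failing = [feats for _, feats, result in variants if result == "TDR"]
    passing = [feats for _, feats, result in variants if result == "PASS"]
    common = frozenset.intersection(*failing)
    cleared = frozenset().union(*passing)
    return common - cleared


if __name__ == "__main__":
    print(sorted(suspects(VARIANTS)))
```

Under these labels the only remaining suspect is the runtime-programmed multi-channel flow, which agrees with the summary above; locks and the next_bd chain drop out because the TDR persists without them.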
Environment
| Component | Version |
| --- | --- |
| CPU | AMD Ryzen 9 7940HS (same family as the upstream amd7940hs runner) |
| NPU | Phoenix (NPU Phoenix, aie2, 6×5 topology) |
| NPU Firmware | 1.5.5.391 (per xrt-smi examine) |
| XRT | 2.23.0 |
| amdxdna driver | 2.23.0_20260509 (xdna-driver HEAD c347d62) |
| mlir-aie HEAD | b37dc33d41 |
| llvm-aie / Peano | latest (compile path used: chess) |
| aietools | RyzenAI 2025.2 / Vitis AIE Essentials |
The upstream CI Phoenix runner (amd7940hs) appears to use /opt/ryzen_ai-1.3.0.1/vitis_aie_essentials per the workflow logs.
We're a few minor versions ahead on RyzenAI.
Confidence the bug is HW/FW-specific
- xdna-emu (in-process emulator) runs the same xclbin and runtime sequence through its own DMA model and reports PASS!. So the lowering and test logic are sound; only the on-silicon execution diverges.
- add_one_using_dma, which exercises the same shim ↔ memtile ↔ shim flow but programs the memtile DMA statically via the aie.memtile_dma block (encoded into CDO at xclbin-load time), passes on the same hardware. Only the runtime-sequence-programmed path TDRs.
- xrt-smi validate passes; the device is healthy at SMI level.
Questions
1. Do these five tests currently pass on amd7940hs in upstream CI? The visible log only lists Unsupported and Failed tests; if these five are included in the Passed count, that count is the only evidence, and it does not identify individual tests.
2. If they pass in CI: what NPU firmware version is the runner using? We suspect a regression between the firmware bundled with RyzenAI 1.3.0.1 (CI) and 1.5.x (us), since the only obvious environmental delta is firmware.
3. If they fail (or are silently Unsupported) in CI too: would you accept a PR marking these as XFAIL on ryzen_ai_npu1 with a reference to this issue, until the underlying firmware/runtime issue is resolved?
Background context on Phoenix firmware limitations we've documented
separately: xrt::hw_context::read_aie_reg returns successfully for
compute-tile reads but never responds for memtile reads on the same
firmware version. The driver kills the user-context mailbox on the
resulting 5s timeout, which then cascades to a drm_dev_unplug
wedge during modprobe -r. We can include details if that's
potentially related — it suggests memtile runtime access in
general is incompletely supported in Phoenix firmware 1.5.x.
Happy to provide the full lit/dmesg traces or any additional test
variations on request.
Affected tests
All five fail identically. The simplest standalone repro is writebd.
Reproducer
Run via native lit (no custom test infrastructure).
Related
Fix memtile DMA BD address missing base offset (#2893) loosened these tests' REQUIRES from ryzen_ai_npu1 to ryzen_ai, but the lowering change in that PR explicitly only applies to AIE2p; the AIE2/NPU1 path is unchanged. So if the tests ever passed on Phoenix, that wasn't via #2893.