Peano lit enablement No1#3075
Conversation
Signed-off-by: thomthehound <thomthehound@gmail.com>
Signed-off-by: thomthehound <thomthehound@gmail.com>
|
Converting this to draft. The new scratchpad test introduced a few hours ago is giving me trouble parsing the new custom op. |
|
I think I have it nailed down now. The wheel wasn't as fresh as I expected it to be, so I needed to recompile Regardless, looking at the new test gave me another idea to implement while I futzed around. Just a few more hours and I'll have this ready for review. |
Signed-off-by: thomthehound <thomthehound@gmail.com>
Signed-off-by: thomthehound <thomthehound@gmail.com>
|
|
Interesting to hear about the lifetime/ownership issues for pyxrt! Is that for input/output buffers, or just the buffer used internally for the instruction bo, etc. associated with the hardware context? |
The funny thing is, I don't see anything in upstream My interpretation is that this is probably something specific to the Windows runtime/driver. But the source code for that isn't available to me, so I had to reason about it experimentally. To answer your question, though: pretty much every time, the trigger I observed was eviction/destruction of the cached instruction BO. I did not personally observe problems with the input/output buffers, but I also don't have enough visibility into the closed-source side of Windows to say that could never be an issue, and I didn't try to force it as a negative test, either. That's a great question, though. I'll try to probe around more regarding this issue while I finalize my |
|
I don't think I've ever been so excited to see errors on the CI before (and, really, I am truly charmed when it comes to CI luck). Both of these are actually hugely helpful. @hunhoffe A new wild transient has appeared! I will be modifying The other useful piece of information gained here is that my float32 error in |
Signed-off-by: thomthehound <thomthehound@gmail.com>
|
|
@erwei-xilinx I'm sorry to bother you, but can you kick this thing back off again for me? One of these days I need to talk to the management about becoming a collaborator around here... |
@hunhoffe I investigated this more thoroughly, and I can say with a very high degree of confidence now that input/output BOs are not affected by this issue; the instruction BOs are the problem (tested with code configured for arbitrary destruction ordering). I was also able to replicate the same errors on Windows using C++ code that forces destruction in the same order that
Edit: I might have to hold off on the lit tests, given I've slightly altered their preferred syntax here and I don't want to have to drift against my own recommendations. |
|
@thomthehound Thanks for digging into this! And for keeping an eye out for legendary device not found errors! Re: float32 errors, we've had some issues with stack overflows when peano uses more stack space than expected, particularly for float32 JIT functions. Estimating/checking the stack size requirements is a bit of a sticky problem (see: #2347) so we've disabled some of the float32 tests for now for that reason. Not sure if that's related to your errors or not. |
No problem! I thought it was a great question! And thank you for the reference regarding the float32 bug. From what you just said, if that isn't directly related, it's at least a next-door neighbor. I really appreciate that. I'll look into this more tomorrow. But, for a first Peano pass, I think this current work is done. I did not intend this to actually fix issues; just to identify them. I'd just like @jgmelber to take a look, because I think this is kind of thing that would interest him. |
|
@thomthehound Sorry for the delay! I think either @jackl-xilinx, @erwei-xilinx, or @jgmelber is going to do some windows environment setup trials but I don't think either of them will have time to address this until mid-next week at earliest. I do think this is a good contribution, and I've definitely not forgotten that this is waiting in the queue. ... if you happen to be looking for something to do.... I'm hoping to put out a RFC issue later today about some compile/build/jit/programming example consolidation I'm working on. Here's an exploratory notebook: https://github.com/Xilinx/mlir-aie/blob/ae4cb4e131de1cc9a4d21eca977b5f5c3b011f2b/programming_guide/whats_new_in_unify_compilation_workflow.ipynbf (WIP PR is here: #3025. although I don't plan to merge it in one piece). I'd be interested to see if you think my proposed changes will play well with Windows. I'll post the official issue with a better writeup soon! |
No worries! It's just always hard as an outsider to know what's going on. I was just checking in. I appreciate the update! That work you pointed out looks impressive, exciting and... substantial. I will try to come up with some tests for it against the tree as it stands as well as my backlog of Windows work. From my preliminary look, I don't see anything obviously breaking on the Windows-side, but I did run into some JIT issues while working on my upcoming programming_examples refactor. I think I can probably spin off the institutional fixes for those into a tiny pre-patch. |
This updates the lit configuration files (and affected test files) so that AIE/NPU tests can run with the open-source Peano backend. These changes are compatible with (and were tested on) Windows, but are not exclusive to it. Running the tests with the Peano backend should now be possible on Linux as well.
Highlights:
npu-xrttests on lit features (peano,chess,aietools_aie2,aietools_aie2p) instead of raw Vitis state.%backend_flagssubstitution that keeps the existing Chess path when Chess is available, but automatically expands to--no-xchesscc --no-xbridgewhen only Peano is present.clang ... -lrt -lstdc++command lines with%host_clangand%host_link_flags.|&, stdinechopipelines, Makefile-based tests, and extensionless executables.LibXAIE_${AIE_RUNTIME_TEST_TARGET}_DIRvariable.This is intended as lit enablement only. Experimentation has demonstrated some separate Python host runtime ownership/lifetime quirks -- at least on Windows -- that still require additional work.
Test command conventions
I am recommending several new cross-platform hygiene standards here for AIE/NPU test commands. Ideally, these can be extended to guides, examples, and other parts of the tree later. My intent is two-fold:
PATH, POSIX shell behavior, or Linux-only linker flags and--no-xbridgeas a further example in addition to the present case).For example, tests should prefer:
%aiecc %backend_flags ...for compiler-driver invocations%host_clang ... %host_link_flags ...for host test binaries%PYTHON ...for Python scripts.exenames for host executables used by lit (this is safe on Linux and required for Windows)peano,chess,aietools_aie2,aietools_aie2p, andxrtfor gatingAnd, in future tests, guides, examples, and scripts, etc., it would be best to avoid:
aieccorclangaiecc.py(in most cases)--no-xchesscc --no-xbridge(%backend_flagssupplies this when needed)-lstdc++/-lrt(%host_link_flagssupplies this)|&, stdinechopipelines, Makefile-only test drivers, and extensionless executable assumptionsThe shared substitutions used by the files touched in this PR are defined in
lit_config_helpers.py.Reproduction
On Windows, the tests may be exercised locally using the following PowerShell commands. The absolute path of an extracted copy of MLIR must be supplied for
$mlir:And the equivalent
cmdcommands from anx64 Native Tools Command Prompt:(!! IMPORTANT: You must be running Python 3.13 for this to work on Windows, because that is the only version supported by
pyxrt.pydin the XRT SDK !!)Test summary (tail):
At this point, the remaining failures may be triaged into the following four categories, all of which are distinct from lit test configuration issues:
Full-ELF/XRT coverage
Six (6) failures are in
loadpdi, reconfiguration, and scratchpad-register-write tests that exercise full-ELF attachment through XRT. As tested on Windows, attaching most ELFs to ahw_contextworks fine, but these particular kinds of full-ELFs do not yet appear to be supported end-to-end, and I suspect that is because they are written in such a way that they touch the firmware. These tests are left ungated for now, but I think it is unlikely they will ever fully work in a signed-driver environment.Python host-runtime ownership/lifetime behavior
Twelve (12) tests fail due to object lifetime ordering concerning
hw_context-attached resources. Specifically, the Windows runtime is sensitive to destroying ahw_contextbefore attached objects, such as BOs, which usually causes a seg-fault. These are correctable through hardening lifetime/ownership inhostruntime.py. I've prototyped that separately and will submit the fixes if/after this PR is merged. Frankly, the destruction order (attachments first,hw_contextlast) that Windows forces makes more intuitive sense to me, anyway.Context-cache stress behavior
test_cached_xrt_runtime.pystill has one (1) failure involving the creation of an additionalhw_contextwhen the configured cache limit is reached. This likely needs separate investigation of the cache eviction semantics.Remaining generated-code/runtime correctness issue
test_algorithms.pystill has one (1) scalar float32 add correctness failure, where-in one 16-element ObjectFifo/data-tile region (corresponding to output entries 96-111) is observed as zeroed. The root cause of this is not yet clear to me, but may involve either Peano-specific code-generation or our own runtime memory handling. In this specific case, the affected ObjectFifo buffer for data-tile 6 is allocated adjacent to the worker stack region, and my testing has indicated that increasingstack_sizeto0x0800(from the default0x0400) resolves the symptom but not the underlying cause. Altering the instruction to multiply, which uses libcalls, also prevents the corruption. This is consistent with (but potentially not exclusive to) a stack-pressure or spill-related mechanism: Peano’s AIE2P lowering custom-legalizes scalar float32 adds/subs through vector floating-point operations, which use zero-padded ACC2048 registers (and corresponding stack spill/reload machinery). However, other mechanisms cannot yet be ruled out. That is as far as I got on this one. It's a stumper.6 + 12 + 1 + 1 = 20All failures are accounted.