Fix Windows crashes in Python XRT cached runtime #3100

Open
thomthehound wants to merge 4 commits into
Xilinx:main from
thomthehound:hostruntime-fix

@thomthehound
Contributor

This fixes a high-severity issue in the Windows Python XRT runtime path. The cached runtime previously destroyed XRT objects in an unsafe order, which caused access violations on Windows and made the Python wrapper unreliable on that platform.

The root cause was an instruction BO lifetime problem. hostruntime.py could, and sometimes deliberately did, let an instruction BO outlive the kernel / hw_context state it depended on. Linux tolerates that ordering, but Windows does not. This patch makes the ownership explicit and destroys cached XRT objects in the same dependency order that RAII already enforces in C++. This correctness/hygiene fix does not change observable Linux behavior, but it is required for reliable Windows Python execution.
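To illustrate the intended teardown order, here is a minimal sketch using stand-in objects rather than real XRT handles. `FakeXrtObject`, `CachedEntry`, and `close()` are hypothetical names for illustration only; they are not taken from hostruntime.py or pyxrt, and the way the actual patch releases XRT objects may differ.

```python
class FakeXrtObject:
    """Stand-in for an XRT handle; records the order in which it is released."""

    def __init__(self, name, log):
        self.name = name
        self._log = log
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self._log.append(self.name)


class CachedEntry:
    """Hypothetical cache entry owning a hw_context, a kernel, and instruction BOs."""

    def __init__(self, log):
        self._log = log
        self.hw_context = FakeXrtObject("hw_context", log)
        self.kernel = FakeXrtObject("kernel", log)
        self.insts_bos = []  # instruction BOs owned by this entry

    def add_insts_bo(self, name):
        bo = FakeXrtObject(name, self._log)
        self.insts_bos.append(bo)
        return bo

    def destroy(self):
        # Release dependents first: instruction BOs, then kernel, then hw_context.
        for bo in reversed(self.insts_bos):
            bo.close()
        self.insts_bos.clear()
        self.kernel.close()
        self.hw_context.close()


log = []
entry = CachedEntry(log)
entry.add_insts_bo("insts_bo_0")
entry.add_insts_bo("insts_bo_1")
entry.destroy()
print(log)  # ['insts_bo_1', 'insts_bo_0', 'kernel', 'hw_context']
```

The point of the sketch is only the ordering: nothing that depends on the kernel or hw_context is still alive when those are released.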

It also tightens the cache bookkeeping around context eviction, removes platform-specific assumptions, and makes runtime-owned instruction BOs use the runtime's existing XRT device instead of creating a fresh xrt.device(0), so that ownership can be tracked correctly.
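As a rough sketch of evicting cached contexts and then retrying creation, the following uses a simulated slot limit instead of real hardware. `ContextCache`, `SlotsExhausted`, and the least-recently-used eviction policy are assumptions made for this example; the patch's actual eviction policy and error type are not shown here.

```python
from collections import OrderedDict


class SlotsExhausted(RuntimeError):
    """Hypothetical error standing in for a failed hw_context creation."""


class ContextCache:
    def __init__(self, max_slots):
        self.max_slots = max_slots    # simulated hardware context limit
        self.entries = OrderedDict()  # key -> context, least recently used first

    def _create(self, key):
        # Simulate the driver refusing a new context when all slots are taken.
        if len(self.entries) >= self.max_slots:
            raise SlotsExhausted(key)
        return f"ctx:{key}"

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        while True:
            try:
                ctx = self._create(key)
                break
            except SlotsExhausted:
                if not self.entries:
                    raise  # nothing left to evict, so give up
                self.entries.popitem(last=False)  # evict the oldest entry, then retry
        self.entries[key] = ctx
        return ctx


cache = ContextCache(max_slots=2)
for name in ("a", "b", "c"):
    cache.get(name)
print(list(cache.entries))  # ['b', 'c']
```

In a real runtime the eviction step would also run the entry's teardown in dependency order before the retry.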

Changes

  • Track which cached runtime entry owns each instruction BO.
  • Destroy instruction BOs before their owning kernel / hw_context.
  • Allocate runtime-owned instruction BOs from the runtime's existing XRT device.
  • Retry hw_context creation after evicting cached contexts when context slots are exhausted.
  • Update the cache-fill test so it tests cache policy, not the machine's practical hw_context limit.
  • Use np.asarray(...) instead of np.array(..., copy=False) for NumPy 2.x compatibility.
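For context on the last item: in NumPy 2.x, `np.array(..., copy=False)` raises a `ValueError` when a copy cannot be avoided, whereas `np.asarray(...)` copies only when needed. A small sketch of the `np.asarray` behavior follows; `as_array` is a hypothetical helper for this example, not a function from the patch.

```python
import numpy as np


def as_array(data, dtype=None):
    # np.asarray returns the input unchanged when it is already an ndarray
    # of the requested dtype, and converts (copies) otherwise.
    return np.asarray(data, dtype=dtype)


existing = np.arange(4, dtype=np.int32)

same = as_array(existing)                      # no copy: same object comes back
converted = as_array([1, 2, 3], dtype=np.int32)  # a list must be converted
widened = as_array(existing, dtype=np.int64)   # a dtype change needs a new array

print(same is existing)     # True
print(widened is existing)  # False
```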

Tests

I added C++ lit coverage for the lifetime-ordering issue:

  • normal RAII destruction order passes
  • stale input/output BOs also pass
  • stale instruction BO destruction reproduces the Windows access violation; it is marked XFAIL on Windows and should pass on Linux

These tests depend on #3075; without it, they fail due to lit syntax drift.

Signed-off-by: thomthehound <thomthehound@gmail.com>