Skip to content

XDP: skip run-lifecycle hooks for XDP_KERNEL* internal runs#9841

Open
jyothees99 wants to merge 1 commit into
Xilinx:masterfrom
jyothees99:xdp-internal-kernel-skip
Open

XDP: skip run-lifecycle hooks for XDP_KERNEL* internal runs#9841
jyothees99 wants to merge 1 commit into
Xilinx:masterfrom
jyothees99:xdp-internal-kernel-skip

Conversation

@jyothees99
Copy link
Copy Markdown
Collaborator

XDP plugins that submit their own ELF (e.g. aie_halt/ve2, aie_nop_util, and future plugin configurations) create an xrt::run with the kernel name "XDP_KERNEL:{IPUV1CNN}". The run-lifecycle XDP hooks (run_constructor / run_start / run_wait) would then attempt to instrument these internal runs through whatever plugin currently registers the callbacks (today aie_dtrace), which is incorrect and order-dependent on plugin load.

Add a tiny file-local is_xdp_internal_kernel() helper that does a prefix match on the (already colon-stripped) kernel name, and skip the dispatch at the top of each of the three top-level xrt_core::xdp run dispatchers. The check runs first so any future XDP plugin consumer added below the config gate inherits the same skip rule.

Problem solved by the commit

XDP run-lifecycle hooks (run_constructor / run_start / run_wait) fire on every xrt::run, including those created inside XDP plugins themselves (e.g. aie_halt/ve2 and aie_nop_util use xrt::ext::kernel{..., "XDP_KERNEL:{IPUV1CNN}"} to submit control-code ELFs). This makes XDP self-instrument its own internal runs. Skip the hooks for any run whose kernel name starts with XDP_KERNEL.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

PR #9721 (commit 751273f) added the three run-lifecycle XDP hooks but did not filter out XDP-internal runs. Today the issue is masked by load order — aie_trace::updateAIEDevice calls submitNopElf before aie_dtrace is loaded, so the callback is still null. We caught the latent bug while planning to extend the ELF-submission pattern to all XDP plugin configurations, and confirmed it by temporarily calling submitNopElf at the end of AieDtracePlugin::updateAIEDtraceDevice (after dtrace is registered): without the fix, dtrace's hook fires on the nop ELF.

How problem was solved, alternative solutions (if any) and why they were rejected

Added a file-local is_xdp_internal_kernel() helper in src/runtime_src/core/common/xdp/profile.cpp that prefix-matches the kernel name against "XDP_KERNEL", called as the first check in each top-level dispatcher (run_constructor / run_start / run_wait). One file, one helper, three early-returns.

Alternatives rejected:
  • Per-plugin guard in each *_cb.cpp — every present and future plugin would have to remember to repeat the check.
  • Filter in the XDPPlugin base class via virtual *Impl overrides — structurally cleanest but a larger refactor than necessary.
  • Filter at the run_impl call sites in xrt_kernel.cpp — touches XRT core API for an XDP-only concern and spreads the same logic across three sites.
  • Refactor aie::dtrace::run_* to take a prepared xrt_kernel_data to avoid a duplicate fetch — micro-optimization not worth the signature churn.

Risks (if any) associated the changes in the commit

No

What has been tested and how, request additional testing if necessary

Tested a vaiml design

@jyothees99 jyothees99 requested review from chvamshi-xilinx and garimadhaked and removed request for rbramand-xilinx and stsoe May 29, 2026 10:38
@github-actions
Copy link
Copy Markdown
Contributor

clang-tidy review says "All clean, LGTM! 👍"

XDP plugins that submit their own ELF (e.g. aie_halt/ve2, aie_nop_util,
and future plugin configurations) create an xrt::run with the kernel
name "XDP_KERNEL:{IPUV1CNN}". The run-lifecycle XDP hooks
(run_constructor / run_start / run_wait) would then attempt to
instrument these internal runs through whatever plugin currently
registers the callbacks (today aie_dtrace), which is incorrect and
order-dependent on plugin load.

Add a tiny file-local is_xdp_internal_kernel() helper that does a
prefix match on the (already colon-stripped) kernel name, and skip
the dispatch at the top of each of the three top-level xrt_core::xdp
run dispatchers. The check runs first so any future XDP plugin
consumer added below the config gate inherits the same skip rule.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Jyotheeswar Ganne <Jyotheeswar.Ganne@amd.com>
@jyothees99 jyothees99 force-pushed the xdp-internal-kernel-skip branch from 1e07fd4 to f5a829f Compare May 29, 2026 10:51
@github-actions
Copy link
Copy Markdown
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Copy Markdown
Collaborator

@IshitaGhosh IshitaGhosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@IshitaGhosh IshitaGhosh requested a review from jvillarre May 29, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants