Add per-run CT file generation via run constructor hook (VE2) #35
Merged
jvillarre merged 15 commits into Xilinx:master on Apr 9, 2026
IshitaGhosh
reviewed
Mar 24, 2026
    return;

    auto ctx = xrt_core::hw_context_int::create_hw_context_from_implementation(hwctx);
    auto slotIdx = static_cast<xrt_core::hwctx_handle*>(ctx)->get_slotidx();
Collaborator
The slot index will not work for temporal-sharing cases. We need to verify this.
Collaborator
As a first phase, we can still go ahead with it, but please make a note that we have to revisit it.
Generate unique CT files per xrt::run instead of once at xclbin-load time. The plugin registers an aieProfileRunConstructor callback via dlsym, invoked from the xrt::run constructor. VE2 overrides generateCTForRun to write aie_profile_ctx_<slot>_run_<id>.ct and attach it via set_dtrace_control_file. The existing device-level CT generation in updateDevice is removed. Made-with: Cursor
Accept uint32_t run_uid from the core hook and use it in the per-run CT filename (aie_profile_ctx_<slot>_run_<uid>.ct) instead of a local static counter. This matches the dtrace dump naming convention and correlates CT and dump files for the same run. Made-with: Cursor
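The naming scheme described in this commit can be sketched as a small helper. This is only an illustration: `makeCtFileName` is a hypothetical name, not a function from the plugin, and the sketch assumes the slot index and run UID are both plain unsigned integers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from the PR): builds the per-run CT file name
// aie_profile_ctx_<slot>_run_<uid>.ct from the hardware-context slot index
// and the run UID passed in by the XRT core hook.
std::string makeCtFileName(uint32_t slotIdx, uint32_t runUid)
{
  return "aie_profile_ctx_" + std::to_string(slotIdx) +
         "_run_" + std::to_string(runUid) + ".ct";
}
```

For slot 1 and run UID 3 this yields `aie_profile_ctx_1_run_3.ct`, which shares the context and run numbers with the dump file for the same run.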
The generated Python code now introspects __file__ at runtime to extract the unique dtrace dump ID, producing a correlated output filename like aie_profile_counters_ctx_1_run_3_<timestamp>.json. Made-with: Cursor
…through
Thread kernel_name and elf_handle through the run_constructor XDP callback chain so that generateCTForRun can extract SAVE_TIMESTAMP line numbers directly from the ELF via aiebu_assembler::get_op_locations, falling back to the CSV-based approach when the aiebu API is unavailable or fails.
- Add elf_helper.h/cpp to isolate ELFIO serialization and avoid system <elf.h> macro conflicts
- Add generate() overload to AieProfileCTWriter accepting op_loc data
- Link aiebu_static and add aiebu include path in CMakeLists.txt
Made-with: Cursor
The elf_int.h header includes <elfio/elfio.hpp> which lives under core/common/elf/. Add that directory to the include search path. Made-with: Cursor
9b48b3b to 3330403
…culation Query the hw_context partition's start_col via getAIEPartitionInfo at the generateCTForRun call site and pass it as a plain uint8_t to AieProfileCTWriter. calculateCounterAddress now adds partitionStartCol to the relative column from the perf counter vector so that addresses in the generated CT file reflect the actual hardware column placement chosen by XRT. Made-with: Cursor
- calculateCounterAddress now uses the partition-relative column directly instead of adding partitionStartCol, so all addresses emitted in the CT file are partition-relative (matching the ELF map data column values used by the dtrace parser).
- generateCTForRun now returns early if dtrace_debug is not enabled, so CT generation and set_dtrace_control_file are only performed when the feature is explicitly requested.
Made-with: Cursor
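A minimal sketch of a partition-relative address calculation follows. The function name `partitionRelativeAddress` is hypothetical, and the column and row bit positions (25 and 20) are assumptions for illustration; the actual shifts used by the plugin come from the device metadata and may differ.

```cpp
#include <cstdint>

// ASSUMPTION: column and row bit positions in a tile address.
// These values are illustrative; the real plugin may obtain them elsewhere.
constexpr uint32_t kColShift = 25;
constexpr uint32_t kRowShift = 20;

// Hypothetical helper (not from the PR): combines a partition-relative
// column, a row, and a register offset into one address. The partition
// start column is deliberately NOT added, so the result stays relative
// to the partition.
uint64_t partitionRelativeAddress(uint8_t relCol, uint8_t row, uint32_t regOffset)
{
  return (static_cast<uint64_t>(relCol) << kColShift) |
         (static_cast<uint64_t>(row) << kRowShift) |
         regOffset;
}
```

Under these assumed shifts, relative column 1, row 2, and an arbitrary example offset of 0x31520 produce 0x2231520 regardless of where XRT places the partition on the hardware.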
Add the aie_dtrace plugin (VE2 shim, CT writer, callbacks, CMake) for Debug.aie_dtrace-driven bandwidth and control-trace workflows without the AIE profile CSV path. Wire the plugin into profile/plugin CMake. Extend AieProfileMetadata with AIE_dtrace_settings parsing and dtrace- specific helpers. Adjust the AIE profile plugin to load the dtrace library when enabled. Align AIE profile VE2 updateDevice with nop.elf plus setMetricsSettings and keep the xclbin profile-counter fallback using get_userpf_device and convertToCoreDevice for hw-context handles. Made-with: Cursor
Remove AIE_profile_settings.dtrace_debug usage: gate generateCTForRun and runConstructorHook on get_aie_dtrace(). Drop dtrace_debug from allowed AIE_profile_settings and remove the related configuration warning. Users should set Debug.aie_dtrace=true instead of the removed profile flag. Made-with: Cursor
Replace per-probe @blockopen JSON blocks with a single # COUNTER_METADATA_BEGIN/END
comment block in the begin{} section. Jprobes now emit ts_<asmId> for
ordering-independent multi-ASM identification and use _ = read_reg() to
suppress counter variable assignments, reducing dtrace dump file size.
Made-with: Cursor
Export aieDtraceRunStart and aieDtraceRunWait from the plugin and add AieDtracePlugin::runStartHook / runWaitHook stubs for use when XRT invokes per-run start and wait callbacks. Made-with: Cursor
…s99/XDP into run-constructor-ct-hook
- Add ve2/elf_helper.h/cpp to aie_dtrace (mirrors aie_profile/ve2/elf_helper) so the plugin no longer depends on the profile build path
- Update CMakeLists.txt to build the local elf_helper.cpp
- Pass allCounters to writeCTFile and emit device-wide counter_metadata block in CT begin section (same fields as AieProfileCTWriter)
- Fix header guard to AIE_DTRACE_CT_WRITER_H
- Replace all "AIE Profile:" log messages with "AIE dtrace:"
Made-with: Cursor
… export Remove AieProfileCTWriter and aiebu linkage from xdp_aie_profile_plugin_xdna. Strip shim-specific ddr_bandwidth/read_bandwidth/write_bandwidth reservation logic from AieProfile_VE2Impl::setMetricsSettings. Remove runConstructorHook and aieProfileRunConstructor from the profile plugin; dtrace keeps its own CT path and bandwidth metrics. Update CMake and comments accordingly. Made-with: Cursor
- Track op_loc lineinfo.col min/max per control asm; sort UCs by start column.
- Set colStart/ucNumber from aiebu; colEnd from next UC start minus one, or opLocMaxCol for the last UC before counter-based extension.
- Extend the last UC colEnd to the maximum configured counter column so tiles without SAVE_TIMESTAMPS in the dump still get read_reg entries (CSV path too).
- Sort CSV-loaded asm files by colStart for consistent ordering.
Made-with: Cursor
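The column-range assignment described in this commit can be sketched roughly as below. The struct `UcRange` and the function `assignColumnRanges` are hypothetical names, and the sketch assumes every UC has a distinct start column; it is an illustration of the rule, not the plugin's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one control asm / UC (names are illustrative).
struct UcRange {
  uint32_t ucNumber;     // UC number, as reported by aiebu
  uint32_t colStart;     // minimum op_loc column seen for this UC
  uint32_t opLocMaxCol;  // maximum op_loc column seen for this UC
  uint32_t colEnd;       // filled in by assignColumnRanges
};

// Sort UCs by start column, end each range one column before the next UC
// starts, and extend the last range to the highest configured counter column.
// ASSUMPTION: start columns are distinct, so (next start - 1) never underflows.
void assignColumnRanges(std::vector<UcRange>& ucs, uint32_t maxCounterCol)
{
  if (ucs.empty())
    return;

  std::sort(ucs.begin(), ucs.end(),
            [](const UcRange& a, const UcRange& b) { return a.colStart < b.colStart; });

  for (std::size_t i = 0; i + 1 < ucs.size(); ++i)
    ucs[i].colEnd = ucs[i + 1].colStart - 1;

  UcRange& last = ucs.back();
  last.colEnd = std::max(last.opLocMaxCol, maxCounterCol);
}
```

With two UCs starting at columns 0 and 4 and a highest counter column of 7, the ranges become 0 to 3 and 4 to 7, so a tile in column 6 or 7 is still covered even if no op_loc entry references it.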
jvillarre
approved these changes
Apr 9, 2026
Summary
Details
Previously, a single aie_profile.ct file was generated during updateDevice() and set globally via the config reader. This meant all runs shared the same CT file, and the path had to be set before module creation.
With this change, XRT core calls aieProfileRunConstructor(void* run, void* hwctx, uint32_t run_uid) from the xrt::run constructor. The plugin looks up the hw_context in handleToAIEProfileImpl, and the VE2 implementation generates a unique CT file per run and attaches it directly to the run object. The run UID matches the dtrace dump naming convention (dtrace_dump_ctx_run_.py).
Other platforms (Edge, x86, Win) inherit a no-op from the base class.
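The dispatch described above can be sketched as follows. All class and function names here (`ProfileImpl`, `Ve2ProfileImpl`, `handleToImpl`, `onRunConstructed`) are hypothetical stand-ins, and the sketch only returns the file name it would use; the real VE2 implementation writes the CT file and attaches it to the run via set_dtrace_control_file.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class: platforms that do not override the hook get a no-op.
class ProfileImpl {
public:
  virtual ~ProfileImpl() = default;
  virtual std::string generateCTForRun(void* /*run*/, uint32_t /*runUid*/)
  {
    return {};  // no-op: no CT file for this platform
  }
};

// Hypothetical VE2 variant: produces a unique CT file name per run.
class Ve2ProfileImpl : public ProfileImpl {
public:
  explicit Ve2ProfileImpl(uint32_t slotIdx) : slot(slotIdx) {}

  std::string generateCTForRun(void* /*run*/, uint32_t runUid) override
  {
    // The real code would also write the file and attach it to the run.
    return "aie_profile_ctx_" + std::to_string(slot) +
           "_run_" + std::to_string(runUid) + ".ct";
  }

private:
  uint32_t slot;
};

// Stand-in for the handle-to-implementation map keyed by hw_context handle.
std::map<void*, std::unique_ptr<ProfileImpl>> handleToImpl;

// Stand-in for the callback invoked from the run constructor.
std::string onRunConstructed(void* run, void* hwctx, uint32_t runUid)
{
  auto it = handleToImpl.find(hwctx);
  if (it == handleToImpl.end())
    return {};  // unknown hardware context: nothing to do
  return it->second->generateCTForRun(run, runUid);
}
```

Keeping the default in the base class means the callback can be invoked unconditionally from the run constructor, and only platforms that override it pay any cost.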
Required XRT changes: https://github.com/jyothees99/XRT/tree/run-constructor-ct-hook