Add per-run CT file generation via run constructor hook (VE2) #35

Merged
jvillarre merged 15 commits into Xilinx:master from jyothees99:run-constructor-ct-hook
Apr 9, 2026

@jyothees99
Collaborator

Summary

  • Add aieProfileRunConstructor callback to generate unique CT files per xrt::run instead of once at xclbin-load time
  • VE2 implementation writes aie_profile_ctx_run.ct and attaches it via xrt::run::set_dtrace_control_file()
  • Refactor AieProfileCTWriter to support caller-specified output paths while preserving backward compatibility
  • Remove device-level CT generation and dtrace_control_file_path config setting from updateDevice()

Details

Previously, a single aie_profile.ct file was generated during updateDevice() and set globally via the config reader. This meant all runs shared the same CT file, and the path had to be set before module creation.

With this change, XRT core calls aieProfileRunConstructor(void* run, void* hwctx, uint32_t run_uid) from the xrt::run constructor. The plugin looks up the hw_context in handleToAIEProfileImpl, and the VE2 implementation generates a unique CT file per run and attaches it directly to the run object. The run UID matches the dtrace dump naming convention (dtrace_dump_ctx_run_.py).

Other platforms (Edge, x86, Win) inherit a no-op from the base class.
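
The dispatch described above can be sketched roughly as follows. This is an illustration only, not the plugin's code: `ProfileImpl`, `Ve2ProfileImpl`, `ImplMap`, and `runConstructorHook` are hypothetical names, the map stands in for `handleToAIEProfileImpl`, and the hook returns a string purely so the behavior is observable (the real callback would attach the file to the run instead).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the per-platform profile implementation.
class ProfileImpl {
public:
  virtual ~ProfileImpl() = default;
  // Base-class default: platforms without per-run CT support do nothing.
  virtual std::string generateCTForRun(void* /*run*/, uint32_t /*runUid*/) {
    return {};
  }
};

// Hypothetical VE2 override: reports the name of the CT file it would
// write and attach to the run (the name format here is a placeholder).
class Ve2ProfileImpl : public ProfileImpl {
public:
  std::string generateCTForRun(void* run, uint32_t runUid) override {
    assert(run != nullptr);
    return "ct_for_run_" + std::to_string(runUid);
  }
};

// Maps an opaque hw_context handle to its implementation.
using ImplMap = std::map<void*, std::unique_ptr<ProfileImpl>>;

// Shape of the hook invoked from the run constructor: look up the context,
// return quietly if it is unknown, otherwise delegate to the implementation.
std::string runConstructorHook(ImplMap& impls, void* run, void* hwctx,
                               uint32_t runUid) {
  auto it = impls.find(hwctx);
  if (it == impls.end())
    return {};
  return it->second->generateCTForRun(run, runUid);
}
```

Keeping the no-op in the base class means the hook itself needs no platform checks; only implementations that override `generateCTForRun` produce a file.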

Required XRT changes: https://github.com/jyothees99/XRT/tree/run-constructor-ct-hook

return;

auto ctx = xrt_core::hw_context_int::create_hw_context_from_implementation(hwctx);
auto slotIdx = static_cast<xrt_core::hwctx_handle*>(ctx)->get_slotidx();

The slot index will not work for temporal-sharing cases. We need to verify this.

As a first phase, we can still go ahead with it, but please make a note that we have to revisit it.

Generate unique CT files per xrt::run instead of once at xclbin-load
time. The plugin registers an aieProfileRunConstructor callback via
dlsym, invoked from the xrt::run constructor. VE2 overrides
generateCTForRun to write aie_profile_ctx_<slot>_run_<id>.ct and
attach it via set_dtrace_control_file. The existing device-level CT
generation in updateDevice is removed.

Made-with: Cursor

Accept uint32_t run_uid from the core hook and use it in the
per-run CT filename (aie_profile_ctx_<slot>_run_<uid>.ct) instead
of a local static counter. This matches the dtrace dump naming
convention and correlates CT and dump files for the same run.

Made-with: Cursor
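
For illustration, the per-run file name from the commit message above could be assembled like this. The helper name `perRunCtFileName` is hypothetical; only the `aie_profile_ctx_<slot>_run_<uid>.ct` pattern comes from the commit text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds aie_profile_ctx_<slot>_run_<uid>.ct from the hw_context slot index
// and the run UID supplied by the core hook (helper name is hypothetical).
std::string perRunCtFileName(uint32_t slotIdx, uint32_t runUid) {
  return "aie_profile_ctx_" + std::to_string(slotIdx) + "_run_" +
         std::to_string(runUid) + ".ct";
}
```

Because the UID comes from the caller rather than a local static counter, two files produced for the same run by different components share the same `<slot>`/`<uid>` pair.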
The generated Python code now introspects __file__ at runtime to
extract the unique dtrace dump ID, producing a correlated output
filename like aie_profile_counters_ctx_1_run_3_<timestamp>.json.

Made-with: Cursor
…through

Thread kernel_name and elf_handle through the run_constructor XDP callback
chain so that generateCTForRun can extract SAVE_TIMESTAMP line numbers
directly from the ELF via aiebu_assembler::get_op_locations, falling back
to the CSV-based approach when the aiebu API is unavailable or fails.

- Add elf_helper.h/cpp to isolate ELFIO serialization and avoid
  system <elf.h> macro conflicts
- Add generate() overload to AieProfileCTWriter accepting op_loc data
- Link aiebu_static and add aiebu include path in CMakeLists.txt

Made-with: Cursor
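
The ELF-first, CSV-fallback behavior described in this commit might look something like the sketch below. The two sources are passed in as callables because the real `aiebu_assembler::get_op_locations` signature is not shown in this PR; `Source`, `LineNumbers`, and `saveTimestampLines` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <functional>
#include <optional>
#include <stdexcept>
#include <vector>

using LineNumbers = std::vector<uint32_t>;
// A source either yields line numbers, yields nothing, or throws.
using Source = std::function<std::optional<LineNumbers>()>;

// Prefer the ELF-derived SAVE_TIMESTAMP line numbers; use the CSV-derived
// ones when the ELF path is unavailable, fails, or returns nothing.
LineNumbers saveTimestampLines(const Source& fromElf, const Source& fromCsv) {
  try {
    std::optional<LineNumbers> lines = fromElf();
    if (lines && !lines->empty())
      return *lines;
  } catch (const std::exception&) {
    // Covers a failing call as well as an unset callable
    // (std::bad_function_call); fall through to the CSV path.
  }
  std::optional<LineNumbers> lines = fromCsv();
  return lines ? *lines : LineNumbers{};
}
```

Treating an empty result the same as a failure is a design choice in this sketch; the commit message only says the fallback applies when the API "is unavailable or fails".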
The elf_int.h header includes <elfio/elfio.hpp> which lives under
core/common/elf/. Add that directory to the include search path.

Made-with: Cursor
@jyothees99 jyothees99 force-pushed the run-constructor-ct-hook branch from 9b48b3b to 3330403 Compare April 2, 2026 07:39
jyothees99 and others added 10 commits April 2, 2026 01:57
…culation

Query the hw_context partition's start_col via getAIEPartitionInfo at the
generateCTForRun call site and pass it as a plain uint8_t to AieProfileCTWriter.
calculateCounterAddress now adds partitionStartCol to the relative column from
the perf counter vector so that addresses in the generated CT file reflect the
actual hardware column placement chosen by XRT.

Made-with: Cursor

- calculateCounterAddress now uses the partition-relative column directly
  instead of adding partitionStartCol, so all addresses emitted in the CT
  file are partition-relative (matching the ELF map data column values used
  by the dtrace parser).
- generateCTForRun now returns early if dtrace_debug is not enabled, so CT
  generation and set_dtrace_control_file are only performed when the feature
  is explicitly requested.

Made-with: Cursor
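
The difference between the two preceding commits (absolute column first, then partition-relative) can be shown with a small sketch. The bit layout below (column shift 25, row shift 20) is an assumption made for illustration and is not taken from this PR; `tileAddress`, `relativeCounterAddress`, and `absoluteCounterAddress` are hypothetical helpers, not the actual `calculateCounterAddress`.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED tile address layout, for illustration only.
constexpr uint32_t kColShift = 25;
constexpr uint32_t kRowShift = 20;

constexpr uint64_t tileAddress(uint8_t col, uint8_t row, uint32_t regOffset) {
  return (static_cast<uint64_t>(col) << kColShift) |
         (static_cast<uint64_t>(row) << kRowShift) | regOffset;
}

// Partition-relative form: the column from the perf counter vector is used
// as-is, which is what the later commit settles on.
constexpr uint64_t relativeCounterAddress(uint8_t relCol, uint8_t row,
                                          uint32_t regOffset) {
  return tileAddress(relCol, row, regOffset);
}

// Absolute form: the partition's start column is added first, as in the
// earlier commit.
constexpr uint64_t absoluteCounterAddress(uint8_t partitionStartCol,
                                          uint8_t relCol, uint8_t row,
                                          uint32_t regOffset) {
  return tileAddress(static_cast<uint8_t>(partitionStartCol + relCol), row,
                     regOffset);
}
```

Under these assumptions the two forms differ only by `partitionStartCol << kColShift`, so a consumer that already works in partition-relative columns needs the relative form to match.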
Add the aie_dtrace plugin (VE2 shim, CT writer, callbacks, CMake) for
Debug.aie_dtrace-driven bandwidth and control-trace workflows without
the AIE profile CSV path. Wire the plugin into profile/plugin CMake.

Extend AieProfileMetadata with AIE_dtrace_settings parsing and dtrace-
specific helpers. Adjust the AIE profile plugin to load the dtrace
library when enabled.

Align AIE profile VE2 updateDevice with nop.elf plus setMetricsSettings
and keep the xclbin profile-counter fallback using get_userpf_device
and convertToCoreDevice for hw-context handles.

Made-with: Cursor

Remove AIE_profile_settings.dtrace_debug usage: gate generateCTForRun and
runConstructorHook on get_aie_dtrace(). Drop dtrace_debug from allowed
AIE_profile_settings and remove the related configuration warning.

Users should set Debug.aie_dtrace=true instead of the removed profile flag.

Made-with: Cursor

Replace per-probe @blockopen JSON blocks with a single # COUNTER_METADATA_BEGIN/END
comment block in the begin{} section. Jprobes now emit ts_<asmId> for
ordering-independent multi-ASM identification and use _ = read_reg() to
suppress counter variable assignments, reducing dtrace dump file size.

Made-with: Cursor

Export aieDtraceRunStart and aieDtraceRunWait from the plugin and add
AieDtracePlugin::runStartHook / runWaitHook stubs for use when XRT
invokes per-run start and wait callbacks.

Made-with: Cursor

- Add ve2/elf_helper.h/cpp to aie_dtrace (mirrors aie_profile/ve2/elf_helper)
  so the plugin no longer depends on the profile build path
- Update CMakeLists.txt to build the local elf_helper.cpp
- Pass allCounters to writeCTFile and emit device-wide counter_metadata
  block in CT begin section (same fields as AieProfileCTWriter)
- Fix header guard to AIE_DTRACE_CT_WRITER_H
- Replace all "AIE Profile:" log messages with "AIE dtrace:"

Made-with: Cursor
… export

Remove AieProfileCTWriter and aiebu linkage from xdp_aie_profile_plugin_xdna.
Strip shim-specific ddr_bandwidth/read_bandwidth/write_bandwidth reservation
logic from AieProfile_VE2Impl::setMetricsSettings. Remove runConstructorHook
and aieProfileRunConstructor from the profile plugin; dtrace keeps its own CT
path and bandwidth metrics. Update CMake and comments accordingly.

Made-with: Cursor

- Track op_loc lineinfo.col min/max per control asm; sort UCs by start column.
- Set colStart/ucNumber from aiebu; colEnd from next UC start minus one, or
  opLocMaxCol for the last UC before counter-based extension.
- Extend the last UC colEnd to the maximum configured counter column so tiles
  without SAVE_TIMESTAMPS in the dump still get read_reg entries (CSV path too).
- Sort CSV-loaded asm files by colStart for consistent ordering.

Made-with: Cursor
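
The column-range rules in the last commit can be sketched as below. `UcRange` and `assignColumnRanges` are hypothetical names, and the sketch assumes each UC has a distinct start column; only the rules themselves (sort by start column, end at the next start minus one, last UC ends at its op_loc maximum and is then extended to the highest configured counter column) come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct UcRange {
  uint32_t ucNumber;     // UC identifier
  uint32_t colStart;     // first column covered by this UC
  uint32_t opLocMaxCol;  // largest column seen in this UC's op locations
  uint32_t colEnd = 0;   // filled in by assignColumnRanges
};

// Sorts UCs by start column and assigns each one an inclusive colEnd.
void assignColumnRanges(std::vector<UcRange>& ucs, uint32_t maxCounterCol) {
  std::sort(ucs.begin(), ucs.end(), [](const UcRange& a, const UcRange& b) {
    return a.colStart < b.colStart;
  });
  for (std::size_t i = 0; i < ucs.size(); ++i) {
    if (i + 1 < ucs.size()) {
      // Assumption of this sketch: start columns are distinct.
      assert(ucs[i + 1].colStart > ucs[i].colStart);
      ucs[i].colEnd = ucs[i + 1].colStart - 1;
    } else {
      // Last UC: start from the op_loc maximum, then extend to the highest
      // configured counter column so trailing tiles are still covered.
      ucs[i].colEnd = std::max(ucs[i].opLocMaxCol, maxCounterCol);
    }
  }
}
```

For example, two UCs starting at columns 0 and 4 with a highest counter column of 7 end up covering columns 0 to 3 and 4 to 7.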
@jvillarre jvillarre merged commit d544f85 into Xilinx:master Apr 9, 2026
4 checks passed

4 participants