[AMD][DRAFT] PC Sampling, wave stall reasonings by ZelboK · Pull Request #10020 · triton-lang/triton

ZelboK · 2026-04-13T16:54:34Z

Not ready for review. Branches off from unmerged PR 9704.

Replace the deprecated roctracer-based profiling backend with a new implementation built on rocprofiler-sdk, using late-start via rocprofiler_force_configure so no LD_PRELOAD or tool-library preloading is required.

Key changes:

- Add RocprofSDKProfiler with a two-context architecture:
  * codeObjectContext (always active): lightweight callback for kernel_id -> name registration as code objects are loaded.
  * profilingContext (on-demand): HIP runtime API callback tracing and buffer-based kernel dispatch tracing, started in doStart() and stopped in doStop() to match Proton's start/stop idioms.
- Eagerly call force_configure at time on AMD so interception hooks are installed before any HSA queues are created. Both contexts are registered at this point, causing the SDK to install queue hooks. Only the lightweight codeObjectContext is activated immediately.
- Rewrite _select_backend() to infer the backend from the registered backends dict rather than calling get_current_target(), which would trigger HIP runtime init before force_configure can run.
- Wire up ROCTx marker tracing via libroctx64's native callback API (roctxRegisterTracerCallback), since rocprofiler-sdk's marker service requires its replacement roctx library, which is unavailable with late-start.
- Add RocprofApi dispatch layer (ExternLibRocprofiler) for runtime dlopen/dlsym of librocprofiler-sdk.so, with optional path override via TRITON_ROCPROFILER_SDK_LIB_PATH.
- Update CMake to discover rocprofiler-sdk headers and plumb ROCPROFILER_SDK_INCLUDE_DIR into the build.
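To illustrate the `_select_backend()` change, here is a minimal sketch of inferring the profiler backend purely from registration data, so that no driver or device query (and therefore no HIP runtime initialization) happens during selection. The `PROFILER_FOR_BACKEND` mapping, the `select_backend` name, the backend keys, and the error handling are all assumptions made for illustration; the PR's actual logic may differ.

```python
from typing import Mapping

# Hypothetical mapping from a registered compiler backend name to the
# profiler backend that serves it. These names are illustrative only.
PROFILER_FOR_BACKEND = {"nvidia": "cupti", "amd": "rocprofiler"}


def select_backend(registered: Mapping[str, object]) -> str:
    """Infer the profiler backend from the registered-backends mapping.

    Only the mapping's keys are inspected. Nothing here asks a driver for
    the current target, so no GPU runtime is initialized as a side effect.
    """
    candidates = sorted(
        PROFILER_FOR_BACKEND[name]
        for name in registered
        if name in PROFILER_FOR_BACKEND
    )
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError("no supported backend is registered")
    # More than one vendor backend is installed: refuse to guess.
    raise ValueError(f"ambiguous backends {candidates}; select one explicitly")
```

With a single registered vendor backend the choice is unambiguous; when several are present, this sketch asks the caller to choose rather than probing hardware.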

…), getKernelName fix using a shared lock instead of two lock acquisitions, simplified no-correlation path, missing capture counting API. Changes to see if the NVIDIA CI runner works

…easons

Implement stochastic PC sampling via rocprofiler-sdk, fix a process-abort-on-exit caused by dual ROCm library loading, and replace the NVIDIA-approximated stall reason mapping with proper AMD-native names.

PC sampling:

- Wire up rocprofiler_configure_pc_sampling_service with stochastic method and configurable interval (PROTON_PC_SAMPLING_INTERVAL env).
- Add pcSamplingBufferCallback to accumulate per-kernel samples and flush them into Proton's Data/Metric pipeline.
- Expose pcsampling mode for the rocprofiler backend in Python.

Library dispatch fix:

- Replace RTLD_NOLOAD versioned probes in Dispatch::init with dl_iterate_phdr-based lookup (findLoadedLib) so the pip-installed ROCm libraries are reused instead of pulling in a second copy from /opt/rocm-*/lib/.
- Pre-populate ExternLibRocprofiler::lib from the pip installation directory before forceConfigure, preventing SONAME deduplication from silently substituting the system SDK.

AMD stall reasons:

- Add 9 AMD-specific PCSamplingMetricKind entries derived from rocprofiler-sdk pc_sampling.h (e.g. waitcnt, alu_dependency, arbiter_win_ex_stall) so output uses accurate hardware names instead of force-fitting into NVIDIA stall columns.
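The accumulate-then-flush behaviour described for pcSamplingBufferCallback can be sketched roughly as below. This is a Python illustration of the idea only, not the PR's C++ code: the `SampleAccumulator` class, its method names, and the `(kernel_id, stall_reason)` sample shape are hypothetical, and the stall reason strings are the example names quoted above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


class SampleAccumulator:
    """Counts PC samples per kernel and stall reason until flushed."""

    def __init__(self) -> None:
        # kernel_id -> Counter mapping stall reason -> number of samples
        self._counts: Dict[int, Counter] = defaultdict(Counter)

    def on_buffer(self, samples: Iterable[Tuple[int, str]]) -> None:
        """Hypothetical buffer callback: fold one batch of samples in."""
        for kernel_id, stall_reason in samples:
            self._counts[kernel_id][stall_reason] += 1

    def flush(self, kernel_names: Dict[int, str]) -> Dict[str, Dict[str, int]]:
        """Return counts keyed by kernel name and reset the accumulator.

        Kernel ids without a registered name get a placeholder label, and
        ids that share a name have their counts merged.
        """
        merged: Dict[str, Counter] = defaultdict(Counter)
        for kernel_id, counts in self._counts.items():
            name = kernel_names.get(kernel_id, f"<unknown kernel {kernel_id}>")
            merged[name].update(counts)
        self._counts.clear()
        return {name: dict(counts) for name, counts in merged.items()}
```

Keeping the callback limited to counter increments and deferring name resolution to flush time means the per-buffer work stays small; whether the PR makes the same trade-off is not stated in the description.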

Your Name and others added 13 commits March 12, 2026 18:47

Apply pre-commit formatting fixes (clang-format, yapf)

144f7ff

graph destroy clean up, selective hip api subscription(may need audit…

11b8177

…), getKernelName fix using a shared lock instead of two lock acquisitions, simplified no-correlation path, missing capture counting API. Changes to see if the NVIDIA CI runner works

see if ci fixed

8617215

Add roctracer fallback and address comments

054b2a9

camelCase format

364f92c

Merge branch 'main' into feat/rocprofiler_sdk_late_start

b75abc2

save

b963710

comment

058e04b

fmt

af313ec

Merge branch 'main' into feat/pc_sampling

435b039

remove tracedata hack

12c64e3

ZelboK wants to merge 13 commits into triton-lang:main from ZelboK:feat/pc_sampling
