AIESW-29257: Fix xrt-smi device ID reporting #9716
chvamshi-xilinx merged 7 commits into Xilinx:master
[why]
xrt-smi JSON report showed "id": "00000000-0000-0000-0000-000000000000" for all Ryzen/NPU devices (Phoenix, Strix, Strix2, StrixH, Krackan, Medusa, Soundwave). The root cause: rom_time_since_epoch returns a hardcoded 0 for Ryzen devices instead of throwing, so the existing Alveo 1RP path silently set id="0". No Ryzen-specific path existed, leaving every NPU device with an identical, non-unique identifier in the JSON output.
[how]
Extract device ID generation into a dedicated get_device_id() function in XBUtilities.cpp. For Ryzen/NPU devices, derive the UUID from PCIe BDF (domain, bus, device, function) using the same memcpy byte-packing as hip_device_get_uuid(), ensuring the xrt-smi id is consistent with the HIP UUID API. Alveo 2RP (xclbin logic UUID) and 1RP (ROM timestamp) paths are retained as fallbacks for non-Ryzen devices.
AIESW-29257
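For readers unfamiliar with the byte-packing idea, here is a minimal sketch of deriving a 16-byte ID from a PCIe BDF with memcpy. The names uuid_bytes, bdf_to_uuid and uuid_to_string are hypothetical, and the field widths and ordering are assumptions for illustration; the actual layout is whatever hip_device_get_uuid() uses and is not reproduced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for a 16-byte UUID (XRT itself uses xuid_t).
using uuid_bytes = std::array<std::uint8_t, 16>;

// Copy domain, bus, device and function into the leading bytes of a
// zero-initialized UUID. Field widths and order are assumptions.
inline uuid_bytes
bdf_to_uuid(std::uint16_t domain, std::uint16_t bus,
            std::uint16_t dev, std::uint16_t func)
{
  uuid_bytes id{};            // bytes not written below remain zero
  std::size_t offset = 0;
  for (std::uint16_t field : {domain, bus, dev, func}) {
    std::memcpy(id.data() + offset, &field, sizeof(field));
    offset += sizeof(field);
  }
  return id;
}

// Render the bytes in the usual 8-4-4-4-12 hexadecimal form.
inline std::string
uuid_to_string(const uuid_bytes& id)
{
  std::string out;
  char buf[3];
  for (std::size_t i = 0; i < id.size(); ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10)
      out += '-';
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(id[i]));
    out += buf;
  }
  return out;
}
```

Two devices at different BDFs get different IDs under this scheme, which is the property the all-zero id was missing. The exact byte values depend on host endianness, since memcpy copies the native representation.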
Replace C-style byte array and magic number indices with xuid_t + std::apply fold, eliminating all cppcoreguidelines-avoid-magic-numbers and cppcoreguidelines-avoid-c-arrays warnings. AIESW-29257
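As a rough illustration of the std::apply fold mentioned above, the sketch below packs a tuple of fields into a 16-byte array without hard-coded indices. It uses std::array as a hypothetical stand-in for xuid_t, and pack_fields is an invented name, so treat it as one possible shape of the idea rather than the code in this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>

// Hypothetical stand-in for xuid_t.
using uuid_bytes = std::array<std::uint8_t, 16>;

// Pack every tuple element back to back into a zeroed 16-byte buffer.
// The offset is advanced by sizeof(field), so no literal index appears.
template <typename... Fields>
uuid_bytes
pack_fields(const std::tuple<Fields...>& fields)
{
  static_assert((sizeof(Fields) + ... + 0) <= sizeof(uuid_bytes),
                "fields must fit in the UUID");
  uuid_bytes id{};
  std::size_t offset = 0;
  std::apply([&](const auto&... f) {
    // Comma fold: copy each field, then advance the write offset.
    ((std::memcpy(id.data() + offset, &f, sizeof(f)), offset += sizeof(f)), ...);
  }, fields);
  return id;
}
```

Called with a tuple of four std::uint16_t values (domain, bus, device, function), this writes 8 bytes and leaves the remaining 8 bytes zero.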
clang-tidy review says "All clean, LGTM! 👍"
I, Sandilya Bhagi <sandilya.bhagi@amd.com>, hereby add my Signed-off-by to this commit: 498ab2f
I, Sandilya Bhagi <sandilya.bhagi@amd.com>, hereby add my Signed-off-by to this commit: 8ecf10b
I, Sandilya Bhagi <sandilya.bhagi@amd.com>, hereby add my Signed-off-by to this commit: fc560b2
Signed-off-by: Sandilya Bhagi <sandilya.bhagi@amd.com>
clang-tidy review says "All clean, LGTM! 👍"
    // Alveo 2RP: UUID from loaded xclbin logic region
    try {
      auto logic_uuids = xrt_core::device_query<xrt_core::query::logic_uuids>(device);
Do we need to capture the logic UUID from the xclbin?
IMO, let's restrict this change to the Ryzen device case only.
The Alveo code is existing code; with the new change there is an exclusive if/else for Ryzen vs. Alveo.
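To make the exclusive branching concrete, here is a simplified sketch of the selection order described in this thread: Ryzen/NPU devices take the BDF-derived ID, and only non-Ryzen devices fall through to the Alveo 2RP and then 1RP paths. The device_info struct and select_device_id function are hypothetical, and std::optional stands in for device queries that throw in the real code.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified view of a device; not an XRT type.
struct device_info {
  bool is_ryzen = false;
  std::string bdf_uuid;                        // ID derived from PCIe BDF
  std::optional<std::string> logic_uuid;       // Alveo 2RP: xclbin logic UUID
  std::optional<std::uint64_t> rom_timestamp;  // Alveo 1RP: ROM timestamp
};

// Ryzen and Alveo paths are mutually exclusive: a Ryzen device never
// reaches the ROM timestamp branch, so a hardcoded 0 cannot leak into id.
inline std::string
select_device_id(const device_info& dev)
{
  if (dev.is_ryzen)
    return dev.bdf_uuid;
  if (dev.logic_uuid)                          // prefer 2RP when available
    return *dev.logic_uuid;
  if (dev.rom_timestamp)                       // otherwise fall back to 1RP
    return std::to_string(*dev.rom_timestamp);
  return "";
}
```

The point of the early return is that a Ryzen device reporting a ROM timestamp of 0 still gets its BDF-derived ID.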
    }

    static std::string
    get_device_id(const std::shared_ptr<xrt_core::device>& device,
Can we reuse the same function as used by HIP? See hip_device_get_uuid(hipDevice_t device) defined in hip_device.cpp. Can that code be moved to some common place from where both HIP and xrt-smi can call it?
Hi @sonals, I have centralized the BDF-to-UUID logic in pcie_bdf::to_uuid() (src/runtime_src/core/common/query_requests.h). Please review.
[why]
UUID construction from PCIe BDF was duplicated between xrt-smi (XBUtilities.cpp) and HIP (hip_device.cpp). If the packing logic changes, both files would need updating independently.
[how]
Add pcie_bdf::to_uuid() static method in query_requests.h as the single source of truth for BDF-to-UUID conversion. Update hip_device_get_uuid() in hip_device.cpp to call it. Inline the device ID logic back into get_available_devices() in XBUtilities.cpp (removing get_device_id()) using pcie_bdf::to_uuid() for the Ryzen/NPU path, restoring the original 1RP/2RP inline structure for Alveo.
AIESW-29257
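The single-source-of-truth arrangement can be pictured roughly as below: one static helper owns the packing, and both callers delegate to it. Everything in the sketch namespace (bdf_query, hip_style_uuid, smi_style_id, the tuple layout) is hypothetical and only mirrors the structure described above, not the actual signatures in query_requests.h or hip_device.cpp.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <tuple>

namespace sketch {

using uuid_bytes = std::array<std::uint8_t, 16>;
// Assumed layout: domain, bus, device, function.
using bdf_tuple = std::tuple<std::uint16_t, std::uint16_t, std::uint16_t, std::uint16_t>;

struct bdf_query {
  // The only place that knows how a BDF becomes a UUID.
  static uuid_bytes
  to_uuid(const bdf_tuple& bdf)
  {
    const std::array<std::uint16_t, 4> fields = {
      std::get<0>(bdf), std::get<1>(bdf), std::get<2>(bdf), std::get<3>(bdf)};
    uuid_bytes id{};
    std::memcpy(id.data(), fields.data(), sizeof(fields));  // 8 of 16 bytes
    return id;
  }
};

// Stand-in for the HIP-side caller.
inline uuid_bytes
hip_style_uuid(const bdf_tuple& bdf)
{
  return bdf_query::to_uuid(bdf);
}

// Stand-in for the xrt-smi-side caller.
inline uuid_bytes
smi_style_id(const bdf_tuple& bdf)
{
  return bdf_query::to_uuid(bdf);
}

} // namespace sketch
```

Because both callers go through the same helper, a future change to the packing is made once and the two IDs cannot drift apart.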
I, Sandilya Bhagi <sandilya.bhagi@amd.com>, hereby add my Signed-off-by to this commit: 89e0497
Signed-off-by: Sandilya Bhagi <sandilya.bhagi@amd.com>
…f::to_uuid()
Use braced initializer list instead of repeating the return type.
AIESW-29257
Signed-off-by: Sandilya Bhagi <sandilya.bhagi@amd.com>
clang-tidy review says "All clean, LGTM! 👍"