@xrwang8 xrwang8 commented Sep 22, 2025

Summary

  • change nvml_to_cuda_map to return int so it can safely signal failure with -1
  • update the function declaration and callers to rely on the signed return value

This fixes a bug where a -1 returned through an unsigned type wraps around to a huge positive index. That value bypasses the caller's `< 0` guard and leads to out-of-bounds access when NVML devices are not visible to CUDA (e.g., under CUDA_VISIBLE_DEVICES remapping).
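
The failure mode can be sketched in isolation. The snippet below is a minimal illustration, not the code from this PR: the table `cuda_index_of`, the constant `DEVICE_COUNT`, and the functions `map_unsigned`, `map_signed`, and `is_valid_cuda_index` are all hypothetical names chosen for the example, and the real `nvml_to_cuda_map` may look quite different.

```c
/* Hypothetical table size, chosen only for this sketch. */
#define DEVICE_COUNT 4

/* Hypothetical mapping: entry i holds the CUDA index of NVML device i,
 * or -1 when that device is not visible to CUDA. */
const int cuda_index_of[DEVICE_COUNT] = {1, 0, -1, -1};

/* Old shape: the unsigned return type turns -1 into UINT_MAX.
 * The explicit casts yield the same value an implicit conversion would. */
unsigned int map_unsigned(unsigned int nvml_dev) {
    if (nvml_dev >= DEVICE_COUNT) {
        return (unsigned int)-1;
    }
    return (unsigned int)cuda_index_of[nvml_dev];
}

/* New shape: the signed return type carries -1 through unchanged. */
int map_signed(unsigned int nvml_dev) {
    if (nvml_dev >= DEVICE_COUNT) {
        return -1;
    }
    return cuda_index_of[nvml_dev];
}

/* Caller-side guard: only meaningful when the index is signed. */
int is_valid_cuda_index(int idx) {
    return idx >= 0 && idx < DEVICE_COUNT;
}
```

With this table, `map_unsigned(2)` yields `UINT_MAX`, which can never compare less than zero, so a `< 0` check on it never fires. `map_signed(2)` yields `-1`, which `is_valid_cuda_index` rejects before the value is used as an array index.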

hami-robot bot commented Sep 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xrwang8
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot hami-robot bot added the size/XS label Sep 22, 2025
@hami-robot hami-robot bot added size/S and removed size/XS labels Sep 22, 2025