Contributor

@mdboom mdboom commented Jan 14, 2026

Contributor

copy-pr-bot bot commented Jan 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor Author

mdboom commented Jan 14, 2026

/ok to test

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds extensive system management functionality to cuda.core.system, including device affinity management, clock control, fan management, temperature monitoring, and thermal settings. The changes span new implementation files for these features, comprehensive test coverage, and corresponding updates to the NVML bindings layer.

Changes:

  • Added affinity, clock, fan, temperature, and thermal management APIs
  • Refactored several NVML binding functions to improve API ergonomics (e.g., device_get_last_bbx_flush_time now returns a tuple instead of using an out parameter)
  • Added new wrapper classes like ClockInfo, FanInfo, Temperature, InforomInfo, etc.
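The out-parameter-to-tuple refactor mentioned in the list above can be illustrated with a minimal sketch. The function names, the ctypes slot, and the numeric values here are hypothetical placeholders chosen for illustration; they are not the actual NVML bindings or their return values.

```python
import ctypes


def flush_time_out_param(duration_us_out):
    """Out-parameter style (hypothetical): one result is written into a caller-supplied slot."""
    duration_us_out.value = 250  # placeholder value
    return 1_700_000_000         # placeholder timestamp


def flush_time_tuple():
    """Tuple style (hypothetical): all results are returned together."""
    return 1_700_000_000, 250


# Out-parameter style: the caller must allocate the slot first, then read it back.
slot = ctypes.c_ulong()
timestamp_old = flush_time_out_param(slot)

# Tuple style: the caller unpacks the results directly.
timestamp_new, duration_us = flush_time_tuple()

print((timestamp_old, slot.value) == (timestamp_new, duration_us))
```

Both styles carry the same information; the tuple form removes the need for the caller to pre-allocate storage.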

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| cuda_core/tests/system/test_system_device.py | Comprehensive tests for new device features including affinity, clocks, fans, temperature, and performance states |
| cuda_core/docs/source/api.rst | API documentation updates for new system types and functions |
| cuda_core/cuda/core/system/_temperature.pxi | New temperature and thermal sensor management implementation |
| cuda_core/cuda/core/system/_performance.pxi | New performance state (P-state) management implementation |
| cuda_core/cuda/core/system/_inforom.pxi | New InfoROM information access implementation |
| cuda_core/cuda/core/system/_fan.pxi | New fan control and monitoring implementation |
| cuda_core/cuda/core/system/_cooler.pxi | New cooler information access implementation |
| cuda_core/cuda/core/system/_clock.pxi | New clock management and monitoring implementation |
| cuda_core/cuda/core/system/_device.pyx | Extended Device class with new properties and methods; refactored constructor to use keyword-only arguments |
| cuda_bindings/cuda/bindings/_nvml.pyx | Refactored several API functions; removed duplicate enum entries; improved API signatures |
| cuda_bindings/cuda/bindings/_nvml.pxd | Updated function signatures to match implementation changes |

try:
    offsets = clock.get_offsets(pstate)
except system.InvalidArgumentError:
    offsets = system.ClockOffsets(nvml.ClockOffset_v1())

Copilot AI Jan 14, 2026

This assignment to 'offsets' is unnecessary as it is redefined before this value is used.

Suggested change
-    offsets = system.ClockOffsets(nvml.ClockOffset_v1())
+    pass

Contributor Author

mdboom commented Jan 14, 2026

/ok to test

@mdboom self-assigned this Jan 15, 2026
@mdboom added the cuda.core (Everything related to the cuda.core module) and feature (New feature or request) labels Jan 15, 2026
Contributor Author

mdboom commented Jan 15, 2026

/ok to test

Contributor Author

mdboom commented Jan 15, 2026

/ok to test

Contributor Author

mdboom commented Jan 15, 2026

/ok to test

Contributor

@cpcloud cpcloud left a comment

LGTM.

cdef class ThermalSensor:
    cdef:
        _ThermalSensor *_ptr
        object _owner
Contributor

non-blocking: is it specified anywhere that this lifetime relationship needs to be maintained? IOW, would it be incorrect to just let the usual reference counting behavior handle this?

Contributor Author

I should add a comment. _ptr is a pointer to somewhere in the middle of the memory of a NumPy array held by _owner. So if _owner were decref'd and freed, _ptr would be left dangling.
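The lifetime relationship described here can be sketched in plain Python. This is only an analogy under stated assumptions: it uses a ctypes array as a stand-in for the NumPy array, and the ElementView class and make_view helper are hypothetical names, not part of the PR.

```python
import ctypes


class ElementView:
    """Hypothetical stand-in for a wrapper that points into memory owned by another object."""

    def __init__(self, owner, index):
        # Strong reference to the buffer, so its memory outlives this view
        # (plays the role of _owner).
        self._owner = owner
        # Raw address of one element inside the owner's memory
        # (plays the role of _ptr).
        self._addr = ctypes.addressof(owner) + index * ctypes.sizeof(ctypes.c_int)

    @property
    def value(self):
        # from_address() does not keep the underlying buffer alive by itself,
        # so the read is only safe because _owner is still referenced.
        return ctypes.c_int.from_address(self._addr).value


def make_view():
    buf = (ctypes.c_int * 4)(10, 20, 30, 40)
    # After this function returns, the view holds the only reference to buf.
    return ElementView(buf, 2)


view = make_view()
print(view.value)
```

Without the `_owner` attribute, the buffer could be freed as soon as `make_view` returns, and the stored address would point at released memory.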

@mdboom mdboom marked this pull request as ready for review January 16, 2026 14:39
Contributor

copy-pr-bot bot commented Jan 16, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor Author

mdboom commented Jan 16, 2026

/ok to test

@mdboom mdboom enabled auto-merge (squash) January 16, 2026 14:46
Contributor Author

mdboom commented Jan 16, 2026

/ok to test

Contributor Author

mdboom commented Jan 16, 2026

/ok to test

@mdboom mdboom merged commit ce333b6 into NVIDIA:main Jan 16, 2026
82 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.
