-
Notifications
You must be signed in to change notification settings - Fork 240
cuda.core.system: affinity, clock, fan, temperature and thermals #1492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Copilot <[email protected]>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds extensive system management functionality to cuda.core.system, including device affinity management, clock control, fan management, temperature monitoring, and thermal settings. The changes span new implementation files for these features, comprehensive test coverage, and corresponding updates to the NVML bindings layer.
Changes:
- Added affinity, clock, fan, temperature, and thermal management APIs
- Refactored several NVML binding functions to improve API ergonomics (e.g.,
device_get_last_bbx_flush_timenow returns a tuple instead of using an out parameter) - Added new wrapper classes like
ClockInfo,FanInfo,Temperature,InforomInfo, etc.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| cuda_core/tests/system/test_system_device.py | Comprehensive tests for new device features including affinity, clocks, fans, temperature, and performance states |
| cuda_core/docs/source/api.rst | API documentation updates for new system types and functions |
| cuda_core/cuda/core/system/_temperature.pxi | New temperature and thermal sensor management implementation |
| cuda_core/cuda/core/system/_performance.pxi | New performance state (P-state) management implementation |
| cuda_core/cuda/core/system/_inforom.pxi | New InfoROM information access implementation |
| cuda_core/cuda/core/system/_fan.pxi | New fan control and monitoring implementation |
| cuda_core/cuda/core/system/_cooler.pxi | New cooler information access implementation |
| cuda_core/cuda/core/system/_clock.pxi | New clock management and monitoring implementation |
| cuda_core/cuda/core/system/_device.pyx | Extended Device class with new properties and methods; refactored constructor to use keyword-only arguments |
| cuda_bindings/cuda/bindings/_nvml.pyx | Refactored several API functions; removed duplicate enum entries; improved API signatures |
| cuda_bindings/cuda/bindings/_nvml.pxd | Updated function signatures to match implementation changes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| offsets = clock.get_offsets(pstate) | ||
| except system.InvalidArgumentError: | ||
| offsets = system.ClockOffsets(nvml.ClockOffset_v1()) |
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assignment to 'offsets' is unnecessary as it is redefined before this value is used.
| offsets = system.ClockOffsets(nvml.ClockOffset_v1()) | |
| pass |
This comment has been minimized.
This comment has been minimized.
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
cpcloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| cdef class ThermalSensor: | ||
| cdef: | ||
| _ThermalSensor *_ptr | ||
| object _owner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking: is it specified anywhere that this lifetime relationship needs to be maintained? IOW, would it be incorrect to just let the usual reference counting behavior handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add a comment. ptr is a pointer somewhere in the middle of the memory of a Numpy array held by _owner. So if _owner were to be decref'd, _ptr would be dangling.
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
Prerequisites: