webhook: inject CUDA_DEVICE_MEMORY_LIMIT from gpu-memory annotation #5
limes22 wants to merge 1 commit into
The KAI binder sets the gpu-memory annotation (in MiB) on shared pods but never passes CUDA_DEVICE_MEMORY_LIMIT, which HAMi-core (libvgpu) reads to enforce the per-pod GPU memory cap. As a result, on KAI fractional-sharing pods libvgpu loads via ld.so.preload but enforces nothing (nvidia-smi shows the full device memory).

This PR makes the mutating webhook translate the gpu-memory annotation into CUDA_DEVICE_MEMORY_LIMIT=<value>m on every container and init container, skipping containers that already set the variable and handling containers that have no env list yet, so libvgpu enforces the requested cap. The gpu-fraction annotation carries no absolute memory value and is left untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
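A minimal sketch of the translation described above, assuming the webhook answers with RFC 6902 JSON Patch operations. Everything here is illustrative and not the PR's actual code: the annotation key is assumed to be exactly "gpu-memory", and memoryLimitValue, envPatches, and the Container/EnvVar/PatchOp structs are hypothetical stand-ins for the real Kubernetes types. The "empty-env case" matters because appending with "/env/-" fails when the container has no env array, so the sketch creates the array instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical names for illustration; the real webhook's identifiers may differ.
const (
	gpuMemoryAnnotation = "gpu-memory"               // assumed key, set by the KAI binder, value in MiB
	memoryLimitEnv      = "CUDA_DEVICE_MEMORY_LIMIT" // read by HAMi-core (libvgpu)
)

// EnvVar and Container are minimal stand-ins for the Kubernetes core/v1 types.
type EnvVar struct {
	Name  string `json:"name"`
	Value string `json:"value,omitempty"`
}

type Container struct {
	Name string   `json:"name"`
	Env  []EnvVar `json:"env,omitempty"`
}

// PatchOp is one RFC 6902 JSON Patch operation.
type PatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// memoryLimitValue turns the gpu-memory annotation (MiB) into a libvgpu limit
// such as "4096m". ok is false when the pod has no gpu-memory annotation
// (e.g. gpu-fraction pods), in which case nothing should be injected.
func memoryLimitValue(annotations map[string]string) (value string, ok bool, err error) {
	raw, present := annotations[gpuMemoryAnnotation]
	if !present {
		return "", false, nil
	}
	mib, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil || mib == 0 {
		return "", false, fmt.Errorf("invalid %s annotation %q", gpuMemoryAnnotation, raw)
	}
	return strconv.FormatUint(mib, 10) + "m", true, nil
}

// hasEnv reports whether the container already defines the named variable.
func hasEnv(c Container, name string) bool {
	for _, e := range c.Env {
		if e.Name == name {
			return true
		}
	}
	return false
}

// envPatches emits one "add" op per container that lacks the variable.
// field is "containers" or "initContainers".
func envPatches(field string, containers []Container, value string) []PatchOp {
	var ops []PatchOp
	env := EnvVar{Name: memoryLimitEnv, Value: value}
	for i, c := range containers {
		if hasEnv(c, memoryLimitEnv) {
			continue // keep a value the user set explicitly
		}
		base := fmt.Sprintf("/spec/%s/%d/env", field, i)
		if len(c.Env) == 0 {
			// No env entries: create the whole array instead of appending to a missing one.
			ops = append(ops, PatchOp{Op: "add", Path: base, Value: []EnvVar{env}})
		} else {
			ops = append(ops, PatchOp{Op: "add", Path: base + "/-", Value: env})
		}
	}
	return ops
}

func main() {
	value, _, err := memoryLimitValue(map[string]string{gpuMemoryAnnotation: "4096"})
	if err != nil {
		panic(err)
	}
	containers := []Container{
		{Name: "app"},
		{Name: "sidecar", Env: []EnvVar{{Name: "FOO", Value: "bar"}}},
		{Name: "pinned", Env: []EnvVar{{Name: memoryLimitEnv, Value: "1024m"}}},
	}
	out, _ := json.Marshal(envPatches("containers", containers, value))
	fmt.Println(string(out))
}
```

In this sketch the webhook would call envPatches once for "containers" and once for "initContainers" and concatenate the results; the "pinned" container produces no op because it already sets the variable.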
Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits. 📝 Please follow the instructions in the contributing guide to update your commits with the DCO. Full details of the Developer Certificate of Origin can be found at developercertificate.org. The list of commits missing DCO signoff:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: limes22
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Welcome @limes22! It looks like this is your first PR to Project-HAMi/KAI-resource-isolator 🎉