
webhook: inject CUDA_DEVICE_MEMORY_LIMIT from gpu-memory annotation #5

Open
limes22 wants to merge 1 commit into Project-HAMi:main from limes22:fix/inject-cuda-device-memory-limit


limes22 commented Jun 8, 2026


The KAI binder sets the gpu-memory annotation (in MiB) on shared pods but never sets CUDA_DEVICE_MEMORY_LIMIT, the environment variable that HAMi-core (libvgpu) reads to enforce the per-pod GPU memory cap. As a result, on KAI fractional-sharing pods libvgpu is loaded via ld.so.preload but enforces nothing: nvidia-smi still reports the full device memory.

This change makes the mutating webhook translate the gpu-memory annotation into CUDA_DEVICE_MEMORY_LIMIT=<value>m on every container and init container, so that libvgpu enforces the requested cap. Containers that already set the variable are skipped, and containers with an empty env list are handled. The gpu-fraction annotation carries no absolute memory value and is left untouched.
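
For readers who want the translation spelled out, here is a minimal sketch of the behavior described above. It is not the code in this PR. The `Pod`, `Container`, and `EnvVar` types are simplified stand-ins for the Kubernetes core types, the function names are hypothetical, and the annotation key is assumed to be the literal string `gpu-memory`. Rejecting a non-numeric annotation with an error is also an assumption; the PR description does not say how bad values are handled.

```go
package main

import (
	"fmt"
	"strconv"
)

// EnvVar, Container, and Pod are simplified stand-ins for the
// Kubernetes core types; a real webhook would use k8s.io/api/core/v1.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

type Pod struct {
	Annotations    map[string]string
	InitContainers []Container
	Containers     []Container
}

const (
	gpuMemoryAnnotation = "gpu-memory" // assumed annotation key, value in MiB
	memoryLimitEnv      = "CUDA_DEVICE_MEMORY_LIMIT"
)

// hasEnv reports whether the container already defines the named variable.
func hasEnv(c Container, name string) bool {
	for _, e := range c.Env {
		if e.Name == name {
			return true
		}
	}
	return false
}

// injectInto appends the limit to every container that does not already
// set it and returns how many containers were changed. Appending to a nil
// Env slice is valid Go, which covers the empty-env case.
func injectInto(containers []Container, value string) int {
	injected := 0
	for i := range containers {
		if hasEnv(containers[i], memoryLimitEnv) {
			continue // respect an explicit per-container setting
		}
		containers[i].Env = append(containers[i].Env, EnvVar{Name: memoryLimitEnv, Value: value})
		injected++
	}
	return injected
}

// InjectMemoryLimit (hypothetical name) translates the gpu-memory annotation
// into CUDA_DEVICE_MEMORY_LIMIT=<value>m on init containers and containers.
// Pods without the annotation are left untouched.
func InjectMemoryLimit(pod *Pod) (int, error) {
	raw, ok := pod.Annotations[gpuMemoryAnnotation]
	if !ok {
		return 0, nil
	}
	mib, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid %s annotation %q: %w", gpuMemoryAnnotation, raw, err)
	}
	value := strconv.FormatUint(mib, 10) + "m"
	return injectInto(pod.InitContainers, value) + injectInto(pod.Containers, value), nil
}

func main() {
	pod := &Pod{
		Annotations:    map[string]string{gpuMemoryAnnotation: "4096"},
		InitContainers: []Container{{Name: "init"}},
		Containers: []Container{
			{Name: "app"},
			{Name: "preset", Env: []EnvVar{{Name: memoryLimitEnv, Value: "1024m"}}},
		},
	}
	n, err := InjectMemoryLimit(pod)
	fmt.Println("containers changed:", n, "error:", err)
	for _, c := range append(pod.InitContainers, pod.Containers...) {
		fmt.Println(c.Name, c.Env)
	}
}
```

A webhook that emits JSON patches rather than mutating the pod object would likely need to treat the empty-env case separately, since an `add` to an index of a missing `env` array fails; the sketch sidesteps that by appending to the slice directly.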

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hami-robot Bot requested a review from archlitchi June 8, 2026 21:56

hami-robot Bot commented Jun 8, 2026


Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow the instructions in the contributing guide to update your commits with the DCO.

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • a8e7c74 webhook: inject CUDA_DEVICE_MEMORY_LIMIT from gpu-memory annotation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


hami-robot Bot commented Jun 8, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: limes22
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


hami-robot Bot commented Jun 8, 2026


Welcome @limes22! It looks like this is your first PR to Project-HAMi/KAI-resource-isolator 🎉

hami-robot Bot added the size/M label Jun 8, 2026