fix: clear to-allocate annotations after successful device binding #1104
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Kevinz857. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Welcome @Kevinz857! It looks like this is your first PR to Project-HAMi/HAMi 🎉
- Clear hami.io/vgpu-devices-to-allocate and other to-allocate annotations when device binding succeeds
- Add scheduler protection to skip processing successfully bound pods
- Fix issue Project-HAMi#987 where pods retained to-allocate annotations, causing scheduler confusion
- Update tests to verify annotation clearing behavior
This prevents the Kubernetes 1.20 scheduler from repeatedly processing already allocated pods, resolving UUID mismatches and SchedulerError events.
Signed-off-by: Kevinz857 <[email protected]>
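In code terms, the clearing amounts to something like the sketch below; the placement inside updatePodAnnotationsAndReleaseLock and the variable names are taken from the PR description, so treat this as an illustration rather than the exact diff.

// Sketch only: assumed shape of the change in updatePodAnnotationsAndReleaseLock
// (pkg/device/devices.go); names follow the PR description, not the exact diff.
if deviceBindPhase == util.DeviceBindSuccess {
	for _, toAllocateKey := range util.InRequestDevices {
		// Drop the to-allocate record once binding has succeeded, so the
		// scheduler no longer treats the pod as pending allocation.
		newAnnos[toAllocateKey] = ""
	}
}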
Compare: b9cb958 to 9589595
I get it. Sometimes a v1.20 scheduler processes a pod several times (which is perhaps a bug in v1.20), but by erasing the 'to-allocate' allocation we make sure only one GPU is bound successfully.
if bindPhase, exists := pod.Annotations[util.DeviceBindPhase]; exists && bindPhase == util.DeviceBindSuccess {
	klog.V(5).InfoS("Skipping successfully bound pod to prevent scheduler confusion", "pod", pod.Name, "namespace", pod.Namespace, "bindPhase", bindPhase)
	podDev, _ := util.DecodePodDevices(util.SupportDevices, pod.Annotations)
	s.addPod(pod, nodeID, podDev)
I am confused about why the addPod function is always called, regardless of whether the pod matches the condition.
@Shouren Even for successfully bound Pods, we still need to call addPod to ensure that the Pod and its device usage are correctly tracked in the scheduler's internal state. This is important for resource accounting and subsequent Pod scheduling decisions.
The key difference is:
- By checking the DeviceBindPhase flag, we avoid duplicate scheduling processing
- But addPod is still needed to update the scheduler's internal state and resource tracking
- This ensures that resources are allocated correctly while avoiding duplicate processing (see the sketch below)
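To make the two paths concrete, here is a minimal sketch of how the check could sit in the add-pod handler; the handler signature and the nodeID lookup are assumptions based on the quoted diff, not the exact upstream code.

// Sketch only: handler shape and nodeID lookup are assumed, not copied from HAMi.
func (s *Scheduler) onAddPod(pod *corev1.Pod) {
	nodeID := pod.Spec.NodeName // placeholder; the real code may derive this differently

	// Decode whatever device allocation is already recorded on the pod.
	podDev, _ := util.DecodePodDevices(util.SupportDevices, pod.Annotations)

	if bindPhase, exists := pod.Annotations[util.DeviceBindPhase]; exists && bindPhase == util.DeviceBindSuccess {
		// Already bound: register the pod so its device usage is tracked,
		// then return early so it is not scheduled a second time.
		klog.V(5).InfoS("Skipping successfully bound pod", "pod", pod.Name, "namespace", pod.Namespace)
		s.addPod(pod, nodeID, podDev)
		return
	}

	// Not yet bound: register the pod and let normal handling continue.
	s.addPod(pod, nodeID, podDev)
}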
podDev, _ := util.DecodePodDevices(util.SupportDevices, pod.Annotations)
s.addPod(pod, nodeID, podDev)
return
@Kevinz857 Since addPod still needs to be called, can we simplify it by removing those lines of code?
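A sketch of that simplification, using the same assumed names: decode and register the pod on one shared path, with the bind-phase check only deciding whether to log the skip.

// Suggested simplification (sketch): addPod runs on a single shared path, so the
// duplicated decode/addPod/return lines in the bound-pod branch can be removed.
podDev, _ := util.DecodePodDevices(util.SupportDevices, pod.Annotations)
if bindPhase, exists := pod.Annotations[util.DeviceBindPhase]; exists && bindPhase == util.DeviceBindSuccess {
	klog.V(5).InfoS("Pod already bound; registering for resource tracking only",
		"pod", pod.Name, "namespace", pod.Namespace)
}
s.addPod(pod, nodeID, podDev)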
This commit adds unit tests for the skip-processing logic added in PR Project-HAMi#1104 to fix issue Project-HAMi#987, where pods with successful device binding were not properly identified, causing the scheduler to reprocess them unnecessarily.
The test verifies that:
1. Pods marked with DeviceBindPhase=success are identified correctly
2. Both regular and successfully bound pods are added for resource tracking
3. The appropriate path is taken for bound pods to prevent duplicate processing
4. Both types of pods are properly registered in the pod manager
Signed-off-by: Kevin <[email protected]>
Signed-off-by: Kevinz857 <[email protected]>
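A rough shape of such a test is sketched below; the annotation key and value match the PR (hami.io/bind-phase: success), but the helper function and fixtures are stand-ins for the real scheduler test setup.

// Illustrative only: isBoundSuccessfully mirrors the check added in onAddPod;
// the real test exercises the scheduler's pod manager directly.
func isBoundSuccessfully(pod *corev1.Pod) bool {
	phase, ok := pod.Annotations["hami.io/bind-phase"]
	return ok && phase == "success"
}

func TestIdentifySuccessfullyBoundPod(t *testing.T) {
	bound := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name:        "bound-pod",
		Annotations: map[string]string{"hami.io/bind-phase": "success"},
	}}
	regular := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "regular-pod"}}

	if !isBoundSuccessfully(bound) {
		t.Errorf("expected %s to be identified as successfully bound", bound.Name)
	}
	if isBoundSuccessfully(regular) {
		t.Errorf("expected %s not to be identified as successfully bound", regular.Name)
	}
}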
klog.V(5).Infof("Clearing to-allocate annotations for successfully bound pod %s/%s", pod.Namespace, pod.Name)
for _, toAllocateKey := range util.InRequestDevices {
	// Set to empty string to remove the annotation
	newAnnos[toAllocateKey] = ""
The annotations of a Pod after a successful device allocation would be like this in my local cluster:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    hami.io/bind-phase: success
    hami.io/bind-time: "1749202603"
    hami.io/vgpu-devices-allocated: GPU-cf25b1b9-0695-4853-b322-61f8dd89ba1b,NVIDIA,81920,0:;
    hami.io/vgpu-devices-to-allocate: ;
@Kevinz857 I am not sure if setting hami.io/vgpu-devices-to-allocate in annotations to an empty string will break the default behavior or not.
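If the empty-string value turns out to be a problem, one hedged alternative (not what this PR does) would be to remove the key entirely with a JSON merge patch, where a null value deletes the field; client, ctx, and pod here are assumed to be in scope.

// Alternative sketch: remove the annotation key instead of writing "".
// In a JSON merge patch, a null value deletes the field entirely.
patch := []byte(`{"metadata":{"annotations":{"hami.io/vgpu-devices-to-allocate":null}}}`)
if _, err := client.CoreV1().Pods(pod.Namespace).Patch(
	ctx, pod.Name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
	klog.ErrorS(err, "failed to clear to-allocate annotation", "pod", pod.Name)
}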
PR Description
Brief Description
Fix issue #987 where pods with successfully bound devices retain hami.io/vgpu-devices-to-allocate annotations, causing scheduler confusion and Kubernetes 1.20 compatibility issues.
Problem
Pods retain hami.io/vgpu-devices-to-allocate annotations after binding.
Root Cause:
The hami.io/vgpu-devices-to-allocate annotations are set during scheduling but never cleared after successful binding, causing the Kubernetes 1.20 scheduler to treat these pods as unscheduled.
Solution
- Clear to-allocate annotations on successful binding
  - In updatePodAnnotationsAndReleaseLock() (pkg/device/devices.go), clear util.InRequestDevices annotations when deviceBindPhase == util.DeviceBindSuccess
- Add scheduler protection
  - In onAddPod() (pkg/scheduler/scheduler.go), skip pods with bind-phase: success to prevent redundant operations
- Update tests
  - Update Test_PodAllocationTrySuccess to verify annotation clearing behavior
Testing
go test ./pkg/device/ -v
Expected behavior after fix:
- Bound pods keep the hami.io/vgpu-devices-allocated and hami.io/bind-phase: success annotations
- The hami.io/vgpu-devices-to-allocate annotations are cleared
Type of Change
Files Changed
- pkg/device/devices.go - Clear to-allocate annotations on successful binding
- pkg/scheduler/scheduler.go - Skip processing successfully bound pods
- pkg/device/devices_test.go - Update tests to verify new behavior
Checklist
Related Issues
Fixes #987