ComputeDomain kubelet plugin: kubelet restart may permanently delete resource slice #330

@jgehrcke

Before:

$ kubectl get resourceslices.resource.k8s.io 
NAME                                                  NODE                  DRIVER                      POOL                  AGE
gb-nvl-043-bianca-1-compute-domain.nvidia.com-cxknf   gb-nvl-043-bianca-1   compute-domain.nvidia.com   gb-nvl-043-bianca-1   12h
gb-nvl-043-bianca-2-compute-domain.nvidia.com-spqlr   gb-nvl-043-bianca-2   compute-domain.nvidia.com   gb-nvl-043-bianca-2   27m
gb-nvl-043-bianca-3-compute-domain.nvidia.com-99rtp   gb-nvl-043-bianca-3   compute-domain.nvidia.com   gb-nvl-043-bianca-3   12h
gb-nvl-043-bianca-4-compute-domain.nvidia.com-hzbqb   gb-nvl-043-bianca-4   compute-domain.nvidia.com   gb-nvl-043-bianca-4   25m
gb-nvl-043-bianca09-compute-domain.nvidia.com-tctnx   gb-nvl-043-bianca09   compute-domain.nvidia.com   gb-nvl-043-bianca09   12h

Then, on bianca-2:

$ sudo systemctl stop kubelet.service
$ sudo systemctl start kubelet.service

After (the resource slice for bianca-2 is missing):

$ kubectl get resourceslices.resource.k8s.io 
NAME                                                  NODE                  DRIVER                      POOL                  AGE
gb-nvl-043-bianca-1-compute-domain.nvidia.com-cxknf   gb-nvl-043-bianca-1   compute-domain.nvidia.com   gb-nvl-043-bianca-1   12h
gb-nvl-043-bianca-3-compute-domain.nvidia.com-99rtp   gb-nvl-043-bianca-3   compute-domain.nvidia.com   gb-nvl-043-bianca-3   12h
gb-nvl-043-bianca-4-compute-domain.nvidia.com-hzbqb   gb-nvl-043-bianca-4   compute-domain.nvidia.com   gb-nvl-043-bianca-4   26m
gb-nvl-043-bianca09-compute-domain.nvidia.com-tctnx   gb-nvl-043-bianca09   compute-domain.nvidia.com   gb-nvl-043-bianca09   12h
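
To spot the missing slice without comparing the two listings by eye, something like the following could be used. This is only a sketch: the function name `missing_slice_nodes` and the idea of saving both listings to files are assumptions for illustration, not part of the driver or of kubectl. It reads two saved listings and prints every NODE value that appears in the first but not in the second.

```shell
# Hypothetical helper (not part of kubectl or the driver).
# $1: file holding the "before" listing, $2: file holding the "after" listing.
# Prints each node that has a resource slice in $1 but none in $2.
missing_slice_nodes() {
  awk '
    FNR == 1 { next }                     # skip the header row of each file
    NF < 2   { next }                     # skip blank or malformed lines
    NR == FNR { before[$2] = 1; next }    # first file: remember NODE column
    { after[$2] = 1 }                     # second file: remember NODE column
    END {
      for (n in before)
        if (!(n in after)) print n
    }
  ' "$1" "$2" | sort
}
```

Assuming the two listings above were saved with `kubectl get resourceslices.resource.k8s.io > before.txt` and `> after.txt`, running `missing_slice_nodes before.txt after.txt` would be expected to print the node whose slice disappeared.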

Labels

kind/bug (categorizes issue or PR as related to a bug)