Enable DRAExtendedResource feature gate and extres test in Lambda CI #1027

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from dims:worktree-extres-feature-gate on Apr 13, 2026

@dims
Member

dims commented Apr 12, 2026

Detect the Kubernetes version after downloading binaries. When k8s >= 1.35, pass KUBEADM_FEATURE_GATES=DRAExtendedResource=true to setup-k8s-node.sh so the API server, scheduler, controller-manager, and kubelet all enable the Alpha DRAExtendedResource feature gate.
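
The version check described above could look roughly like the sketch below. It is not the script from this PR: the helper names `k8s_version_at_least` and `feature_gates_for` are hypothetical, and the sketch assumes the version string has the form `v1.35.3` (for example as printed by `kubeadm version -o short`).

```shell
# Hypothetical helper: succeeds when a version such as "v1.35.3" is at least
# MAJOR.MINOR. Usage: k8s_version_at_least VERSION MAJOR MINOR
k8s_version_at_least() {
  local version major rest minor
  version="${1#v}"            # drop a leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # text before the next dot
  minor="${minor%%[!0-9]*}"   # drop any non-digit suffix such as "+"
  minor="${minor:-0}"
  if [ "$major" -gt "$2" ]; then
    return 0
  fi
  [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]
}

# Hypothetical helper: print the feature-gate string for a version,
# or print nothing when the cluster is older than 1.35.
feature_gates_for() {
  if k8s_version_at_least "$1" 1 35; then
    echo "DRAExtendedResource=true"
  fi
}
```

A caller might then export `KUBEADM_FEATURE_GATES="$(feature_gates_for "$version")"` before invoking setup-k8s-node.sh; how the CI script actually passes the variable is not shown in this conversation.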

Add test_gpu_extres.bats to the tests-gpu-single target. The test already self-skips when:

  • k8s < 1.35, or
  • DRAExtendedResource=true is not found in the API server pod spec
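
As an illustration of the two skip conditions above, here is a minimal sketch. The function name `extres_skip_reason` is hypothetical, the real test file may structure this differently, and the sketch assumes a 1.x cluster so that only the minor version needs checking.

```shell
# Hypothetical helper: prints a skip reason and returns 0 when the extres test
# should be skipped; prints nothing and returns 1 when it should run.
# Usage: extres_skip_reason MINOR_VERSION APISERVER_POD_SPEC_TEXT
extres_skip_reason() {
  local minor spec
  minor="$1"
  spec="$2"
  if [ "$minor" -lt 35 ]; then
    echo "requires k8s >= 1.35"
    return 0
  fi
  case "$spec" in
    *DRAExtendedResource=true*)
      # Gate found in the API server pod spec: do not skip.
      return 1
      ;;
    *)
      echo "DRAExtendedResource=true not found in the API server pod spec"
      return 0
      ;;
  esac
}
```

In a bats test this could be used as `reason="$(extres_skip_reason "$minor" "$spec")" && skip "$reason"`, with `$spec` obtained, for instance, from `kubectl -n kube-system get pods -l component=kube-apiserver -o yaml` on a kubeadm cluster.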

Requires companion test-infra PR to land first:

k8s-ci-robot added the "cncf-cla: yes" label (the PR's author has signed the CNCF CLA) on Apr 12, 2026
k8s-ci-robot requested a review from shivamerla on April 12, 2026 12:31
k8s-ci-robot added the "approved" label (approved by an approver from all required OWNERS files) and the "size/S" label (changes 10-29 lines, ignoring generated files) on Apr 12, 2026
dims force-pushed the worktree-extres-feature-gate branch from 56439b3 to d87b0ad on April 12, 2026 14:00
@dims
Member Author

dims commented Apr 12, 2026

/assign @shivamerla

@shivamerla
Contributor

@dims is it possible to run all of these tests at once and collate all commits in a single MR? We can keep them in a separate branch until then, before merging to main.

@dims
Member Author

dims commented Apr 12, 2026

> @dims is it possible to run all of these tests at once and collate all commits in a single MR? We can keep them in a separate branch until then, before merging to main.

I am planning to wrap up as much of what I am doing as possible today. Let's change tactics if this spills over into the work week and gets in the team's way.

Detect the Kubernetes version after downloading binaries. When k8s >= 1.35,
pass KUBEADM_FEATURE_GATES=DRAExtendedResource=true to setup-k8s-node.sh
so the API server, scheduler, controller-manager, and kubelet all enable
the Alpha feature gate.

Add test_gpu_extres.bats to the tests-gpu-single target. The test already
self-skips when the gate is absent or k8s < 1.35.

Also fix two pre-existing test issues discovered during validation:

- test_gpu_extres.bats: add DISABLE_COMPUTE_DOMAINS handling in setup_file,
  matching all other test files. Without this, chart upgrade enables compute
  domains on non-NVSwitch GPUs, crashing the compute-domains container.

- test_gpu_robustness.bats: make nvidia_dra_requests_total assertion
  conditional. This counter is only registered after the first DRA request;
  it does not appear in the metrics output before any GPU pod has run.
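
The two fixes above might be sketched as follows. Both helper names are hypothetical, and so is the convention that `DISABLE_COMPUTE_DOMAINS=true` turns the feature off. The Helm value key `resources.computeDomains.enabled` is an assumption about the chart and is not confirmed by this PR.

```shell
# Hypothetical helper: extra Helm arguments for the chart upgrade.
# ASSUMPTION: the chart exposes a value key named
# resources.computeDomains.enabled; check the chart's values before relying on it.
compute_domain_helm_args() {
  if [ "${DISABLE_COMPUTE_DOMAINS:-}" = "true" ]; then
    echo "--set resources.computeDomains.enabled=false"
  fi
}

# Hypothetical helper: assert on a counter only once it has been registered.
# Usage: check_counter_if_registered METRIC_NAME METRICS_TEXT
# Returns 0 when the metric is absent (nothing to assert yet) or when at least
# one sample line with a numeric value exists; returns 1 when the name appears
# but no sample line does.
check_counter_if_registered() {
  local metric metrics
  metric="$1"
  metrics="$2"
  case "$metrics" in
    *"$metric"*) ;;
    *)
      echo "skipped: $metric not registered yet"
      return 0
      ;;
  esac
  # Match "name 3" or "name{label="x"} 3" in Prometheus text format.
  printf '%s\n' "$metrics" | grep -Eq "^${metric}(\{[^}]*\})? [0-9]"
}
```

Making the check conditional keeps the robustness test meaningful once a GPU pod has run, without failing on a fresh cluster where the counter has not been created yet.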

Requires a companion test-infra PR to teach setup-k8s-node.sh to accept
KUBEADM_FEATURE_GATES and generate a kubeadm config file with the gates
applied to all control plane components.
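
For readers unfamiliar with how feature gates reach each component, the sketch below renders a kubeadm config from a comma-separated gate list. It is not the companion test-infra change: the function name `render_kubeadm_config` is hypothetical, and the sketch assumes the `kubeadm.k8s.io/v1beta4` config format, where `extraArgs` is a list of name/value pairs, with kubelet gates set through a `KubeletConfiguration` document in the same file.

```shell
# Hypothetical helper: print a kubeadm config that applies the given feature
# gates to the API server, controller-manager, scheduler, and kubelet.
# Usage: render_kubeadm_config "DRAExtendedResource=true"
render_kubeadm_config() {
  local gates component gate value
  gates="$1"
  echo "apiVersion: kubeadm.k8s.io/v1beta4"
  echo "kind: ClusterConfiguration"
  for component in apiServer controllerManager scheduler; do
    echo "${component}:"
    echo "  extraArgs:"
    echo "  - name: feature-gates"
    echo "    value: \"${gates}\""
  done
  echo "---"
  echo "apiVersion: kubelet.config.k8s.io/v1beta1"
  echo "kind: KubeletConfiguration"
  echo "featureGates:"
  # Split "A=true,B=false" into one YAML map entry per gate.
  printf '%s\n' "$gates" | tr ',' '\n' | while IFS='=' read -r gate value; do
    if [ -n "$gate" ]; then
      echo "  ${gate}: ${value}"
    fi
  done
}
```

The output could be written to a file and passed to `kubeadm init --config`; whether setup-k8s-node.sh does exactly this is not shown here.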

Tested: 15/15 tests pass on Lambda gpu_1x_a10 with k8s v1.35.3.
dims force-pushed the worktree-extres-feature-gate branch from d87b0ad to 4ca0585 on April 12, 2026 21:15
k8s-ci-robot added the "size/M" label (changes 30-99 lines, ignoring generated files) and removed the "size/S" label (changes 10-29 lines, ignoring generated files) on Apr 12, 2026
@dims
Member Author

dims commented Apr 12, 2026

/test pull-dra-driver-nvidia-gpu-e2e-lambda-gpu

@shivamerla
Contributor

As discussed offline, we can revisit metrics initialization separately.

/lgtm
/approve

k8s-ci-robot added the "lgtm" label ("Looks good to me"; the PR is ready to be merged) on Apr 12, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, shivamerla

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot merged commit 5d2b054 into kubernetes-sigs:main on Apr 13, 2026
11 checks passed
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.

3 participants