Skip to content

Conversation

@sohankunkerkar
Copy link
Member

This change adds end-to-end tests for the DRA feature, validating that Kueue correctly manages workloads using DRA.

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Part of #8243 and we need this before pushing the Kueue + DRA feature to Beta.

Does this PR introduce a user-facing change?

None

Copilot AI review requested due to automatic review settings December 25, 2025 00:02
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Dec 25, 2025
@netlify
Copy link

netlify bot commented Dec 25, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 4bff381
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/69605de451179000082fc811

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 25, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive end-to-end tests for the Dynamic Resource Allocation (DRA) feature in Kueue, validating that workloads using DRA resources are correctly managed. The changes include test infrastructure setup, DRA driver installation scripts, multiple test scenarios covering quota management, and helper functions for creating jobs with DRA resource claims.

Key Changes

  • Adds E2E test suite for DRA with 5 test scenarios covering admission, quota enforcement, and multi-pod jobs
  • Implements infrastructure to build and deploy the DRA example driver from source in the test environment
  • Adds configuration for enabling DRA feature gates in Kind clusters and Kueue controllers

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/e2e/dra/suite_test.go Test suite setup with DRA driver availability checks
test/e2e/dra/dra_test.go Five comprehensive test scenarios validating DRA resource management
test/e2e/config/dra/kustomization.yaml Kustomization configuration for DRA E2E environment
test/e2e/config/dra/controller_manager_config.yaml Controller configuration enabling DRA feature gate and device class mappings
test/e2e/config/dra/manager_e2e_patch.yaml Image pull policy patch for E2E testing
pkg/util/testingjobs/job/wrappers.go Helper function to add DRA resource claim templates to test jobs
hack/e2e-common.sh Installation and configuration functions for DRA example driver
Makefile-test.mk Make target for running DRA E2E tests
Makefile-deps.mk DRA example driver version configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sohankunkerkar sohankunkerkar force-pushed the dra-e2e branch 2 times, most recently from 83263a0 to 8e6e90e Compare December 25, 2025 00:46
@sohankunkerkar sohankunkerkar changed the title [WIP] Add E2E tests for DRA Add E2E tests for DRA Jan 5, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
@sohankunkerkar sohankunkerkar force-pushed the dra-e2e branch 4 times, most recently from 6b70be0 to a8a46d3 Compare January 6, 2026 23:53
dra_driver_temp_dir=$(mktemp -d)
# shellcheck disable=SC2064 # Intentionally expand now to capture the temp dir path
trap "rm -rf '$dra_driver_temp_dir'" RETURN
git clone --depth 1 --branch main "${DRA_EXAMPLE_DRIVER_REPO}" "$dra_driver_temp_dir"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the main branch so it can potentially break our builds, also releases, if there is a bug upstream. I think it would be better to somehow depend on the released version, and ideally if this gets managed by dependabot.

cc @sohankunkerkar @mbobrovskyi is there a way to achieve it?

I'm ok temporarily to use "main" branch, but I would like to see a proper path forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sohankunkerkar @mbobrovskyi is there a way to achieve it?

I think so. The repository https://github.com/kubernetes-sigs/dra-example-driver.git
is in Go, so we could probably add it as a dependency in hack/internal/tools. That way, we can bump it with each release.

I'm ok temporarily to use "main" branch, but I would like to see a proper path forward.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the main branch so it can potentially break our builds, also releases, if there is a bug upstream. I think it would be better to somehow depend on the released version, and ideally if this gets managed by dependabot.

I can create an issue in https://github.com/kubernetes-sigs/dra-example-driver.git to ask for the new release.
I've not see any activity for the last 3 months so I'm not sure if the repo is actively managed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, please do then. It would be better to test released version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can switch to the release version when we get there. Are we going to block this PR until then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can switch to the release version when we get there. Are we going to block this PR until then?

Thank you, no need to block on that since there is already consensus, and it should be done in coming days.

Let's merge as is so that we can in parallel:

  1. start running the new tests in another CI job
  2. update to released version when that is ready

@mimowo
Copy link
Contributor

mimowo commented Jan 8, 2026

/lgtm
/approve
Thank you 👍

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2026
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

DetailsGit tree hash: 216035182a1ddefba2f6ff85c9c6ce443bd10afe

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2026
@mimowo
Copy link
Contributor

mimowo commented Jan 8, 2026

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2026
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2026
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot requested a review from mimowo January 8, 2026 19:58
@sohankunkerkar
Copy link
Member Author

/hold as https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/8421/pull-kueue-test-e2e-main-1-34/2009311567799128064 looks relatwd. Lest make sure the tests are deflaked first

Actually the tests run with ParallelTotal: 2, and both processes were racing to create cluster-scoped resources (dra-flavor, dra-cq) with fixed names, causing "already exists" errors when they started simultaneously. I fixed that part in the recent push.

This change adds end-to-end tests for the DRA feature,
validating that Kueue correctly manages workloads using DRA.

Signed-off-by: Sohan Kunkerkar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants