RHOAIENG-63118 - Add upgrade tests for Kueue resource management #939

Open
sukumars321 wants to merge 1 commit into opendatahub-io:main from sukumars321:RHOAIENG-63118

Conversation

sukumars321 commented Jul 1, 2026

Description

This commit introduces new tests for validating the upgrade process of Kueue resource flavors and cluster queues. The tests ensure that resource specifications and generation numbers are correctly maintained during upgrades. Key functions include AddUpgradeResourceBaseline, StoreUpgradeBaseline, and VerifyUpgradeResourceSpecIntegrity, which handle the storage and verification of resource specifications in ConfigMaps.

Additionally, the tests set up necessary Kueue resources and validate their state before and after the upgrade process, ensuring the integrity of the resource management system.

This code change addresses Task 8 of [RHOAIENG-63117] Kueue Upgrade Test Coverage Audit Report

JIRA: https://redhat.atlassian.net/browse/RHOAIENG-63118

How Has This Been Tested?

Manually verified the changes against a test cluster.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

  • New Features

    • Added end-to-end GPU upgrade validation for Kueue-backed PyTorch jobs.
    • Added shared test support for saving and checking upgrade baselines.
    • Added checks that GPU quota and scheduling behavior remain correct after upgrade.
  • Bug Fixes

    • Improved validation of workload and resource settings across upgrade steps.
    • Added safer test setup and cleanup to reduce flaky upgrade test runs.

openshift-ci Bot commented Jul 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kramaranya for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from pawelpaszki and sutaakar July 1, 2026 10:36

coderabbitai Bot commented Jul 1, 2026

Warning

Review limit reached

@sukumars321, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 52 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer to the docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2fd1e1d9-6ed4-4131-b6ab-b2467e7943e5

📥 Commits

Reviewing files that changed from the base of the PR and between ff2c1df and 3d45d40.

📒 Files selected for processing (3)
  • tests/common/support/kueue_upgrade.go
  • tests/common/support/kueue_upgrade_test.go
  • tests/kfto/kfto_kueue_gpu_upgrade_test.go
📝 Walkthrough

CWE-284 (Improper Access Control) note: this PR adds no RBAC/auth checks for ConfigMap read/write in the new tests/common/support/kueue_upgrade.go — it's test tooling, but flag if reused in production paths. Adds three new test files: shared Kueue upgrade helper functions (baseline serialization into ConfigMap, generation/spec integrity verification, GPU nominal quota extraction), corresponding unit tests, and a new end-to-end test (kfto_kueue_gpu_upgrade_test.go) exercising ResourceFlavor/ClusterQueue/LocalQueue/PyTorchJob state across a simulated upgrade, including Hold→None stop-policy transition and GPU-tolerant PyTorchJob creation/deletion.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Related issues: None referenced in provided diff.

Related PRs: None referenced in provided diff.

Suggested labels: test, kueue, gpu, e2e

Suggested reviewers: No reviewer metadata provided in raw summary — CWE-940 (Improper Verification of Source of a Communication Channel) is not applicable here, but I cannot fabricate reviewer identities.

Poem:
No rabbit hops through CI blind —
ConfigMaps stored, baselines signed.
GPU quotas checked twice, no less,
Hold to None — a clean success.
Audit the map deletes, I insist. 🐇

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding upgrade tests for Kueue resource management. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Contribution Quality And Spam Detection | ✅ Passed | Only a templated PR body is mildly suspicious; the change is a concrete multi-file test addition, not a spammy batch or security-theater fix. |
| No Hardcoded Secrets | ✅ Passed | No hardcoded credentials, secrets, embedded-credential URLs, or long base64 blobs found in the added files; nothing implicates CWE-798/CWE-259. |
| No Weak Cryptography | ✅ Passed | No CWE-327/CWE-208 issues: the added code has no crypto primitives and only compares generations/resource quotas, not secrets or HMACs. |
| No Injection Vectors | ✅ Passed | No CWE-89/78/94/502/79 sink found; new Go helpers/tests only use hardcoded or typed inputs, and the PR is in test code. |
| No Privileged Containers | ✅ Passed | No privileged flags, hostPID/Network/IPC, SYS_ADMIN, allowPrivilegeEscalation:true, or root user settings in changed files. Test-only pods; no CWE-269/276 issue. |
| No Sensitive Data In Logs | ✅ Passed | No CWE-532 issue found; logs only resource names and Kueue specs, with no passwords, tokens, PII, or request/response bodies. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/common/support/kueue_upgrade.go (1)

54-56: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Delete error silently swallowed.

StoreUpgradeBaseline ignores the error from Delete (Line 54). If deletion fails for a reason other than "not found" (e.g. RBAC denial), the subsequent Create will fail with an unhelpful AlreadyExists/permission error, masking the root cause.

♻️ Suggested fix
-	_ = test.Client().Core().CoreV1().ConfigMaps(namespace).Delete(test.Ctx(), configMapName, metav1.DeleteOptions{})
+	if err := test.Client().Core().CoreV1().ConfigMaps(namespace).Delete(test.Ctx(), configMapName, metav1.DeleteOptions{}); err != nil && !k8serrors.IsNotFound(err) {
+		test.T().Logf("Failed to delete existing baseline ConfigMap %s/%s: %v", namespace, configMapName, err)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/common/support/kueue_upgrade.go` around lines 54 - 56,
StoreUpgradeBaseline is ignoring the result of ConfigMaps(...).Delete, which can
hide real failures before the create step. Update the delete-and-create flow to
check the Delete error explicitly in StoreUpgradeBaseline, using the existing
test.Client().Core().CoreV1().ConfigMaps(namespace) calls, and fail the test
immediately on any unexpected delete error (while still allowing a not-found
case if intended) before proceeding to Create.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/common/support/kueue_upgrade_test.go`:
- Around line 90-106: The current test only covers the happy path where both
generation and spec match, so it does not expose the missing spec validation in
VerifyUpgradeResourceSpecIntegrity. Update
TestVerifyUpgradeResourceSpecIntegrity to add a mismatched ResourceFlavorSpec
with the same generation and assert it fails, and add a separate case where the
generation value differs to verify the expected failure behavior; use
VerifyUpgradeResourceSpecIntegrity, ResourceFlavorSpec, and the ConfigMap keys
already in the test to keep the coverage targeted.
- Around line 19-28: The import block in kueue_upgrade_test.go is not in the
order expected by openshift-goimports, causing verify-imports to fail. Reformat
the imports in the package for the test file so they are sorted and grouped
consistently with goimports conventions, keeping the existing dependencies but
adjusting their order in the import list.

In `@tests/common/support/kueue_upgrade.go`:
- Around line 60-75: VerifyUpgradeResourceSpecIntegrity only checks generation
and never asserts that the spec content stayed the same. In
VerifyUpgradeResourceSpecIntegrity, compare the pre-upgrade spec from
configMap.Data[specKey] against the post-upgrade spec serialized from spec, and
keep the generation check only as supplemental context. Use the existing
resourceName, genKey, and specKey flow to locate the mismatch, and add a spec
equality assertion so the function actually validates resource spec integrity
during upgrades.

In `@tests/kfto/kfto_kueue_gpu_upgrade_test.go`:
- Around line 155-163: `VerifyUpgradeResourceSpecIntegrity` is only checking the
stored generation and not validating that the spec content matches what was
saved. Update the helper in `kueue_upgrade.go` so it compares the marshalled
`spec` (`json.Marshal(spec)`) against the corresponding
`configMap.Data[specKey]` in addition to the generation check, and keep the
existing call sites like `VerifyUpgradeResourceSpecIntegrity` for
`ResourceFlavor` and `ClusterQueue` unchanged.

---

Nitpick comments:
In `@tests/common/support/kueue_upgrade.go`:
- Around line 54-56: StoreUpgradeBaseline is ignoring the result of
ConfigMaps(...).Delete, which can hide real failures before the create step.
Update the delete-and-create flow to check the Delete error explicitly in
StoreUpgradeBaseline, using the existing
test.Client().Core().CoreV1().ConfigMaps(namespace) calls, and fail the test
immediately on any unexpected delete error (while still allowing a not-found
case if intended) before proceeding to Create.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e952e18d-938e-4b62-beef-8fbbf9717445

📥 Commits

Reviewing files that changed from the base of the PR and between f5c3de9 and ff2c1df.

📒 Files selected for processing (3)
  • tests/common/support/kueue_upgrade.go
  • tests/common/support/kueue_upgrade_test.go
  • tests/kfto/kfto_kueue_gpu_upgrade_test.go

Comment on lines +19 to +28
import (
	"encoding/json"
	"testing"

	"github.com/onsi/gomega"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	kueuev1beta2 "sigs.k8s.io/kueue/apis/kueue/v1beta2"
)

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Imports not sorted — pipeline failing.

verify-imports reports this file fails openshift-goimports sorting.

make imports
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/common/support/kueue_upgrade_test.go` around lines 19 - 28, The import
block in kueue_upgrade_test.go is not in the order expected by
openshift-goimports, causing verify-imports to fail. Reformat the imports in the
package for the test file so they are sorted and grouped consistently with
goimports conventions, keeping the existing dependencies but adjusting their
order in the import list.

Source: Pipeline failures

Comment on lines +90 to +106
func TestVerifyUpgradeResourceSpecIntegrity(t *testing.T) {
	test := NewTest(t)
	configMap := &corev1.ConfigMap{
		Data: map[string]string{
			"gen-key":              "5",
			"spec-key":             `{"nodeLabels":{"nvidia.com/gpu.present":"true"}}`,
			UpgradeRHOAIVersionKey: "3.4.0",
		},
	}
	spec := kueuev1beta2.ResourceFlavorSpec{
		NodeLabels: map[string]string{
			"nvidia.com/gpu.present": "true",
		},
	}

	VerifyUpgradeResourceSpecIntegrity(test, "ResourceFlavor", 5, spec, configMap, "gen-key", "spec-key")
}

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Test doesn't cover the actual bug in VerifyUpgradeResourceSpecIntegrity.

This only exercises the case where generation and spec both match, so it passes even though the function under test never checks spec content (see comment on kueue_upgrade.go Lines 60-75). Add a case with a mismatched spec but matching generation to catch the gap, and a case for mismatched generation to verify failure behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/common/support/kueue_upgrade_test.go` around lines 90 - 106, The
current test only covers the happy path where both generation and spec match, so
it does not expose the missing spec validation in
VerifyUpgradeResourceSpecIntegrity. Update
TestVerifyUpgradeResourceSpecIntegrity to add a mismatched ResourceFlavorSpec
with the same generation and assert it fails, and add a separate case where the
generation value differs to verify the expected failure behavior; use
VerifyUpgradeResourceSpecIntegrity, ResourceFlavorSpec, and the ConfigMap keys
already in the test to keep the coverage targeted.

Comment on lines +60 to +75
func VerifyUpgradeResourceSpecIntegrity(test Test, resourceName string, generation int64, spec interface{},
	configMap *corev1.ConfigMap, genKey, specKey string) {
	test.T().Helper()

	expectedGen := configMap.Data[genKey]
	actualGen := fmt.Sprintf("%d", generation)
	if actualGen != expectedGen {
		currentSpecJSON, _ := json.Marshal(spec)
		test.T().Logf("%s generation changed during upgrade (%s to %s)", resourceName, expectedGen, actualGen)
		test.T().Logf("Pre-upgrade %s spec: %s", resourceName, configMap.Data[specKey])
		test.T().Logf("Post-upgrade %s spec: %s", resourceName, currentSpecJSON)
	}
	test.Expect(actualGen).To(gomega.Equal(expectedGen),
		"%s spec should be unchanged after upgrade (generation %s, expected %s)", resourceName, actualGen, expectedGen)
	test.T().Logf("%s generation unchanged after upgrade: %s", resourceName, actualGen)
}

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

VerifyUpgradeResourceSpecIntegrity never actually verifies the spec, only the generation.

currentSpecJSON (Line 67) is computed solely for the mismatch log message; the only assertion is on actualGen == expectedGen (Line 72-73). If a resource's spec silently drifts while Kueue leaves the generation unchanged (or the generation happens to coincide), this function reports success despite the spec being wrong — defeating the stated purpose ("Verify... Resource Spec Integrity") and the PR objective of validating that "resource specifications ... are preserved correctly during upgrades."

🐛 Proposed fix to compare spec content
 func VerifyUpgradeResourceSpecIntegrity(test Test, resourceName string, generation int64, spec interface{},
 	configMap *corev1.ConfigMap, genKey, specKey string) {
 	test.T().Helper()
 
 	expectedGen := configMap.Data[genKey]
 	actualGen := fmt.Sprintf("%d", generation)
+	currentSpecJSON, err := json.Marshal(spec)
+	test.Expect(err).NotTo(gomega.HaveOccurred())
 	if actualGen != expectedGen {
-		currentSpecJSON, _ := json.Marshal(spec)
 		test.T().Logf("%s generation changed during upgrade (%s to %s)", resourceName, expectedGen, actualGen)
 		test.T().Logf("Pre-upgrade %s spec: %s", resourceName, configMap.Data[specKey])
 		test.T().Logf("Post-upgrade %s spec: %s", resourceName, currentSpecJSON)
 	}
 	test.Expect(actualGen).To(gomega.Equal(expectedGen),
 		"%s spec should be unchanged after upgrade (generation %s, expected %s)", resourceName, actualGen, expectedGen)
+	test.Expect(string(currentSpecJSON)).To(gomega.MatchJSON(configMap.Data[specKey]),
+		"%s spec content should be unchanged after upgrade", resourceName)
 	test.T().Logf("%s generation unchanged after upgrade: %s", resourceName, actualGen)
 }
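
To illustrate why the fix compares JSON semantically rather than as raw strings, here is a standard-library sketch of the same idea. It is an approximation for illustration only: `jsonEqual` and `checkBaseline` are hypothetical names, and the real helper would keep using `gomega.MatchJSON` and the test's assertion API. The `main` function walks the three cases the review asks for: unchanged, spec drift with the same generation, and generation drift.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether a and b decode to the same JSON value,
// ignoring whitespace and object key order.
func jsonEqual(a, b string) (bool, error) {
	var av, bv interface{}
	if err := json.Unmarshal([]byte(a), &av); err != nil {
		return false, fmt.Errorf("decode first document: %w", err)
	}
	if err := json.Unmarshal([]byte(b), &bv); err != nil {
		return false, fmt.Errorf("decode second document: %w", err)
	}
	return reflect.DeepEqual(av, bv), nil
}

// checkBaseline is a hypothetical, error-returning variant of the check:
// it fails if either the generation or the spec differs from the baseline.
func checkBaseline(data map[string]string, genKey, specKey string, generation int64, spec interface{}) error {
	if got, want := fmt.Sprintf("%d", generation), data[genKey]; got != want {
		return fmt.Errorf("generation changed: %s to %s", want, got)
	}
	current, err := json.Marshal(spec)
	if err != nil {
		return fmt.Errorf("marshal current spec: %w", err)
	}
	same, err := jsonEqual(data[specKey], string(current))
	if err != nil {
		return err
	}
	if !same {
		return fmt.Errorf("spec changed: %s to %s", data[specKey], current)
	}
	return nil
}

func main() {
	baseline := map[string]string{
		"gen-key":  "5",
		"spec-key": `{"nodeLabels": {"nvidia.com/gpu.present": "true"}}`,
	}
	unchanged := map[string]interface{}{"nodeLabels": map[string]string{"nvidia.com/gpu.present": "true"}}
	drifted := map[string]interface{}{"nodeLabels": map[string]string{"nvidia.com/gpu.present": "false"}}

	fmt.Println("unchanged:", checkBaseline(baseline, "gen-key", "spec-key", 5, unchanged))
	fmt.Println("spec drift:", checkBaseline(baseline, "gen-key", "spec-key", 5, drifted))
	fmt.Println("generation drift:", checkBaseline(baseline, "gen-key", "spec-key", 6, unchanged))
}
```

A plain string comparison would reject the first case because the stored baseline contains extra whitespace, which is one reason a semantic matcher is the safer choice here.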
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/common/support/kueue_upgrade.go` around lines 60 - 75,
VerifyUpgradeResourceSpecIntegrity only checks generation and never asserts that
the spec content stayed the same. In VerifyUpgradeResourceSpecIntegrity, compare
the pre-upgrade spec from configMap.Data[specKey] against the post-upgrade spec
serialized from spec, and keep the generation check only as supplemental
context. Use the existing resourceName, genKey, and specKey flow to locate the
mismatch, and add a spec equality assertion so the function actually validates
resource spec integrity during upgrades.

Comment thread tests/kfto/kfto_kueue_gpu_upgrade_test.go
sutaakar (Contributor) commented Jul 1, 2026

@sukumars321 is there a specific reason to test using GPU?
Other upgrade tests are CPU only.

This commit introduces new tests for validating the upgrade process of Kueue resource flavors and cluster queues. The tests ensure that resource specifications and generation numbers are correctly maintained during upgrades. Key functions include `AddUpgradeResourceBaseline`, `StoreUpgradeBaseline`, and `VerifyUpgradeResourceSpecIntegrity`, which handle the storage and verification of resource specifications in ConfigMaps.

Additionally, the tests set up necessary Kueue resources and validate their state before and after the upgrade process, ensuring the integrity of the resource management system.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Sukumar Subramani <suksubra@redhat.com>
sukumars321 (Author) commented

> @sukumars321 is there a specific reason to test using GPU? Other upgrade tests are CPU only.

@sutaakar This is part of the Kueue pre/post-upgrade coverage tracked in https://redhat.atlassian.net/browse/RHOAIENG-63118. It addresses Task 8 of [RHOAIENG-63117] Kueue Upgrade Test Coverage Audit Report.

@chambridge chambridge left a comment

I noted a couple of items during my review, but I'm not very familiar with this code base. I also see several CodeRabbit review comments that need resolving, if applicable.

Comment on lines +144 to +147
defer test.Client().Kueue().KueueV1beta2().ResourceFlavors().Delete(test.Ctx(), gpuUpgradeResourceFlavorName, metav1.DeleteOptions{})
defer test.Client().Kueue().KueueV1beta2().ClusterQueues().Delete(test.Ctx(), gpuUpgradeClusterQueueName, metav1.DeleteOptions{})
defer test.Client().Core().CoreV1().ConfigMaps(gpuUpgradeNamespaceName).Delete(test.Ctx(), gpuUpgradeBaselineConfigMap, metav1.DeleteOptions{})
defer DeleteTestNamespace(test, namespace)

defer test.Client().Kueue().KueueV1beta2().ResourceFlavors().Delete(...)  // runs 4th (LIFO)
defer test.Client().Kueue().KueueV1beta2().ClusterQueues().Delete(...)    // runs 3rd
defer test.Client().Core().CoreV1().ConfigMaps(...).Delete(...)           // runs 2nd
defer DeleteTestNamespace(test, namespace)                                // runs 1st

Go defers execute LIFO. DeleteTestNamespace runs first, which deletes the namespace and all its contents (including the ConfigMap). The explicit ConfigMap delete on line 146 will then fail with NotFound. It's harmless but misleading.

The MNIST upgrade test (kfto_kueue_mnist_upgrade_training_test.go) does NOT explicitly clean ConfigMaps — it relies on namespace deletion. Pick one pattern and be consistent.
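
The ordering described above can be checked with a small standalone program. This is only an illustration: `cleanupOrder` is a hypothetical function that registers four named cleanups in the same order as the test and records the order in which they actually run.

```go
package main

import "fmt"

// cleanupOrder registers four cleanups with defer, in the same order as the
// test under review, and returns the order in which they actually execute.
func cleanupOrder() (order []string) {
	record := func(name string) { order = append(order, name) }

	defer record("ResourceFlavor") // registered first, runs last
	defer record("ClusterQueue")
	defer record("ConfigMap")
	defer record("Namespace") // registered last, runs first

	return
}

func main() {
	fmt.Println(cleanupOrder())
	// Output: [Namespace ConfigMap ClusterQueue ResourceFlavor]
}
```

Because the namespace cleanup runs first, any namespaced object deleted afterwards is already gone by the time its own deferred delete executes.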


const UpgradeRHOAIVersionKey = "rhoai-version"

func AddUpgradeResourceBaseline(data map[string]string, genKey, specKey string, generation int64, spec interface{}) error {

The function returns an error rather than asserting, so error attribution in test output will point to the helper rather than the caller. StoreUpgradeBaseline and VerifyUpgradeResourceSpecIntegrity correctly call test.T().Helper().
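
One way to keep failures attributable when a helper returns an error is to wrap the error with the keys involved, so the caller's assertion message identifies which baseline entry failed. The sketch below is an assumption about the helper's body, which is not shown in this thread: `addBaseline` is a hypothetical stand-in that stores the generation as a decimal string and the spec as JSON, matching how the verification helper reads them back.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addBaseline is a hypothetical stand-in for AddUpgradeResourceBaseline.
// It records the generation and the JSON-encoded spec under the given keys,
// and wraps any marshalling error with the key it was meant to populate.
func addBaseline(data map[string]string, genKey, specKey string, generation int64, spec interface{}) error {
	specJSON, err := json.Marshal(spec)
	if err != nil {
		return fmt.Errorf("marshal spec for key %q: %w", specKey, err)
	}
	data[genKey] = fmt.Sprintf("%d", generation)
	data[specKey] = string(specJSON)
	return nil
}

func main() {
	data := map[string]string{}

	// A spec that encodes cleanly is stored under both keys.
	err := addBaseline(data, "gen-key", "spec-key", 5, map[string]string{"nvidia.com/gpu.present": "true"})
	fmt.Println(err, data)

	// A value that encoding/json cannot encode (a channel) yields a wrapped
	// error naming the affected key, and leaves the map untouched.
	err = addBaseline(data, "bad-gen", "bad-spec", 1, make(chan int))
	fmt.Println(err)
}
```

The caller can then assert on the returned error at its own call site, which keeps the failure location in the test that requested the baseline.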

StoreUpgradeBaseline(test, gpuUpgradeNamespaceName, gpuUpgradeBaselineConfigMap, data)
}

func TestRunGpuKueueUpgrade(t *testing.T) {

TestRunGpuKueueUpgrade verifies ResourceFlavor and ClusterQueue spec integrity post-upgrade but never checks that the LocalQueue survived. Add:

_, err = test.Client().Kueue().KueueV1beta2().LocalQueues(gpuUpgradeNamespaceName).Get(
    test.Ctx(), gpuUpgradeLocalQueueName, metav1.GetOptions{})
test.Expect(err).NotTo(HaveOccurred(), "LocalQueue should exist after upgrade")

Labels

None yet

Projects

None yet

3 participants