Describe the Bug
The MCAD CPU Preemption Test is failing intermittently in CI and local e2e tests.
It appears to fail here:
err = waitAWAnyPodsExists(context, aw2)
// With improved accounting, no pods will be spawned
Expect(err).To(HaveOccurred())
waitAWAnyPodsExists returns nil instead of an error, so the assertion fails with: Expected an error to have occurred. Got: <nil>: nil
Codeflare Stack Component Versions
Please specify the component versions in which you have encountered this bug.
MCAD
Expected Behavior:
aw2 should not be schedulable because of insufficient CPU resources, so waitAWAnyPodsExists should return an error, which is what the assertion expects.
Logs & Failures:
Logs show that the aw2 AppWrapper appears to reach the running state, contrary to our expectations. This suggests a discrepancy in the available CPU resources during the test runs.
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[cleanupTestObjects] Deleting AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
• Failure [6.211 seconds]
AppWrapper E2E Test
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:33
MCAD CPU Preemption Test [It]
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:97
Expected an error to have occurred. Got:
<nil>: nil
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSS
Summarizing 1 Failure:
[Fail] AppWrapper E2E Test [It] MCAD CPU Preemption Test
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116
Ran 2 of 30 Specs in 11.217 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 28 Skipped
--- FAIL: TestE2E (11.22s)
FAIL
FAIL github.com/project-codeflare/multi-cluster-app-dispatcher/test/e2e 11.229s
FAIL
End to end test script return code set to 1
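For triage, note that both Running pods in the [podPhase] lines above carry the appwrapper.mcad.ibm.com label for aw-deployment-2-426cpu-0l7bx6 and are reported as not part of aw-deployment-2-550cpu-i21vda. Below is a minimal sketch of counting pods per AppWrapper by that label, which may help confirm which AppWrapper's pods the wait helper actually observed. The helper name countOwnedPods and the plain-map representation of pod labels are assumptions for illustration only, not part of the MCAD test suite.

```go
package main

import "fmt"

// appWrapperLabel is the label key that appears in the [podPhase] log lines.
const appWrapperLabel = "appwrapper.mcad.ibm.com"

// countOwnedPods is a hypothetical helper (not from the MCAD e2e code).
// It counts how many pods, given as label maps, are labeled as belonging
// to the AppWrapper named awName.
func countOwnedPods(podLabels []map[string]string, awName string) int {
	n := 0
	for _, labels := range podLabels {
		if labels[appWrapperLabel] == awName {
			n++
		}
	}
	return n
}

func main() {
	// Labels copied from the two Running pods in the log excerpt.
	pods := []map[string]string{
		{"app": "aw-deployment-2-426cpu-0l7bx6", appWrapperLabel: "aw-deployment-2-426cpu-0l7bx6"},
		{"app": "aw-deployment-2-426cpu-0l7bx6", appWrapperLabel: "aw-deployment-2-426cpu-0l7bx6"},
	}

	fmt.Println(countOwnedPods(pods, "aw-deployment-2-426cpu-0l7bx6")) // prints 2
	fmt.Println(countOwnedPods(pods, "aw-deployment-2-550cpu-i21vda")) // prints 0
}
```

If a count like this were 0 for the second AppWrapper while the wait helper still returned nil, that would point at how the helper matches pods rather than at CPU accounting; this is a hypothesis to check, not a confirmed diagnosis.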