MCAD CPU Preemption Test is failing intermittently in e2e #684

Open
@VanillaSpoon

Describe the Bug

The MCAD CPU Preemption Test fails intermittently in both CI and local e2e runs.
The failing assertion (test/e2e/queue.go:116 in the log below) is:

err = waitAWAnyPodsExists(context, aw2)
// With improved accounting, no pods will be spawned
Expect(err).To(HaveOccurred())

waitAWAnyPodsExists returns nil instead of an error, so the assertion fails with: Expected an error to have occurred. Got: <nil>: nil

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

MCAD

Expected Behavior:

aw2 should not be schedulable because there are insufficient CPU resources, so waitAWAnyPodsExists should return an error, which the assertion expects.

Logs & Failures:

The logs suggest that the aw2 AppWrapper starts running successfully, contrary to the expectation above. This points to the CPU resources available to the test differing between runs.

[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[cleanupTestObjects] Deleting AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
• Failure [6.211 seconds]
AppWrapper E2E Test
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:33
  MCAD CPU Preemption Test [It]
  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:97

  Expected an error to have occurred.  Got:
      <nil>: nil

  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSS

Summarizing 1 Failure:

[Fail] AppWrapper E2E Test [It] MCAD CPU Preemption Test 
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116

Ran 2 of 30 Specs in 11.217 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 28 Skipped
--- FAIL: TestE2E (11.22s)
FAIL
FAIL	github.com/project-codeflare/multi-cluster-app-dispatcher/test/e2e	11.229s
FAIL
End to end test script return code set to 1
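When reading the [podPhase] lines above, it may help to check which AppWrapper each Running pod is attributed to. The sketch below assumes that attribution is decided by the appwrapper.mcad.ibm.com label shown in the logs; belongsTo is a hypothetical helper for illustration, not MCAD's actual code.

```go
package main

import "fmt"

// awLabel is the label key that appears in the [podPhase] log lines.
const awLabel = "appwrapper.mcad.ibm.com"

// belongsTo reports whether a pod's labels attribute it to the named
// AppWrapper. Hypothetical helper; the real check may differ.
func belongsTo(labels map[string]string, awName string) bool {
	return labels[awLabel] == awName
}

func main() {
	// Labels copied from the log output in this issue.
	labels := map[string]string{
		"app":                     "aw-deployment-2-426cpu-0l7bx6",
		"appwrapper.mcad.ibm.com": "aw-deployment-2-426cpu-0l7bx6",
		"pod-template-hash":       "645b9686c",
		"resourceName":            "aw-deployment-2-426cpu-0l7bx6",
	}

	// The pod carries the label of the 426cpu AppWrapper...
	fmt.Println(belongsTo(labels, "aw-deployment-2-426cpu-0l7bx6"))
	// ...and not that of the 550cpu AppWrapper named in the log message.
	fmt.Println(belongsTo(labels, "aw-deployment-2-550cpu-i21vda"))
}
```

Applied to the logged labels, this check is consistent with the "not part of AppWrapper: aw-deployment-2-550cpu-i21vda" message: the Running pods shown are labeled for aw-deployment-2-426cpu-0l7bx6.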
