MCAD CPU Preemption Test is failing intermittently in e2e #684

Open
@VanillaSpoon

Describe the Bug

The MCAD CPU Preemption Test fails intermittently in both CI and local e2e runs.
The failure occurs at this assertion:

err = waitAWAnyPodsExists(context, aw2)
// With improved accounting, no pods will be spawned
Expect(err).To(HaveOccurred())

waitAWAnyPodsExists returns nil instead of an error, so the assertion fails with: Expected an error to have occurred. Got: <nil>: nil

Codeflare Stack Component Versions

MCAD

Expected Behavior:

aw2 should not be schedulable because there are insufficient CPU resources, so waitAWAnyPodsExists should return an error, which the assertion then expects.

Logs & Failures:

The logs suggest that the aw2 AppWrapper starts running successfully, contrary to expectations. This points to a discrepancy in the CPU resources available between test runs.

[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[podPhase] Pod aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq in phase: Running not part of AppWrapper: aw-deployment-2-550cpu-i21vda, labels: map[string]string{"app":"aw-deployment-2-426cpu-0l7bx6", "appwrapper.mcad.ibm.com":"aw-deployment-2-426cpu-0l7bx6", "pod-template-hash":"645b9686c", "resourceName":"aw-deployment-2-426cpu-0l7bx6"}
[cleanupTestObjects] Deleting AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-656mw to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
[cleanupTestObjects] Awaiting pod test/aw-deployment-2-426cpu-0l7bx6-645b9686c-d99pq to be deleted for AW aw-deployment-2-426cpu-0l7bx6.
• Failure [6.211 seconds]
AppWrapper E2E Test
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:33
  MCAD CPU Preemption Test [It]
  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:97

  Expected an error to have occurred.  Got:
      <nil>: nil

  /home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSS

Summarizing 1 Failure:

[Fail] AppWrapper E2E Test [It] MCAD CPU Preemption Test 
/home/runner/work/multi-cluster-app-dispatcher/multi-cluster-app-dispatcher/test/e2e/queue.go:116

Ran 2 of 30 Specs in 11.217 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 28 Skipped
--- FAIL: TestE2E (11.22s)
FAIL
FAIL	github.com/project-codeflare/multi-cluster-app-dispatcher/test/e2e	11.229s
FAIL
End to end test script return code set to 1
