[Bug]: E2E test should be smarter about GPU saturation #422

### Contact Details

_No response_

### What happened?

The E2E test on OpenShift failed for #420 because the assumption identified below did not hold at that time and place.

The E2E test suite currently assumes (in the "Same-Node Port Collision Creates New Launcher" test case) that it can use 2 GPUs on the node where the server-requesting Pod is initially assigned. That is simply not always true.

It would be great if the test case could detect that situation and react in some sensible manner. It would be good if the debug dumping steps at the end of the job included one that exposed this situation.
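One way the detection could look is sketched below. It is only an illustration, not the suite's actual code: it assumes GPUs are accounted for through the `nvidia.com/gpu` extended resource, and that the node and pod objects have the shape returned by `kubectl get node <name> -o json` and `kubectl get pods -A -o json`. The function name `free_gpus_on_node` is hypothetical. If the launchers take GPUs by some other mechanism, the count would have to come from a different source.

```python
GPU_RESOURCE = "nvidia.com/gpu"  # assumption: GPUs are exposed under this resource name


def free_gpus_on_node(node, pods, resource=GPU_RESOURCE):
    """Return allocatable GPUs on `node` minus GPUs requested by active pods bound to it.

    `node` is a parsed Node object; `pods` is a list of parsed Pod objects
    (for example the `items` list from `kubectl get pods -A -o json`).
    """
    node_name = node["metadata"]["name"]
    allocatable = int(node.get("status", {}).get("allocatable", {}).get(resource, "0"))

    requested = 0
    for pod in pods:
        # Only pods bound to this node count against it.
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue
        # Finished pods no longer hold their resources.
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue
        for container in pod["spec"].get("containers", []):
            resources = container.get("resources", {})
            # Prefer requests; fall back to limits if requests are absent.
            amount = resources.get("requests", {}).get(resource)
            if amount is None:
                amount = resources.get("limits", {}).get(resource, "0")
            requested += int(amount)

    return allocatable - requested
```

A test case that needs 2 GPUs on the same node could call this before the port-collision step and skip (or fail with a clear message) when the result is below 2. The same value could be printed in the debug dumping steps at the end of the job.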

This is really just a special case of the larger problem that GPU availability is dynamic in the shared cluster. Every test step that requires a GPU to be allocated is making an assumption of GPU availability that might not be true.
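For the general case, each GPU-requiring step could be guarded by a bounded wait. The following is a minimal sketch under stated assumptions: `wait_for_free_gpus` is a hypothetical helper, and `get_free` stands for whatever callable the suite would use to report the current number of free GPUs. The default timeout and interval are placeholders, not recommended values.

```python
import time


def wait_for_free_gpus(get_free, needed, timeout_s=300.0, interval_s=10.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `get_free()` until it reports at least `needed` GPUs or the timeout expires.

    Returns True if enough GPUs were seen, False if the timeout was reached.
    `clock` and `sleep` are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_free() >= needed:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Note that a successful check is only a snapshot: in a shared cluster another workload can claim the GPUs between the check and the scheduling of the test's own Pod, so a guard like this narrows the window for spurious failures but does not close it.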

### Version

main (please specify commit below)

### Branch name

_No response_

### Commit SHA

_No response_

### Relevant log output

```shell

```
