Skip to content

When deploying to a cluster deployed on three different AZ, there is 1/3 of chance of things working #58

Open
@mbaldessari

Description

@mbaldessari

Scenario:

  1. Openshift cluster with nodes spread across three AZs (e.g. 1a, 1b, 1c)
  2. Run make create-gpu-machineset it will pick the AZ of the first machineset it finds (e.g. 1a)
  3. Run ./pattern.sh make install and the model-download job run on another AZ (e.g. 1c) and the pvc+pv are bound to that AZ
  4. The inference service which runs on the GPU nodes (so 1a) will never be scheduled because the PVC is bound to 1c

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions