Multi-AZ Support: please define expected behavior #6950

@snemir2

Description

For a ParallelCluster (latest version) configured with multiple subnets defined for compute queues, what is the expected node-to-subnet allocation behavior?
1. Does the order of subnets matter?
• For example, if I define subnets in az1, az2, and az3, will new compute nodes always launch in az1 first unless that AZ is out of capacity/IP addresses?
2. If an AZ (e.g., az1) is out of capacity for the requested instance type, will ParallelCluster automatically retry launching in the next subnet (az2 or az3)?
3. Are there limitations or constraints to be aware of?
• How many subnets across different AZs can be defined?
• Will it retry across all available subnets until one has capacity, or stop after one or two attempts?
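
A minimal sketch of the kind of queue definition this question refers to, assuming the ParallelCluster 3 YAML configuration schema. The queue name, compute resource name, subnet IDs, instance type, and counts are placeholders for illustration, not values from a real cluster, and the sketch does not imply any particular allocation order or retry behavior:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1                # placeholder queue name
      Networking:
        SubnetIds:                # one subnet per AZ; IDs are placeholders
          - subnet-aaaaaaaa       # az1
          - subnet-bbbbbbbb       # az2
          - subnet-cccccccc       # az3
      ComputeResources:
        - Name: compute1          # placeholder compute resource name
          Instances:
            - InstanceType: c5.xlarge   # placeholder instance type
          MinCount: 0
          MaxCount: 10
```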

Context

This question is prompted by frequent instance capacity issues in a particular region. I'm trying to understand whether defining more subnets across multiple AZs can help alleviate these availability problems, and whether there are limits to consider. For example, if I add subnets across 8 AZs, will ParallelCluster try all 8 until it finds one where the instance can be created?
