Description
For a cluster on the latest ParallelCluster version with multiple subnets defined for its compute queues, what is the expected node-to-subnet allocation behavior?
1. Does the order of subnets matter?
• For example, if I define subnets in az1, az2, and az3, will new compute nodes always launch in az1 first unless that AZ is out of capacity/IP addresses?
2. If an AZ (e.g., az1) is out of capacity for the requested instance type, will ParallelCluster automatically retry launching in the next subnet (az2 or az3)?
3. Are there limitations or constraints to be aware of?
• How many subnets across different AZs can be defined?
• Will it retry across all available subnets until one has capacity, or stop after one or two attempts?
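For reference, the kind of queue definition being asked about looks roughly like the sketch below. It assumes the ParallelCluster 3 configuration schema with the Slurm scheduler; the subnet IDs, queue name, compute resource name, instance type, and counts are all placeholders rather than values from a real cluster.

```yaml
# Minimal sketch of a queue spanning three AZs (all IDs and names are placeholders).
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1                  # hypothetical queue name
      CapacityType: ONDEMAND
      Networking:
        SubnetIds:                  # one subnet per AZ; does this order matter?
          - subnet-aaaa1111         # placeholder, az1
          - subnet-bbbb2222         # placeholder, az2
          - subnet-cccc3333         # placeholder, az3
      ComputeResources:
        - Name: cr1                 # hypothetical compute resource name
          Instances:
            - InstanceType: c5.xlarge   # example instance type only
          MinCount: 0
          MaxCount: 10
```

The questions above are about how nodes in a queue like this get distributed across the three listed subnets, and what happens when one of them cannot supply the requested instance type.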
⸻
Context
This question is in response to frequent instance capacity issues in a particular region. I’m trying to understand whether defining more subnets across multiple AZs can help alleviate these availability problems, and whether there are limits to consider. For example, if I add subnets in 8 AZs, will ParallelCluster try all 8 until it finds one where the instance can be created?