Description
Is your feature request related to a problem? Please describe
Problem Statement
Currently, OpenSearch validates shard limits when performing operations that increase the total number of shards in the cluster, such as:
- Creating a new index
- Reopening a closed index
- Changing the number of replicas
These validations check against the cluster-wide shard limits defined by:
- cluster.max_shards_per_node
- cluster.routing.allocation.total_shards_per_node
If an operation would exceed these limits, it is rejected.
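The current cluster-wide check can be sketched roughly as follows. This is a minimal Python illustration of the validation described above, not the actual OpenSearch implementation; the function and parameter names are hypothetical:

```python
def validate_shard_limit(new_shards: int, current_open_shards: int,
                         data_node_count: int, max_shards_per_node: int) -> None:
    """Reject an operation that would push the cluster past its total shard budget.

    The budget is max_shards_per_node multiplied by the number of data nodes,
    compared against all currently open shards plus the shards being added.
    """
    max_shards_in_cluster = max_shards_per_node * data_node_count
    if current_open_shards + new_shards > max_shards_in_cluster:
        raise ValueError(
            f"this action would add [{new_shards}] shards, but the cluster "
            f"currently has [{current_open_shards}]/[{max_shards_in_cluster}] "
            f"maximum shards open"
        )

# 3 data nodes at 1000 shards per node give a budget of 3000 open shards,
# so adding 10 shards to 2500 open shards is accepted.
validate_shard_limit(new_shards=10, current_open_shards=2500,
                     data_node_count=3, max_shards_per_node=1000)
```

Note that the budget is derived from the count of all data nodes, which is exactly where the hot/warm problem below arises.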
Current Limitation
The existing validation logic does not differentiate between hot and warm indices/nodes when calculating shard limits. This creates two key problems:
- Inaccurate limit calculations: The validation may incorrectly reject operations because it doesn't account for dedicated warm nodes that could accommodate additional shards from warm indices.
- Resource underutilization: Warm nodes typically have higher shard capacity than hot nodes. Applying the same shard limits across both tiers leads to inefficient resource usage, preventing warm nodes from being utilized to their full potential.
Impact
This limitation can cause:
- False rejections of legitimate operations when warm node capacity is available
- Suboptimal cluster resource utilization
- Operational friction in remote-capable architectures where warm nodes are provisioned to handle higher shard counts
Describe the solution you'd like
Proposed Solution
Introduce tier-aware shard limit validation by adding dedicated settings for warm (remote-capable) indices and nodes, parallel to the existing hot index settings.
New Cluster Settings
Add the following cluster-level settings to control shard limits for warm/remote-capable indices:
- cluster.max_remote_capable_shards_per_node
  - Maximum number of remote-capable shards allowed per warm node
  - Parallel to the existing cluster.max_shards_per_node for hot indices
- cluster.routing.allocation.total_remote_capable_shards_limit
  - Total cluster-wide limit for remote-capable shards
  - Parallel to the existing cluster.routing.allocation.total_shards_per_node
- cluster.routing.allocation.total_remote_capable_shards_per_node
  - Cluster-level setting for the total number of remote-capable shards per node
  - Used in allocation decisions
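If adopted, these settings could be applied dynamically like other cluster settings. The snippet below is purely illustrative: the setting names are the ones proposed in this issue (they do not exist in OpenSearch today), and the values are examples only:

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_remote_capable_shards_per_node": 5000,
    "cluster.routing.allocation.total_remote_capable_shards_limit": 20000,
    "cluster.routing.allocation.total_remote_capable_shards_per_node": 5000
  }
}
```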
New Index Settings
Extend ShardLimitAllocationDecider with a new index-level setting:
- index.routing.allocation.total_remote_capable_shards_per_node
  - Per-index override for shard allocation limits on warm nodes
  - Parallel to existing hot index shard settings
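A per-index override could then be applied to a warm index as shown below. The index name logs-warm and the value are hypothetical, and the setting itself is proposed rather than existing:

```json
PUT logs-warm/_settings
{
  "index.routing.allocation.total_remote_capable_shards_per_node": 200
}
```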
Updated Validation Logic
When validating operations for warm indices:
- Use warm-specific settings instead of hot index settings for shard limit calculations
- Count only warm nodes (remote-capable nodes) instead of all data nodes
- Count only open remote-capable shards instead of all open shards
- Apply warm-specific limits defined by the new settings
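The tier-aware validation described above can be sketched as follows. This is an illustrative Python sketch under the assumptions of this proposal; the warm-specific settings do not exist yet, and all names are hypothetical:

```python
def validate_tier_aware(new_shards: int, is_warm_index: bool,
                        open_hot_shards: int, open_warm_shards: int,
                        hot_node_count: int, warm_node_count: int,
                        max_shards_per_node: int,
                        max_remote_capable_shards_per_node: int) -> None:
    """Validate an operation against the shard budget of its index's tier."""
    if is_warm_index:
        # Warm indices: count only warm (remote-capable) nodes and
        # only open remote-capable shards.
        budget = max_remote_capable_shards_per_node * warm_node_count
        open_shards = open_warm_shards
    else:
        # Hot indices: unchanged existing behavior.
        budget = max_shards_per_node * hot_node_count
        open_shards = open_hot_shards
    if open_shards + new_shards > budget:
        tier = "warm" if is_warm_index else "hot"
        raise ValueError(
            f"this action would add [{new_shards}] shards, but the {tier} "
            f"tier currently has [{open_shards}]/[{budget}] maximum shards open"
        )

# Two warm nodes with a higher per-node limit accept a warm index
# even though the hot tier is near its own budget.
validate_tier_aware(new_shards=100, is_warm_index=True,
                    open_hot_shards=2900, open_warm_shards=3000,
                    hot_node_count=3, warm_node_count=2,
                    max_shards_per_node=1000,
                    max_remote_capable_shards_per_node=5000)
```

Separating the node and shard counts per tier is what removes the false rejections: warm capacity no longer competes against the hot-tier budget, and vice versa.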
Behavior
- Hot indices: Continue using existing settings (cluster.max_shards_per_node, etc.)
- Warm indices: Use the new remote-capable settings for validation and allocation decisions
- This allows operators to configure different, typically higher, shard limits for warm nodes that are provisioned to handle larger shard counts
Related component
Search:Searchable Snapshots
Describe alternatives you've considered
No response
Additional context
No response