[Feature Request] Making Shard Limit Validator and allocation decider index tier agnostic #19610

@Gagan6164

Is your feature request related to a problem? Please describe

Problem Statement

Currently, OpenSearch validates shard limits when performing operations that increase the total number of shards in the cluster, such as:

  • Creating a new index
  • Reopening a closed index
  • Changing the number of replicas

These validations check against the cluster-wide shard limits defined by:

  • cluster.max_shards_per_node
  • cluster.routing.allocation.total_shards_per_node

If an operation would exceed these limits, it is rejected.
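
As a rough illustration of the check described above, here is a minimal Java sketch. It assumes the cluster-wide budget is the per-node limit multiplied by the number of data nodes; the class and method names are hypothetical, the numbers are illustrative, and this is not the actual OpenSearch implementation.

```java
// Simplified sketch of a tier-agnostic shard limit check.
// ShardLimitSketch and wouldExceedLimit are hypothetical names, not OpenSearch APIs.
final class ShardLimitSketch {

    /**
     * Returns true when adding newShards would push the cluster past
     * maxShardsPerNode * dataNodeCount (assumed form of the cluster-wide budget).
     */
    static boolean wouldExceedLimit(int maxShardsPerNode, int dataNodeCount,
                                    int currentOpenShards, int newShards) {
        long clusterLimit = (long) maxShardsPerNode * dataNodeCount;
        return (long) currentOpenShards + newShards > clusterLimit;
    }

    public static void main(String[] args) {
        // 3 data nodes with a per-node limit of 1000 give a budget of 3000 shards.
        System.out.println(wouldExceedLimit(1000, 3, 2990, 10)); // lands exactly on the budget
        System.out.println(wouldExceedLimit(1000, 3, 2990, 11)); // one shard over the budget
    }
}
```

Note that nothing in this check looks at which tier a node or index belongs to, which is the limitation discussed next.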

Current Limitation

The existing validation logic does not differentiate between hot and warm indices/nodes when calculating shard limits. This creates two key problems:

  1. Inaccurate limit calculations: The validation may incorrectly reject operations because it doesn't account for dedicated warm nodes that could accommodate additional shards from warm indices.

  2. Resource underutilization: Warm nodes typically have higher shard capacity than hot nodes. Applying the same shard limits across both tiers leads to inefficient resource usage, preventing warm nodes from being utilized to their full potential.

Impact

This limitation can cause:

  • False rejections of legitimate operations when warm node capacity is available
  • Suboptimal cluster resource utilization
  • Operational friction in remote-capable architectures where warm nodes are provisioned to handle higher shard counts

Describe the solution you'd like

Proposed Solution

Introduce tier-aware shard limit validation by adding dedicated settings for warm (remote-capable) indices and nodes, parallel to the existing hot index settings.

New Cluster Settings

Add the following cluster-level settings to control shard limits for warm/remote-capable indices:

  1. cluster.max_remote_capable_shards_per_node

    • Maximum number of remote-capable shards allowed per warm node
    • Parallel to the existing cluster.max_shards_per_node for hot indices
  2. cluster.routing.allocation.total_remote_capable_shards_limit

    • Total cluster-wide limit for remote-capable shards
    • Parallel to the existing cluster.routing.allocation.total_shards_limit
  3. cluster.routing.allocation.total_remote_capable_shards_per_node

    • Cluster-level setting for total remote-capable shards per node
    • Used in allocation decisions

New Index Settings

Extend ShardsLimitAllocationDecider with a new index-level setting:

  • index.routing.allocation.total_remote_capable_shards_per_node
    • Per-index override for shard allocation limits on warm nodes
    • Parallel to the existing index.routing.allocation.total_shards_per_node for hot indices
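
One possible shape for the per-node allocation decision on a warm node is sketched below in Java. It assumes the new index-level and cluster-level settings behave like the existing total_shards_per_node pair: a value of zero or less means "no limit", and both limits are checked independently. If the index-level setting is instead meant to replace the cluster-level one, the second check would change. All names here are hypothetical.

```java
// Hypothetical sketch of a per-node allocation check for remote-capable shards.
final class RemoteCapableAllocationSketch {

    /**
     * @param indexLimit                assumed value of index.routing.allocation.total_remote_capable_shards_per_node
     * @param clusterLimit              assumed value of cluster.routing.allocation.total_remote_capable_shards_per_node
     * @param indexShardsOnNode         shards of this index already on the candidate node
     * @param remoteCapableShardsOnNode all remote-capable shards already on the candidate node
     * @return true if one more shard of the index may be placed on the node
     */
    static boolean canAllocate(int indexLimit, int clusterLimit,
                               int indexShardsOnNode, int remoteCapableShardsOnNode) {
        // Assumption: a limit of zero or less disables that check.
        if (clusterLimit > 0 && remoteCapableShardsOnNode >= clusterLimit) {
            return false;
        }
        if (indexLimit > 0 && indexShardsOnNode >= indexLimit) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canAllocate(-1, -1, 5, 500)); // no limits configured
        System.out.println(canAllocate(2, -1, 2, 10));   // index-level limit already reached
        System.out.println(canAllocate(-1, 100, 0, 100)); // cluster-level limit already reached
    }
}
```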

Updated Validation Logic

When validating operations for warm indices:

  1. Use warm-specific settings instead of hot index settings for shard limit calculations
  2. Count only warm nodes (remote-capable nodes) instead of all data nodes
  3. Count only open remote-capable shards instead of all open shards
  4. Apply warm-specific limits defined by the new settings
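
The four steps above could look roughly like the following Java sketch. It is not a proposed patch: the types, field names, and the way a node or index is marked as warm are all hypothetical, and the limits used in `main` are illustrative values only.

```java
import java.util.List;

// Illustrative sketch of tier-aware shard limit validation; all names are hypothetical.
final class TierAwareShardLimitSketch {

    enum Tier { HOT, WARM }

    static final class Node {
        final boolean remoteCapable; // true for warm (remote-capable) nodes
        Node(boolean remoteCapable) { this.remoteCapable = remoteCapable; }
    }

    static final class OpenIndex {
        final Tier tier;
        final int totalShards; // primaries plus replicas
        OpenIndex(Tier tier, int totalShards) { this.tier = tier; this.totalShards = totalShards; }
    }

    static boolean wouldExceedLimit(Tier tier, int newShards,
                                    List<Node> dataNodes, List<OpenIndex> openIndices,
                                    int maxShardsPerNode, int maxRemoteCapableShardsPerNode) {
        if (tier == Tier.WARM) {
            // Steps 2 and 3: count only warm nodes and only open remote-capable shards.
            long warmNodes = dataNodes.stream().filter(n -> n.remoteCapable).count();
            long warmShards = openIndices.stream()
                .filter(i -> i.tier == Tier.WARM)
                .mapToLong(i -> i.totalShards)
                .sum();
            // Steps 1 and 4: apply the warm-specific per-node limit.
            return warmShards + newShards > warmNodes * maxRemoteCapableShardsPerNode;
        }
        // Hot indices keep the existing behavior: all data nodes, all open shards.
        long allShards = openIndices.stream().mapToLong(i -> i.totalShards).sum();
        return allShards + newShards > (long) dataNodes.size() * maxShardsPerNode;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node(false), new Node(false), new Node(true));
        List<OpenIndex> open = List.of(new OpenIndex(Tier.HOT, 1900), new OpenIndex(Tier.WARM, 1500));
        // Adding 200 warm shards: 1700 against a warm budget of 1 * 4000.
        System.out.println(wouldExceedLimit(Tier.WARM, 200, nodes, open, 1000, 4000));
        // The same 200 shards through the tier-agnostic path: 3600 against 3 * 1000.
        System.out.println(wouldExceedLimit(Tier.HOT, 200, nodes, open, 1000, 4000));
    }
}
```

In this example the warm path accepts the operation while the tier-agnostic path would reject it, which is the false rejection described in the problem statement.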

Behavior

  • Hot indices: Continue using existing settings (cluster.max_shards_per_node, etc.)
  • Warm indices: Use the new remote-capable settings for validation and allocation decisions
  • This allows operators to configure different, typically higher, shard limits for warm nodes that are provisioned to handle larger shard counts

Related component

Search:Searchable Snapshots

Describe alternatives you've considered

No response

Additional context

No response
