

@tandonks
Collaborator

tandonks commented Nov 25, 2025

Description

This PR implements multi-tier rollup functionality in OpenSearch Index Management, enabling hierarchical data aggregations where rollup indices can serve as source indices for subsequent rollup operations. This allows for progressive data summarization (e.g., raw data → 1-minute → 5-minute → 10-minute intervals) within a single ISM policy or through chained rollup jobs.

Previously, the ISM Rollup action was a terminal operation that only supported rolling up raw indices. This limitation forced users to rely on complex external automation chains (Raw Data → Rollup → _reindex → Second Rollup Job) or redundantly roll up raw data multiple times at different granularities, leading to:

  • Operational overhead maintaining chained reindex and rollup jobs
  • Compute inefficiency from reprocessing the same raw data repeatedly
  • Lack of automation in retention policies
  • Data management complexity and inconsistency risks

Solution Overview

This implementation extends the rollup framework to support multi-level rollups by:

  1. Smart Initial Timestamp Computation: Automatically detects rollup indices and fetches the earliest timestamp from the date_histogram field instead of scanning raw documents
  2. Consistent Metric and Field Naming: Maps user-specified raw field names to aggregated field names in source rollup indices (e.g., passenger_count → passenger_count.sum)
  3. Source Index Field Support: Adds an optional source_index field to the ISMRollup schema, enabling explicit source specification for multi-tier chaining
  4. Template Variable Resolution: Supports Mustache templates ({{ctx.index}}, {{ctx.source_index}}) for dynamic index naming in ISM policies
  5. Interval Compatibility Enforcement: Validates that target intervals are exact multiples of source intervals (e.g., 1m → 5m → 10m)
  6. Metric Availability Validation: Ensures target rollups only request metrics available in source rollups
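The interval, metric, and naming rules in the list above can be sketched as a few pure functions. This is a minimal illustration, not the code in this PR: every function name is hypothetical, only fixed intervals with `s`/`m`/`h`/`d` units are parsed, the `<field>.<metric>` naming is assumed from the passenger_count → passenger_count.sum example, and `alignToInterval` assumes the first rollup window starts at the earliest source timestamp rounded down to the target interval in UTC.

```kotlin
// Hypothetical sketch of the multi-tier rollup checks; names and rules are assumptions, not the PR's code.

/** Parses a fixed interval such as "1m" or "10m" into milliseconds (assumed units: s, m, h, d). */
fun parseFixedInterval(interval: String): Long {
    require(interval.length >= 2) { "Invalid interval: $interval" }
    val unitMillis = when (interval.last()) {
        's' -> 1_000L
        'm' -> 60_000L
        'h' -> 3_600_000L
        'd' -> 86_400_000L
        else -> throw IllegalArgumentException("Unsupported unit in interval: $interval")
    }
    val amount = interval.dropLast(1).toLongOrNull()
        ?: throw IllegalArgumentException("Invalid interval: $interval")
    require(amount > 0) { "Interval must be positive: $interval" }
    return amount * unitMillis
}

/** A target interval is compatible only if it is an exact multiple of the source interval. */
fun isCompatibleInterval(sourceInterval: String, targetInterval: String): Boolean =
    parseFixedInterval(targetInterval) % parseFixedInterval(sourceInterval) == 0L

/** Maps a raw field name to its assumed aggregated name in a source rollup index. */
fun rollupFieldName(rawField: String, metric: String): String = "$rawField.$metric"

/** Rounds the earliest source timestamp (epoch millis, UTC) down to the start of its target bucket. */
fun alignToInterval(epochMillis: Long, targetInterval: String): Long {
    val intervalMillis = parseFixedInterval(targetInterval)
    return epochMillis - Math.floorMod(epochMillis, intervalMillis)
}

/** Returns validation errors for one tier; an empty list means the tier can read from its source. */
fun validateTier(
    sourceInterval: String,
    targetInterval: String,
    sourceMetrics: Map<String, Set<String>>, // raw field -> metrics stored in the source rollup
    targetMetrics: Map<String, Set<String>>, // raw field -> metrics requested by the target rollup
): List<String> {
    val errors = mutableListOf<String>()
    if (!isCompatibleInterval(sourceInterval, targetInterval)) {
        errors += "Target interval $targetInterval is not a multiple of source interval $sourceInterval"
    }
    for ((field, requested) in targetMetrics) {
        // Any requested metric the source rollup never stored cannot be derived from it.
        val missing = requested - sourceMetrics[field].orEmpty()
        for (metric in missing) {
            errors += "Metric ${rollupFieldName(field, metric)} is not available in the source rollup"
        }
    }
    return errors
}
```

For example, a 5m → 7m chain is rejected because 7 minutes is not a whole number of 5-minute buckets: each 7-minute bucket would need part of a 5-minute bucket whose documents are already aggregated and can no longer be split.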

Limitations and Best Practices

Source Index Lifecycle Management

When using multi-tier rollups within a single ISM policy, the source index (raw data or intermediate rollup) remains under the same policy's control throughout the entire lifecycle. If you need to delete or manage intermediate rollup indices independently before the policy completes, consider these approaches:

  1. Separate Policies with Index Templates: Create independent ISM policies for each rollup tier and use index templates to automatically attach them:
    {
      "index_patterns": ["my_index_rollup_1m-*"],
      "template": {
        "settings": {
          "plugins.index_state_management.policy_id": "rollup_1m_to_5m_policy"
        }
      }
    }

This allows each rollup index to have its own retention and deletion schedule independent of the source data.

  2. Chained Rollup Jobs: Use standalone rollup jobs (outside ISM) for multi-tier scenarios where you need fine-grained control over each tier's lifecycle.

  3. Policy Transitions: Design your ISM policy states carefully to ensure rollup operations complete before any deletion actions are triggered on source indices.
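The index-template approach only works if each tier's target index names fall under the intended `index_patterns` entry. A rough sketch of that matching follows; it assumes `*` is the only wildcard and treats every other character literally, which is a simplification of OpenSearch's pattern handling, and `policiesFor` is a hypothetical helper.

```kotlin
// Rough sketch of index_patterns matching: '*' matches any run of characters, everything else is literal.
fun matchesIndexPattern(pattern: String, indexName: String): Boolean {
    // Escape each literal segment, then join the segments with ".*" where the wildcards were.
    val regex = pattern.split("*").joinToString(".*") { Regex.escape(it) }
    return Regex(regex).matches(indexName)
}

/** Returns the policy ids of every template whose pattern matches the index name (hypothetical helper). */
fun policiesFor(indexName: String, templatePolicies: Map<String, String>): List<String> =
    templatePolicies.filterKeys { matchesIndexPattern(it, indexName) }.values.toList()
```

Because `my_index_rollup_1m-*` includes the full interval suffix, it matches the 1m tier's indices but not the 5m or 10m ones, so each tier can carry its own policy; a looser pattern such as `my_index_rollup_1*` would also catch the 10m indices.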

Note: Deleting a source rollup index while a downstream rollup job is still processing it will cause the downstream rollup to fail. Always ensure rollup operations complete before removing source indices.

Example Policy


{
  "policy": {
    "description": "Multi-tier rollup: 1m → 5m → 10m",
    "default_state": "rollup_1m",
    "states": [
      {
        "name": "rollup_1m",
        "actions": [
          {
            "rollup": {
              "ism_rollup": {
                "description": "Rollup raw data into 1-minute intervals",
                "target_index": "my_index_rollup_1m-{{ctx.index}}",
                "page_size": 100,
                "dimensions": [
                  {
                    "date_histogram": {
                      "source_field": "timestamp",
                      "fixed_interval": "1m",
                      "timezone": "UTC"
                    }
                  },
                  {
                    "terms": {
                      "source_field": "category"
                    }
                  }
                ],
                "metrics": [
                  {
                    "source_field": "value",
                    "metrics": [
                      { "sum": {} },
                      { "min": {} },
                      { "max": {} },
                      { "value_count": {} },
                      { "avg": {} }
                    ]
                  }
                ]
              }
            }
          }
        ],
        "transitions": [
          {
            "state_name": "rollup_5m"
          }
        ]
      },
      {
        "name": "rollup_5m",
        "actions": [
          {
            "rollup": {
              "ism_rollup": {
                "description": "Rollup 1m data into 5-minute intervals",
                "source_index": "my_index_rollup_1m-{{ctx.index}}",
                "target_index": "my_index_rollup_5m-{{ctx.index}}",
                "page_size": 100,
                "dimensions": [
                  {
                    "date_histogram": {
                      "source_field": "timestamp",
                      "fixed_interval": "5m",
                      "timezone": "UTC"
                    }
                  },
                  {
                    "terms": {
                      "source_field": "category"
                    }
                  }
                ],
                "metrics": [
                  {
                    "source_field": "value",
                    "metrics": [
                      { "sum": {} },
                      { "min": {} },
                      { "max": {} },
                      { "value_count": {} },
                      { "avg": {} }
                    ]
                  }
                ]
              }
            }
          }
        ],
        "transitions": [
          {
            "state_name": "rollup_10m"
          }
        ]
      },
      {
        "name": "rollup_10m",
        "actions": [
          {
            "rollup": {
              "ism_rollup": {
                "description": "Rollup 5m data into 10-minute intervals",
                "source_index": "my_index_rollup_5m-{{ctx.index}}",
                "target_index": "my_index_rollup_10m-{{ctx.index}}",
                "page_size": 100,
                "dimensions": [
                  {
                    "date_histogram": {
                      "source_field": "timestamp",
                      "fixed_interval": "10m",
                      "timezone": "UTC"
                    }
                  },
                  {
                    "terms": {
                      "source_field": "category"
                    }
                  }
                ],
                "metrics": [
                  {
                    "source_field": "value",
                    "metrics": [
                      { "sum": {} },
                      { "min": {} },
                      { "max": {} },
                      { "value_count": {} },
                      { "avg": {} }
                    ]
                  }
                ]
              }
            }
          }
        ],
        "transitions": []
      }
    ]
  }
}
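To see how the three states link together, the sketch below resolves `{{ctx.index}}` in each state's `source_index` and `target_index` and checks that every tier reads the index the previous tier wrote. It is a simplified stand-in for ISM's Mustache rendering: `resolveTemplate` only substitutes literal `{{ctx.<key>}}` placeholders, `Tier` and `resolveChain` are hypothetical, and the managed index name used below is made up.

```kotlin
// Simplified stand-in for ISM's Mustache rendering: only literal {{ctx.<key>}} placeholders are replaced.
fun resolveTemplate(template: String, ctx: Map<String, String>): String =
    ctx.entries.fold(template) { acc, (key, value) -> acc.replace("{{ctx.$key}}", value) }

/** One rollup state from the example policy; a null sourceIndex means "roll up the managed index itself". */
data class Tier(val state: String, val sourceIndex: String?, val targetIndex: String)

val examplePolicyTiers = listOf(
    Tier("rollup_1m", null, "my_index_rollup_1m-{{ctx.index}}"),
    Tier("rollup_5m", "my_index_rollup_1m-{{ctx.index}}", "my_index_rollup_5m-{{ctx.index}}"),
    Tier("rollup_10m", "my_index_rollup_5m-{{ctx.index}}", "my_index_rollup_10m-{{ctx.index}}"),
)

/** Resolves every tier to a (source, target) pair and fails if a tier does not read the previous tier's output. */
fun resolveChain(tiers: List<Tier>, managedIndex: String): List<Pair<String, String>> {
    val ctx = mapOf("index" to managedIndex)
    var previousTarget = managedIndex // the first tier reads the raw managed index
    return tiers.map { tier ->
        val source = tier.sourceIndex?.let { resolveTemplate(it, ctx) } ?: managedIndex
        val target = resolveTemplate(tier.targetIndex, ctx)
        check(source == previousTarget) {
            "${tier.state} reads $source, but the previous tier wrote $previousTarget"
        }
        previousTarget = target
        source to target
    }
}
```

Called with a made-up managed index such as `logs-2025.11.25`, `resolveChain(examplePolicyTiers, "logs-2025.11.25")` yields raw → `my_index_rollup_1m-logs-2025.11.25`, then that index → `my_index_rollup_5m-logs-2025.11.25`, then → `my_index_rollup_10m-logs-2025.11.25`. In this policy `{{ctx.index}}` names the same managed (raw) index in every state, which is what lets the 5m and 10m states find their inputs.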

Related Issues

Resolves #1490

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 83.95062% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.31%. Comparing base (4f007f2) to head (c24d642).
⚠️ Report is 1 commit behind head on main.

Files with missing lines:

File | Patch % | Lines missing coverage
...nagement/step/rollup/AttemptCreateRollupJobStep.kt | 64.10% | 10 missing, 4 partials
...ch/indexmanagement/rollup/RollupMetadataService.kt | 76.92% | 8 missing, 1 partial
.../opensearch/indexmanagement/rollup/RollupRunner.kt | 81.81% | 3 missing, 5 partials
...arch/indexmanagement/rollup/RollupMapperService.kt | 0.00% | 2 missing, 1 partial
...nsearch/indexmanagement/rollup/util/RollupUtils.kt | 96.29% | 0 missing, 3 partials
...management/rollup/interceptor/RollupInterceptor.kt | 81.81% | 0 missing, 2 partials
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1533      +/-   ##
==========================================
+ Coverage   76.17%   76.31%   +0.14%     
==========================================
  Files         375      375              
  Lines       17596    17806     +210     
  Branches     2417     2459      +42     
==========================================
+ Hits        13404    13589     +185     
- Misses       2947     2962      +15     
- Partials     1245     1255      +10     


tandonks force-pushed the multi-tier-rollup-support branch 2 times, most recently from 9a7bc1a to b5e5ad8 on December 2, 2025 08:55
dimensionsList.toList()
},
metrics = sin.readList(::RollupMetrics),
sourceIndex = if (sin.version.onOrAfter(Version.V_3_0_0) && sin.readBoolean()) {
Member


This change is targeted for version 3.5, right?

Collaborator Author

tandonks, Dec 17, 2025


Version 3.5 is not available yet because we have not upgraded, so I kept this as a placeholder.

}
}
out.writeCollection(metrics)
if (out.version.onOrAfter(Version.V_3_0_0)) {
Member


The version check should be 3.5.

Collaborator Author

tandonks, Dec 17, 2025


Correct, but version 3.5 is not available yet because we have not upgraded, so I kept this as a placeholder.

Collaborator Author


I will add a TODO comment for now noting that the version check must be updated once 3.5 is available.
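The thread above concerns the usual wire-compatibility pattern for a new optional field: write and read it only when the stream's version is new enough, with a presence flag in front of the value. The sketch below models that pattern with a hypothetical in-memory stream instead of OpenSearch's `StreamInput`/`StreamOutput`. `SimpleVersion`, `ToyStream`, `RollupConfig`, and the 3.5.0 minimum are illustrative assumptions; 3.5.0 reflects the reviewers' request rather than the `V_3_0_0` placeholder in the diff.

```kotlin
// Hypothetical in-memory model of version-gated serialization; not OpenSearch's StreamInput/StreamOutput.

data class SimpleVersion(val major: Int, val minor: Int, val patch: Int) : Comparable<SimpleVersion> {
    override fun compareTo(other: SimpleVersion): Int =
        compareValuesBy(this, other, { it.major }, { it.minor }, { it.patch })

    fun onOrAfter(other: SimpleVersion): Boolean = this >= other
}

// Assumed minimum version for the new field (the reviewers asked for 3.5).
val SOURCE_INDEX_MIN_VERSION = SimpleVersion(3, 5, 0)

/** A toy stream tagged with the peer's version: values are read back in the order they were written. */
class ToyStream(val version: SimpleVersion) {
    private val values = ArrayDeque<Any>()
    fun writeString(s: String) = values.addLast(s)
    fun writeBoolean(b: Boolean) = values.addLast(b)
    fun readString(): String = values.removeFirst() as String
    fun readBoolean(): Boolean = values.removeFirst() as Boolean
    fun isEmpty(): Boolean = values.isEmpty()
}

data class RollupConfig(val targetIndex: String, val sourceIndex: String?)

fun writeConfig(out: ToyStream, config: RollupConfig) {
    out.writeString(config.targetIndex)
    // Only peers that understand the new field receive it; older peers never see the extra values.
    if (out.version.onOrAfter(SOURCE_INDEX_MIN_VERSION)) {
        out.writeBoolean(config.sourceIndex != null) // presence flag
        config.sourceIndex?.let { out.writeString(it) }
    }
}

fun readConfig(input: ToyStream): RollupConfig {
    val targetIndex = input.readString()
    // Mirror the writer exactly: same version check, then the flag, then the optional value.
    val sourceIndex = if (input.version.onOrAfter(SOURCE_INDEX_MIN_VERSION) && input.readBoolean()) {
        input.readString()
    } else {
        null
    }
    return RollupConfig(targetIndex, sourceIndex)
}
```

In this model the writer and reader evaluate the same version check, so both sides agree on whether the flag is on the wire. If the gate named a version that had already shipped without the field (as a 3.0.0 placeholder would), a newer node could write the flag to an older node that never reads it, leaving the stream misaligned; that is the risk the TODO is meant to close.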

tandonks force-pushed the multi-tier-rollup-support branch from c266f5b to 88a3169 on December 17, 2025 18:09
tandonks force-pushed the multi-tier-rollup-support branch from 37b619f to 0a8d5f3 on January 6, 2026 09:08
Signed-off-by: Kshitij Tandon <[email protected]>
soosinha previously approved these changes Jan 6, 2026
Signed-off-by: Kshitij Tandon <[email protected]>
@tandonks
Collaborator Author

tandonks commented Jan 7, 2026

Thanks @soosinha for reviewing. Merging the PR for now; I will update the version check once the 3.5 version upgrade PR is merged.

tandonks merged commit db0425d into opensearch-project:main on Jan 7, 2026
23 checks passed
@bowenlan-amzn
Member

@tandonks I noticed that the IT added in this PR slows down the ISM test suite by around 2 minutes and is starting to cause timeouts for the checks in other PRs (the limit is 20 minutes for now).

Compare the run time in the previous PR:
https://github.com/opensearch-project/index-management/actions/runs/20585460304/job/59120970314
with the run time in this PR:
https://github.com/opensearch-project/index-management/actions/runs/20757737281/job/59604280531

The slowdown seems to come from src/test/kotlin/org/opensearch/indexmanagement/indexstatemanagement/action/RollupActionIT.kt.

Please see if you can reduce the run time there. It's better to keep one happy path as an IT and cover the other scenarios with unit tests.



Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Multi-Tier Rollup Support

3 participants