Add delete_snapshot option to ISM delete action for cleaning up searchable snapshot sources#1565

Open
lawofcycles wants to merge 13 commits into opensearch-project:main from lawofcycles:feature/delete-searchable-snapshot

Conversation

@lawofcycles (Contributor) commented Jan 4, 2026:

Description

This PR adds a new optional delete_snapshot parameter to the ISM delete action. When enabled, it automatically deletes the source snapshot of a searchable snapshot index along with the index itself.

Use case
A Hot → Warm (searchable snapshot) → Delete lifecycle, where both the index and the underlying snapshot should be removed to free up storage.

Format

"actions": [
    {
        "delete": {
            "delete_snapshot": true
        }
    }
],

Behavior

  • If delete_snapshot: true and the index is a searchable snapshot, both the index and its source snapshot are deleted
  • If the snapshot is still used by other indices, snapshot deletion is skipped (index deletion still succeeds)
  • If the index is not a searchable snapshot, only the index is deleted (no error)
  • Defaults to false for backward compatibility
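For reference, a minimal sketch of a policy using this option. Only the delete_snapshot field comes from this PR; the state names and transition condition are illustrative:

```json
{
  "policy": {
    "description": "Delete index and its searchable snapshot source after retention",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [
          { "delete": { "delete_snapshot": true } }
        ],
        "transitions": []
      }
    ]
  }
}
```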

In addition to unit tests, I manually tested with a local build.

Related Issues

Resolves #1476

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Jan 4, 2026:

Codecov Report

❌ Patch coverage is 86.48649% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.58%. Comparing base (05c26b8) to head (234f71f).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...exstatemanagement/step/delete/AttemptDeleteStep.kt 85.71% 3 Missing and 5 partials ⚠️
.../indexstatemanagement/action/DeleteActionParser.kt 80.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1565      +/-   ##
==========================================
+ Coverage   76.39%   76.58%   +0.18%     
==========================================
  Files         379      381       +2     
  Lines       17942    18130     +188     
  Branches     2479     2516      +37     
==========================================
+ Hits        13707    13885     +178     
+ Misses       2974     2973       -1     
- Partials     1261     1272      +11     


(An earlier comment from @lawofcycles was marked as resolved.)

@lawofcycles (Contributor, Author) commented:

I realized my previous comment was based on a misunderstanding.
I found that searchable snapshot testing doesn't require remote store configuration. It only needs the snapshot repository (path.repo), which is already configured in the existing Gradle setup.

I've now added an integration test (DeleteSearchableSnapshotActionIT) that verifies the delete_snapshot functionality with searchable snapshots.

@lawofcycles (Contributor, Author) commented Jan 12, 2026:

The CI failed due to a timeout.

Error: The action 'Build with Gradle' has timed out after 20 minutes.

The added integration test passes locally in 1–2 minutes. I'm not sure whether this is related to my changes or a pre-existing issue.
Any guidance would be appreciated.

@bowenlan-amzn (Member) commented:


Hi, please feel free to update the timeout-minutes in the workflow file to get the checks running to the end https://github.com/opensearch-project/index-management/blob/main/.github/workflows/test-and-build-workflow.yml

@lawofcycles force-pushed the feature/delete-searchable-snapshot branch from 60a4e85 to 995aba9 on January 13, 2026
@lawofcycles (Contributor, Author) commented:

> Hi, please feel free to update the timeout-minutes in the workflow file to get the checks running to the end: https://github.com/opensearch-project/index-management/blob/main/.github/workflows/test-and-build-workflow.yml

@bowenlan-amzn
Thanks! CI passed after updating timeout-minutes to 25 minutes.

@bowenlan-amzn (Member) commented:

If the snapshot for an ISM-managed remote index (backed by a snapshot) is not just for that one index, I think it's better to use snapshot management to manage the lifecycle of that snapshot, instead of index state management. ISM generally works only at the single-index level.

@lawofcycles (Contributor, Author) commented Jan 25, 2026:

@bowenlan-amzn Thank you for the feedback. I want to make sure I understand your concern correctly.
Are you suggesting that

  1. Snapshot lifecycle management should be handled by Snapshot Management, not ISM, even for 1:1 index-to-snapshot relationships?
  2. This is about maintaining architectural boundaries where ISM manages index lifecycle and Snapshot Management manages snapshot lifecycle?

My current implementation only deletes a snapshot when no other indices are using it. If multiple indices share the same snapshot, the snapshot deletion is skipped entirely.
However, I understand your point might be about the design principle itself.

I'm open to exploring alternative approaches to Snapshot Management based on your guidance.

@bowenlan-amzn (Member) commented:


A 1:1 index-to-snapshot relationship makes sense, and I can see that's what you are trying to achieve here.
But considering that snapshot management was recently enhanced to support deletion-only policies (opensearch-project/documentation-website#11130), I feel we may not need this feature in ISM. For example, a deletion-only snapshot management policy with a suitable max_age condition could manage the snapshots created by ISM per index.
I think this would depend on the actual use case; please let me know what you think.


if (action.deleteSnapshot) {
    try {
        val settingsResponse = getIndexSettings(indexName)
A reviewer (Member) commented on this code:

Seems we can directly read the setting from cluster state object, so don't need to make a network call for that.

private fun getOriginalSettings(indexName: String, clusterService: ClusterService): Map<String, String> {
    val indexSettings = clusterService.state().metadata.index(indexName).settings
    val originalSettings = mutableMapOf<String, String>()
    indexSettings.get(ROUTING_SETTING)?.let { originalSettings.put(ROUTING_SETTING, it) }
    indexSettings.get(SETTING_BLOCKS_WRITE)?.let { originalSettings.put(SETTING_BLOCKS_WRITE, it) }
    return originalSettings
}

@lawofcycles (Contributor, Author) replied:

Thanks, updated to read settings from cluster state directly.
234f71f

@lawofcycles (Contributor, Author) commented Feb 9, 2026:

@bowenlan-amzn Thank you for the suggestion. Let me describe the use case I'm trying to address, so we can discuss whether the existing SM deletion-only policy can cover it.

Use case (from #1476)

A typical Hot -> Warm (searchable snapshot) -> Delete lifecycle managed by an ISM policy:

  1. ISM creates a snapshot of the index
  2. ISM converts the index to a searchable snapshot (remote_snapshot) via convert_index_to_remote
  3. After a retention period, ISM deletes the searchable snapshot index

The problem is that step 3 only deletes the index, not the underlying snapshot in the repository.

I looked into using an SM deletion-only policy with snapshot_pattern and max_age as an alternative. However, the current SM deletion logic (DeletingState) does not check whether a snapshot is still being used as the source of a searchable snapshot index. It filters snapshots solely by max_age/max_count/min_count.

This means that if the SM deletion schedule runs before ISM deletes the searchable snapshot index, SM could delete a snapshot that is still actively backing a mounted index, making that index's data inaccessible.

Since ISM controls the index lifecycle, it knows exactly when the index is no longer needed, so triggering the snapshot cleanup at the point of index deletion would avoid this timing issue.

I would appreciate your thoughts on how to best address this use case.

@bowenlan-amzn (Member) commented:


Your use case involves a 1:1 mapping between snapshots and indices. In ISM, we can implement this by storing the snapshot name in the managed index metadata after successful completion, then reading and deleting that specific snapshot during the delete step.

The snapshot management feature differs: it creates snapshots for index groups on a time-based schedule, and deletion retains the latest snapshots (at minimum, the most recent one) rather than removing all of them. However, it's unclear whether this approach works for searchable indices: specifically, whether a searchable index must bind to one specific snapshot or can function with any snapshot containing it. Could you check this when you get time? Thanks!
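The metadata-based idea suggested here could be sketched as follows. This is plain Java for illustration (the plugin itself is Kotlin), and every name below is hypothetical, not the plugin's actual types:

```java
import java.util.Optional;

// Sketch of the metadata-based approach: record the snapshot name in the
// managed-index metadata after the snapshot step succeeds, then read it back
// during the delete step to know exactly which snapshot to remove.
final class ManagedIndexMetadata {
    final String indexName;
    final String snapshotName; // null until a snapshot action has succeeded

    ManagedIndexMetadata(String indexName, String snapshotName) {
        this.indexName = indexName;
        this.snapshotName = snapshotName;
    }

    // Called when the snapshot action completes successfully.
    ManagedIndexMetadata withSnapshot(String name) {
        return new ManagedIndexMetadata(indexName, name);
    }

    // Called by the delete step: the snapshot to delete, if one was recorded.
    Optional<String> snapshotToDelete() {
        return Optional.ofNullable(snapshotName);
    }
}
```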

@lawofcycles (Contributor, Author) commented:


@bowenlan-amzn I looked into the binding behavior of searchable snapshot indices. Here are my findings from testing on OpenSearch 3.5.0.

1. Searchable snapshots are hard-bound to a specific snapshot
When an index is restored with storage_type: remote_snapshot, the index settings record the exact snapshot it was created from:

index.searchable_snapshot.repository: my-repo
index.searchable_snapshot.snapshot_id.name: snapshot-a
index.searchable_snapshot.snapshot_id.uuid: c43AnLliStevfiMUBtzxkA
index.store.type: remote_snapshot

The binding is at the UUID level, so the index cannot function with a different snapshot, even if that snapshot contains the same data.
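That binding check could be sketched as follows (plain Java for illustration; the setting keys are the ones observed above, while the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.Optional;

// Sketch: an index is a searchable snapshot index only when index.store.type is
// remote_snapshot, and in that case the settings name the exact repository and
// snapshot it is bound to.
final class SearchableSnapshotSource {
    static final String STORE_TYPE = "index.store.type";
    static final String REPO = "index.searchable_snapshot.repository";
    static final String SNAPSHOT = "index.searchable_snapshot.snapshot_id.name";

    // Returns "repo/snapshot" for a remote_snapshot index, empty otherwise.
    static Optional<String> boundSnapshot(Map<String, String> settings) {
        if (!"remote_snapshot".equals(settings.get(STORE_TYPE))) {
            return Optional.empty();
        }
        String repo = settings.get(REPO);
        String snap = settings.get(SNAPSHOT);
        if (repo == null || snap == null) {
            return Optional.empty();
        }
        return Optional.of(repo + "/" + snap);
    }
}
```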

2. OpenSearch protects the bound snapshot from deletion

Attempting to delete snapshot-a while the searchable snapshot index exists returns:

snapshot_in_use_deletion_exception:
[my-repo:[snapshot-a]] These remote snapshots are backing some indices
and hence can't be deleted! No snapshots were deleted.

3. SM deletion-policy behavior with snapshots bound to searchable snapshot indices
I also tested an SM deletion-only policy with snapshot_pattern configured to match the target snapshots. When SM attempts to delete a snapshot that is still backing a searchable snapshot index, it receives the same snapshot_in_use_deletion_exception. SM treats this as a retryable error, retries 3 times, then marks the workflow as FAILED and resets. On the next scheduled trigger, it repeats the same cycle. This continues until the searchable snapshot index is deleted by ISM, after which SM can successfully delete the snapshot on its next run.

@bowenlan-amzn (Member) commented:

@lawofcycles, I appreciate the time you spent on this research!
Sorry, I may have been overthinking this because I saw the logic for handling a snapshot that contains other indices.

To clarify the expected user experience:

> A typical Hot -> Warm (searchable snapshot) -> Delete lifecycle managed by an ISM policy:
> 1. ISM creates a snapshot of the index
> 2. ISM converts the index to a searchable snapshot (remote_snapshot) via convert_index_to_remote
> 3. After a retention period, ISM deletes the searchable snapshot index

  • ISM sets the index to read-only and creates a snapshot just for this one managed index (warm means the index becomes read-only, no longer writeable)
  • ISM converts the index to a searchable snapshot index from the above snapshot via convert_index_to_remote
  • After a retention period, ISM deletes the index and the snapshot. (I recall that when the index gets deleted, the ISM managed-index state also gets cleaned up, so if snapshot deletion fails we probably won't get another chance to retry.)

The constraint here is one snapshot created for one index, so we don't need to worry about this snapshot being used by another index and don't have to check for that.

Snapshot management still seems usable here to back a group of indexes, but it feels hard for the ISM convert_index_to_remote action to pick which snapshot to restore at runtime.


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add "delete searchable_snapshot" in ISM policies
