Increase timeout for searchable snapshots in ILM tests #137514

nielsbauman · 2025-11-03T12:21:51Z

As of #133954, we clone indices before performing the force-merge step in the `searchable_snapshot` action. On slow CI servers, 10 seconds for the index to go through the whole `searchable_snapshot` action isn't enough, so we bump the timeout to 20 seconds.

I looked at the logs of a few test failures, and ILM was clearly still progressing when the test timed out. I didn't identify any particular step that was taking extraordinarily long; there were always just a few steps that took a bit longer. I would love to make these tests faster rather than bumping the timeout, but the `searchable_snapshot` action is simply one of the largest ILM actions and ILM itself isn't particularly fast.

That being said, if a timeout of 20 seconds proves to be insufficient (i.e. test failures come back), I do think it's worth having a look at reducing the runtime of the tests somehow first before we increase the timeout further.

Closes #137149
Closes #137151
Closes #137152
Closes #137153
Closes #137156
Closes #137166
Closes #137167
Closes #137192
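
For readers less familiar with the pattern, here is a minimal sketch of why a longer timeout helps when the work is still progressing: a polling wait only fails if the condition has not become true by the deadline. This is not the helper the tests actually use; `AwaitSketch`, `awaitCondition`, and the simulated "ILM finished" condition are hypothetical names invented for illustration, and the durations are scaled down from the 10s/20s discussed above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the Elasticsearch test framework's implementation.
final class AwaitSketch {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met in time, false otherwise.
     */
    public static boolean awaitCondition(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in for "the index completed the action": becomes true 300 ms after start.
        BooleanSupplier finished = () -> System.nanoTime() - start >= Duration.ofMillis(300).toNanos();

        // Short timeout: the simulated work is still progressing when the wait gives up.
        System.out.println("short timeout met: "
            + awaitCondition(finished, Duration.ofMillis(100), Duration.ofMillis(10)));

        // Longer timeout: the same work gets enough time to complete.
        System.out.println("long timeout met: "
            + awaitCondition(finished, Duration.ofSeconds(2), Duration.ofMillis(10)));
    }
}
```

A longer timeout only raises the upper bound on how long a failing test waits; a passing test still returns as soon as the condition holds.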

elasticsearchmachine · 2025-11-03T12:22:16Z

Pinging @elastic/es-data-management (Team:Data Management)

PeteGillinElastic

Thanks as always.

PeteGillinElastic · 2025-11-03T12:32:00Z

...de/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/SearchableSnapshotActionIT.java

        assertOK(client().performRequest(restoreSnapshot));

-        assertThat(indexExists(searchableSnapMountedIndexName), is(true));
+        awaitIndexExists(searchableSnapMountedIndexName);


Just checking, this one doesn't need the extended timeout?

Nope, this is just waiting for the index to be restored after the _restore API from a few lines before. That should definitely not take more than 10 seconds. Thanks for checking!

nielsbauman · 2025-11-03T12:37:27Z

...de/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/SearchableSnapshotActionIT.java

        Map<String, Phase> phases = new HashMap<>();
        phases.put("cold", new Phase("cold", TimeValue.ZERO, coldActions));
-        phases.put("delete", new Phase("delete", TimeValue.timeValueMillis(10000), Map.of(DeleteAction.NAME, WITH_SNAPSHOT_DELETE)));
+        phases.put("delete", new Phase("delete", TimeValue.ZERO, Map.of(DeleteAction.NAME, WITH_SNAPSHOT_DELETE)));


By the way, for the record, I changed this value from 10s to 0s because there is no point in waiting 10 seconds before we delete the searchable snapshot index; we can delete it immediately without making the test flakier or reducing its value.

elasticsearchmachine · 2025-11-03T14:36:26Z

💔 Backport failed

Status	Branch	Result
❌	9.2	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 137514`

nielsbauman · 2025-11-03T14:38:32Z

💚 All backports created successfully

Status	Branch	Result
✅	9.2

(cherry picked from commit 60b89a8)

# Conflicts:
#	muted-tests.yml

nielsbauman added the >test, :Data Management/ILM+SLM, auto-backport, and branch:9.2 labels Nov 3, 2025

elasticsearchmachine added the v9.3.0, Team:Data Management, and v9.2.1 labels Nov 3, 2025

elasticsearchmachine removed the branch:9.2 label Nov 3, 2025

PeteGillinElastic approved these changes Nov 3, 2025

nielsbauman commented Nov 3, 2025

szybia approved these changes Nov 3, 2025

nielsbauman enabled auto-merge (squash) November 3, 2025 12:44

nielsbauman disabled auto-merge November 3, 2025 12:44

nielsbauman enabled auto-merge (squash) November 3, 2025 12:44

nielsbauman added 2 commits November 3, 2025 13:45

Merge branch 'main' into fix-searchable-snapshot-tests

db77ab4

Unmute test

38a2ec4

nielsbauman merged commit 60b89a8 into elastic:main Nov 3, 2025
34 of 35 checks passed

nielsbauman deleted the fix-searchable-snapshot-tests branch November 3, 2025 14:35

elasticsearchmachine added the backport pending label Nov 3, 2025

nielsbauman mentioned this pull request Nov 3, 2025

[9.2] Increase timeout for searchable snapshots in ILM tests (#137514) #137524

Merged
