Increase timeout for searchable snapshots in ILM tests #137514
As of elastic#133954, we clone indices before performing the force-merge step in the `searchable_snapshot` action. On slow CI servers, 10 seconds isn't enough for the index to go through the whole `searchable_snapshot` action, so we bump the timeout to 20 seconds.
I looked at the logs of a few test failures, and ILM was clearly still progressing when the test timed out. I didn't identify any particular step that was taking extraordinarily long; there were always just a few steps that took a bit longer. I would love to make these tests faster rather than bumping the timeout, but the `searchable_snapshot` action is simply one of the largest ILM actions, and ILM itself isn't particularly fast.
That being said, if a timeout of 20 seconds proves to be insufficient, I do think it's worth looking into reducing the runtime of the tests before we increase the timeout any further.
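In tests like these, such a timeout typically bounds a retry loop that keeps re-checking the ILM state until the index reaches the expected step. A minimal sketch of such a loop, assuming a simple exponential backoff; `PollingWait` and `waitUntil` are hypothetical names for illustration, not the Elasticsearch test framework's own helpers.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical polling helper (not part of Elasticsearch): re-checks a condition
 * until it holds or the time budget runs out.
 */
final class PollingWait {

    private PollingWait() {}

    /**
     * Re-evaluates {@code condition} with exponential backoff (capped at one second)
     * until it returns true or {@code maxWait} elapses.
     *
     * @return true if the condition held within {@code maxWait}, false on timeout or interrupt
     */
    static boolean waitUntil(BooleanSupplier condition, Duration maxWait) {
        final long deadline = System.nanoTime() + maxWait.toNanos();
        long sleepMillis = 1;
        while (!condition.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                // Timed out: the work may still be progressing, just too slowly.
                return false;
            }
            try {
                // Sleep for the current backoff, but not much longer than the time left.
                Thread.sleep(Math.max(1, Math.min(sleepMillis, remainingNanos / 1_000_000)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 1_000);
        }
        return true;
    }
}
```

With this shape, a passing run returns as soon as the check succeeds, so raising the budget from 10 to 20 seconds (for example `waitUntil(check, Duration.ofSeconds(20))`) only costs extra time on runs that would otherwise have failed.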
Pinging @elastic/es-data-management (Team:Data Management)
    
Thanks as always.
```diff
 assertOK(client().performRequest(restoreSnapshot));
-assertThat(indexExists(searchableSnapMountedIndexName), is(true));
+awaitIndexExists(searchableSnapMountedIndexName);
```
Just checking, this one doesn't need the extended timeout?
Nope, this is just waiting for the index to be restored after the `_restore` API call a few lines earlier. That should definitely not take more than 10 seconds. Thanks for checking!
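The distinction drawn in this thread, a default wait for quick operations and a longer one only where a whole ILM action is awaited, can be sketched as a pair of overloads. The `IndexWaits` class, its `indexExists` predicate, and both constants are hypothetical; the 10-second default is assumed from the discussion here rather than taken from the Elasticsearch test framework.

```java
import java.time.Duration;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not the Elasticsearch test framework): waits for an index to exist,
 * with a short default timeout and an overload for known-slow operations.
 */
final class IndexWaits {
    // Assumed default, matching the 10 seconds discussed in this thread.
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(10);
    // Assumed budget for waits that span a whole `searchable_snapshot` ILM action.
    static final Duration ILM_TIMEOUT = Duration.ofSeconds(20);

    private final Predicate<String> indexExists;

    IndexWaits(Predicate<String> indexExists) {
        this.indexExists = indexExists;
    }

    /** Quick operations, such as a snapshot restore, use the default timeout. */
    void awaitIndexExists(String index) {
        awaitIndexExists(index, DEFAULT_TIMEOUT);
    }

    /** Polls every 50 ms and fails with an AssertionError once {@code timeout} has elapsed. */
    void awaitIndexExists(String index, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!indexExists.test(index)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("index [" + index + "] did not appear within " + timeout);
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for [" + index + "]", e);
            }
        }
    }
}
```

Keeping the default short means a genuinely stuck restore still fails fast, while only callers that wait on the whole `searchable_snapshot` action opt into the longer budget.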
```diff
 Map<String, Phase> phases = new HashMap<>();
 phases.put("cold", new Phase("cold", TimeValue.ZERO, coldActions));
-phases.put("delete", new Phase("delete", TimeValue.timeValueMillis(10000), Map.of(DeleteAction.NAME, WITH_SNAPSHOT_DELETE)));
+phases.put("delete", new Phase("delete", TimeValue.ZERO, Map.of(DeleteAction.NAME, WITH_SNAPSHOT_DELETE)));
```
By the way, for the record: I changed this value from 10s to 0s because there is no point in waiting 10 seconds before we delete the searchable snapshotted index. We can delete it immediately without making this test any flakier or reducing its value.
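To see why the 10-second `min_age` on the delete phase bought nothing, here is a simplified, hypothetical model of phase entry. It assumes an index enters the next phase once the previous phase's actions have completed and the index's age is at least that phase's `min_age`; it ignores ILM's periodic polling and the details of how the age is computed. `PhaseTiming` and `extraWait` are illustrative names only.

```java
import java.time.Duration;

/**
 * Simplified, hypothetical model of ILM phase entry (not Elasticsearch code):
 * the next phase starts once the previous phase is done AND the index age >= min_age.
 */
final class PhaseTiming {

    private PhaseTiming() {}

    /**
     * Idle time spent waiting to enter a phase after the previous phase finished.
     *
     * @param ageWhenPreviousPhaseDone index age at the moment the previous phase completed
     * @param minAge                   the next phase's min_age
     */
    static Duration extraWait(Duration ageWhenPreviousPhaseDone, Duration minAge) {
        Duration remaining = minAge.minus(ageWhenPreviousPhaseDone);
        // If the index is already old enough, the phase can be entered right away.
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

For example, if the cold phase finishes when the index is 4 seconds old, `extraWait(Duration.ofSeconds(4), Duration.ofSeconds(10))` returns 6 seconds of idle time, whereas a `min_age` of zero returns `Duration.ZERO`. Under this model a non-zero `min_age` can only add time to a test that is already racing a timeout.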
          💔 Backport failed
          💚 All backports created successfully
As of elastic#133954, we clone indices before performing the force-merge step in the `searchable_snapshot` action. On slow CI servers, 10 seconds for the index to go through the whole `searchable_snapshot` action isn't enough, so we bump the timeout to 20 seconds.
I looked at the logs of a few test failures, and ILM was clearly still progressing when the test timed out. I didn't identify any particular step that was taking extraordinarily long; there were always just a few steps that took a bit longer. I would love to make these tests faster rather than bumping the timeout, but the `searchable_snapshot` action is simply one of the largest ILM actions and ILM itself isn't particularly fast.
That being said, if a timeout of 20 seconds proves to be insufficient (i.e. test failures come back), I do think it's worth having a look at reducing the runtime of the tests somehow first before we increase the timeout further.
Closes elastic#137149
Closes elastic#137151
Closes elastic#137152
Closes elastic#137153
Closes elastic#137156
Closes elastic#137166
Closes elastic#137167
Closes elastic#137192
(cherry picked from commit 60b89a8) # Conflicts: # muted-tests.yml