Skip to content

Fix SearchErrorTraceIT and friends to work with batched query execution #127150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

original-brownbear
Copy link
Member

Making this work with batched execution and fixing a memory leak:

  • Fix memory leak by removing listener on first message. There really only is a single message here per node anyway with batched execution in the mix. Either it's a single shard on the data node and we get a single query message or it's multiple shards and we get a single batched message, so fine to remove listener after the first message since all tests do a single request only anyway.
  • Add a new hook that allows inspection of the actual response. This is needed for batched since batched sends a non-error response even if the data node failed all searches. We had this before in the onResponseSent hook but checking the instance after it's been sent over the wire causes needless overhead in the production code so moving to a "before-style" hook here.

part of #125788

Making this work with batched execution and fixing a memory leak:

* Fix memory leak by removing listener on first message. There really only is a single message
here per node anyway with batched execution in the mix. Either it's a single shard on the data node
and we get a single query message or it's multiple shards and we get a single batched message,
so fine to remove listener after the first message since all tests do a single request only anyway.
* Add a new hook that allows inspection of the actual response. This is needed for batched since batched
sends a non-error response even if the data node failed all searches. We had this before in the
`onResponseSent` hook but checking the instance after it's been sent over the wire causes needless
overhead in the production code so moving to a "before-style" hook here.
@original-brownbear original-brownbear added >test Issues or PRs that are addressing/adding tests :Search Foundations/Search Catch all for Search Foundations labels Apr 22, 2025
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.1.0 labels Apr 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

* @param action the request action
* @param response response instance
*/
default void onBeforeResponseSent(long requestId, String action, TransportResponse response) {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaveCTurner wdyt? this should be ok right? In prod the overhead is negligible I think.

Copy link
Contributor

@benchaplin benchaplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment below. I have a general question as well—what do you mean:

We had this before in the onResponseSent hook

I was a bit surprised that we don't have testing to inspect transport responses being sent from data nodes.

var r = asInstanceOf(SearchQueryThenFetchAsyncAction.NodeQueryResponse.class, response);
for (Object result : r.getResults()) {
if (result instanceof Exception error) {
checkStacktraceStateAndRemove(error, mockTs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is looping + setting the transportMessageHasStackTrace correct here? What if one result has a stack trace: transportMessageHasStackTrace.set(true), then a result in a future iteration doesn't: transportMessageHasStackTrace.set(false). Then transportMessageHasStackTrace really means "did the last-checked shard have a stack trace." I believe our current test cases trigger errors on all shards so the issue is not noticeable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great, I sort of tried to point this out via the fact that we only see a single request per thread in the op too.
I just didn't want to refactor the test here, it seems we could invert the logic here easily and set a boolean "expectsStacktrace" or so in the transport message listener and then simply assert inline instead of after the fact. If we can only communicate one bit back from the listener there isn't really a mathematical way to cleanly assert on it for multiple things :P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this do something like: capture the existence of a stack trace for ALL exception results, and:

  • if they're all true: transportMessageHasStackTrace.set(true)
  • if they're all false: transportMessageHasStackTrace.set(false)
  • else there's something wrong, so throw some assertion error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right that would work but also make this even harder to follow? If we want to fix this my vote would be to invest 5 more minutes here and just assert inline so that we can pass the expectation for everything to the listener at the beginning of each test? :) Otherwise if we go for the 3 outcome logic, we'll have some inline assertions and some "at the end of the test" assertions mixed, that's just needlessly complex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean now - throw out transportMessageHasStackTrace and pass the "expectsStacktrace" to assert against inside the onBeforeResponseSent override. I agree that's better than the mixed assertions.

@original-brownbear
Copy link
Member Author

We had this before in the onResponseSent hook

#125163 see this PR, we were wasting memory holding on to the message just for the noop (in production) hook :) I dropped the holding on to the message till after it's fully flushed to the wire in that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants