-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap remote errors with cluster name to provide more context #123156
base: main
Are you sure you want to change the base?
Wrap remote errors with cluster name to provide more context #123156
Conversation
Hi @pawankartik-elastic, I've created a changelog YAML for you. |
Okay, so as expected, the breakages are primarily around security exceptions, and task cancellations. |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
@@ -300,6 +301,7 @@ public void execute( | |||
cancelQueryOnFailure, | |||
execInfo, | |||
computeListener.acquireCompute() | |||
.delegateResponse((l, ex) -> l.onFailure(new RemoteComputeException(cluster.clusterAlias(), ex))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar enough with the ComputeService to determine whether this exception is a local only exception, that will never be serialized through the wire. Is that the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's on my mind right now and I hope to get it confirmed with Nhat later today. Sounds good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically startComputeOnRemoteCluster
is always called in the coordinating node. I am not sure however whether it means "it will be never serialized", as there seem to be scenarios - e.g. with async response - where the end result is serialized, and in that case this exception might have to be serialized too, I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Stas said is correct. This exception is local with a sync query but can be serialized with an async query:
Line 65 in c1efe47
out.writeException(exception); |
Maybe add an async query with failures to verify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, if it can be serialized then it needs to be registered as a serializable one, which makes me wonder if we can reuse an existing one instead to avoid that ceremony :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the catch! Yes, the exception is getting serialised for asynchronous queries and has to be handled accordingly.
wonder if we can reuse an existing one instead to avoid that ceremony
To re-use an existing exception, we primarily need to fulfil 2 requirements:
- It should propagate the status of the cause, and,
- It should not implement the ES wrapper interface to prevent unwrapping when the error is sent back to the user (which discards the context we've built up, i.e. the remote's name).
I don't see any exceptions that we can reuse.
Previously, if a remote encountered an error, it'd fail and provide the stacktrace to the user. However, this info does not mention name of the remote. This PR attemps to provide this context.
Here's why I introduced a new exception:
ElasticsearchException
is too generic and returns a status code500
(since it's causes will not be unwrapped),SearchException
cover only a subset of all the errors that could be thrown at this point (and it's meant specifically for a search error originating within a shard), and,Action items: