ES|QL: Wrap remote errors with cluster name to provide more context #123156

pawankartik-elastic · 2025-02-21T15:12:46Z

Previously, if a remote encountered an error, the query would fail and return the stack trace to the user. However, this information does not mention the name of the remote. This PR attempts to provide that context.

Here's why I introduced a new exception:

ElasticsearchException is too generic and returns status code 500 (since its causes are not unwrapped),
Other exceptions, like SearchException, cover only a subset of all the errors that could be thrown at this point (SearchException is meant specifically for search errors originating within a shard), and
Existing wrappers cannot be used, because wrappers get unwrapped when the error is sent back to the user (which loses the context we've built up specifically for the user).
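The unwrapping problem in the last point can be illustrated with a minimal, self-contained sketch. All names here (`WrapperMarker`, `ContextWrapper`, `RemoteFailure`, `Unwrapping.unwrapCause`) are hypothetical stand-ins, not Elasticsearch classes; the only assumption is that the response path strips any exception marked as a wrapper before rendering it, roughly as described above.

```java
/** Hypothetical marker interface: exceptions implementing it are treated as transparent wrappers. */
interface WrapperMarker {}

/** A wrapper that opts in to being stripped, like the existing wrappers mentioned above. */
class ContextWrapper extends RuntimeException implements WrapperMarker {
    ContextWrapper(String message, Throwable cause) {
        super(message, cause);
    }
}

/** A wrapper that does not implement the marker, so it survives unwrapping. */
class RemoteFailure extends RuntimeException {
    RemoteFailure(String clusterAlias, Throwable cause) {
        super("Remote [" + clusterAlias + "] encountered an error", cause);
    }
}

final class Unwrapping {
    private Unwrapping() {}

    /** Strips marker wrappers until it reaches a non-wrapper exception or a wrapper without a cause. */
    static Throwable unwrapCause(Throwable t) {
        Throwable current = t;
        int depth = 0; // bounds the loop in case of a cyclic cause chain
        while (current instanceof WrapperMarker && current.getCause() != null && depth < 100) {
            current = current.getCause();
            depth++;
        }
        return current;
    }
}
```

With this sketch, `Unwrapping.unwrapCause(new ContextWrapper("Remote [remote1] encountered an error", cause))` returns `cause`, so the remote's name is lost, while the same call on a `RemoteFailure` returns the `RemoteFailure` itself and keeps the alias in the message.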

Action items:

Verify whether the exception can be serialised over the wire, and
Check for any backwards compatibility concerns.

Assuming an exception of type <exception type> is thrown, the response without wrapping looks like:

{
    "error": {
        "root_cause": [
            {
                "type": "<exception type>",
                "reason": "<exception message>"
            }
        ],
        "type": "<exception type>",
        "reason": "<exception message>",
        "suppressed": [
            {
                // Suppressed stack trace
            }
        ]
    },
    "status": <appropriate error code that represents cause>
}

With wrapping:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_exception",
                "reason": "Remote [remote1] encountered an error",
                "suppressed": [
                    {
                        // Suppressed stack trace
                    }
                ]
            }
        ],
        "type": "remote_exception",
        "reason": "Remote [remote1] encountered an error",
        "caused_by": {
            "type": "<exception type>",
            "reason": "<exception message>"
        },
        "suppressed": [
            {
                // Suppressed stack trace
            }
        ]
    },
    "status": <appropriate error code that represents cause>
}
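In both responses the `"status"` line is meant to reflect the cause, not a generic 500. A minimal sketch of that delegation, assuming a hypothetical `HasStatus` interface and plain integer HTTP codes in place of Elasticsearch's own status type; `BadRequestFailure` and `StatusForwardingRemoteFailure` are likewise illustrative names, not the PR's classes.

```java
/** Hypothetical: exceptions that know which HTTP status they map to. */
interface HasStatus {
    int status();
}

/** Stand-in for a cause with a specific status, e.g. 400 for a bad request. */
class BadRequestFailure extends RuntimeException implements HasStatus {
    BadRequestFailure(String message) {
        super(message);
    }

    @Override
    public int status() {
        return 400;
    }
}

/** Wrapper that adds the cluster alias to the message but forwards the cause's status. */
class StatusForwardingRemoteFailure extends RuntimeException implements HasStatus {
    private final String clusterAlias;

    StatusForwardingRemoteFailure(String clusterAlias, Throwable cause) {
        super("Remote [" + clusterAlias + "] encountered an error", cause);
        this.clusterAlias = clusterAlias;
    }

    String clusterAlias() {
        return clusterAlias;
    }

    @Override
    public int status() {
        // Delegate to the cause when it carries a status; otherwise fall back to 500.
        Throwable cause = getCause();
        return (cause instanceof HasStatus) ? ((HasStatus) cause).status() : 500;
    }
}
```

The design choice illustrated here is that the wrapper only adds context to the message; the status code the user sees is still decided by whatever actually failed on the remote.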

elasticsearchmachine · 2025-03-03T14:16:15Z

Hi @pawankartik-elastic, I've created a changelog YAML for you.


pawankartik-elastic · 2025-03-07T19:20:03Z

Okay, so as expected, the breakages are primarily around security exceptions and task cancellations.

elasticsearchmachine · 2025-03-13T10:05:35Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

javanna · 2025-03-13T10:29:59Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

@@ -300,6 +301,7 @@ public void execute(
                        cancelQueryOnFailure,
                        execInfo,
                        computeListener.acquireCompute()
+                            .delegateResponse((l, ex) -> l.onFailure(new RemoteComputeException(cluster.clusterAlias(), ex)))
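The added line intercepts the failure path of the listener and wraps the exception with the cluster alias, while successful responses pass through unchanged. A simplified sketch of that pattern in isolation; `Listener`, `RemoteClusterFailure` and `RecordingListener` are hypothetical stand-ins modeled on the call in the diff, not Elasticsearch's `ActionListener` API.

```java
import java.util.function.BiConsumer;

/** Simplified, hypothetical stand-in for a response listener. */
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);

    /** Returns a listener that forwards responses unchanged but routes failures through the handler. */
    default Listener<T> delegateResponse(BiConsumer<Listener<T>, Exception> onFailureHandler) {
        Listener<T> delegate = this;
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                delegate.onResponse(response);
            }

            @Override
            public void onFailure(Exception e) {
                onFailureHandler.accept(delegate, e);
            }
        };
    }
}

/** Hypothetical wrapper carrying the remote cluster alias. */
class RemoteClusterFailure extends RuntimeException {
    RemoteClusterFailure(String clusterAlias, Exception cause) {
        super("Remote [" + clusterAlias + "] encountered an error", cause);
    }
}

/** Records what reached the outer listener, for demonstration. */
class RecordingListener implements Listener<String> {
    String response;
    Exception failure;

    @Override
    public void onResponse(String response) {
        this.response = response;
    }

    @Override
    public void onFailure(Exception e) {
        this.failure = e;
    }
}
```

Usage mirrors the diff: `outer.delegateResponse((l, ex) -> l.onFailure(new RemoteClusterFailure("remote1", ex)))` yields a listener whose failures reach `outer` already wrapped, with the original exception as the cause.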


javanna · 2025-03-13

I am not familiar enough with the ComputeService to determine whether this exception is a local-only exception that will never be serialized over the wire. Is that the case?

pawankartik-elastic · 2025-03-13

Yes, it's on my mind right now, and I hope to get it confirmed with Nhat later today. Sounds good?

smalyshev · 2025-03-13

Specifically, startComputeOnRemoteCluster is always called on the coordinating node. However, I am not sure whether this means "it will never be serialized": there seem to be scenarios (e.g. async responses) where the end result is serialized, and in that case this exception might have to be serialized too.

dnhatn · 2025-03-13

What Stas said is correct. This exception is local with a sync query but can be serialized with an async query:

elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/async/StoredAsyncResponse.java, line 65 in c1efe47:

out.writeException(exception);

Maybe add an async query with failures to verify this?

javanna · 2025-03-13

Cool, if it can be serialized then it needs to be registered as a serializable exception, which makes me wonder if we can reuse an existing one instead to avoid that ceremony :)

pawankartik-elastic · 2025-03-17

Thank you for the catch! Yes, the exception is getting serialised for asynchronous queries and has to be handled accordingly.

> wonder if we can reuse an existing one instead to avoid that ceremony

To reuse an existing exception, it needs to fulfil two requirements:

It should propagate the status of its cause, and

It should not implement the ES wrapper interface, so that it is not unwrapped when the error is sent back to the user (unwrapping discards the context we've built up, i.e. the remote's name).

I don't see any exceptions that we can reuse.
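Why registration is needed once the exception can be serialized for async queries: the reading side must map something on the wire back to a constructor. A minimal sketch of an id-based registry, assuming a simplified format (id, cluster alias, cause message) written with `DataOutputStream`; `WireWriteable`, `ExceptionReader`, `SerializableRemoteFailure` and `ExceptionRegistry` are hypothetical, and Elasticsearch's real mechanism in `ElasticsearchException` differs in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: an exception that can write its own fields to a stream. */
interface WireWriteable {
    void writeTo(DataOutputStream out) throws IOException;
}

/** Hypothetical: reconstructs an exception from the bytes its writeTo produced. */
interface ExceptionReader {
    RuntimeException read(DataInputStream in) throws IOException;
}

/** Hypothetical wrapper; assumes the cause message is non-null (writeUTF rejects null). */
class SerializableRemoteFailure extends RuntimeException implements WireWriteable {
    final String clusterAlias;

    SerializableRemoteFailure(String clusterAlias, String causeMessage) {
        super("Remote [" + clusterAlias + "] encountered an error", new RuntimeException(causeMessage));
        this.clusterAlias = clusterAlias;
    }

    /** Reader constructor: reads fields in the same order writeTo writes them. */
    SerializableRemoteFailure(DataInputStream in) throws IOException {
        this(in.readUTF(), in.readUTF());
    }

    @Override
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(clusterAlias);
        out.writeUTF(getCause().getMessage());
    }
}

/** Maps exception classes to wire ids and ids back to readers. */
final class ExceptionRegistry {
    private final Map<Integer, ExceptionReader> readersById = new HashMap<>();
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();

    void register(int id, Class<? extends RuntimeException> type, ExceptionReader reader) {
        readersById.put(id, reader);
        idsByClass.put(type, id);
    }

    byte[] write(RuntimeException e) {
        Integer id = idsByClass.get(e.getClass());
        if (id == null || !(e instanceof WireWriteable)) {
            // An unregistered type could not be reconstructed on the other side, so refuse to write it.
            throw new IllegalArgumentException("unregistered exception type: " + e.getClass().getName());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            ((WireWriteable) e).writeTo(out);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return bytes.toByteArray();
    }

    RuntimeException read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            ExceptionReader reader = readersById.get(id);
            if (reader == null) {
                throw new IllegalStateException("no reader registered for exception id " + id);
            }
            return reader.read(in);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }
}
```

Registering is a single call, e.g. `registry.register(1, SerializableRemoteFailure.class, SerializableRemoteFailure::new)`; the "ceremony" is that every new type needs a stable id and a matching reader.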

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/RemoteComputeException.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

server/src/main/java/org/elasticsearch/ElasticsearchException.java

server/src/main/java/org/elasticsearch/TransportVersions.java

quux00

LGTM. Thanks for doing this!

elasticsearchmachine · 2025-04-02T17:09:59Z

💔 Backport failed

Status	Branch	Result
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 123156

pawankartik-elastic · 2025-04-02T18:21:33Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions?

Please refer to the Backport tool documentation

…lastic#123156) Wrap remote errors with cluster name to provide more context Previously, if a remote encountered an error, user would see a top-level error that would provide no context about which remote ran into the error. Now, such errors are wrapped in a separate remote exception whose error message clearly specifies the name of the remote cluster and the error that occurred is the cause of this remote exception. (cherry picked from commit e4fb22c) # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java

…123156) (#126165) Wrap remote errors with cluster name to provide more context Previously, if a remote encountered an error, user would see a top-level error that would provide no context about which remote ran into the error. Now, such errors are wrapped in a separate remote exception whose error message clearly specifies the name of the remote cluster and the error that occurred is the cause of this remote exception. (cherry picked from commit e4fb22c) # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/EsqlTestUtils.java

…lastic#123156) Wrap remote errors with cluster name to provide more context Previously, if a remote encountered an error, user would see a top-level error that would provide no context about which remote ran into the error. Now, such errors are wrapped in a separate remote exception whose error message clearly specifies the name of the remote cluster and the error that occurred is the cause of this remote exception.

Wrap remote errors with cluster name to provide more context

b2fc21d

elasticsearchmachine added the v9.1.0 label Feb 21, 2025

pawankartik-elastic added the :Search Foundations/Search, >enhancement, auto-backport, Team:Search Foundations and v9.0.1 labels Mar 3, 2025

Update docs/changelog/123156.yaml

a049cc9

pawankartik-elastic added 4 commits March 3, 2025 14:38

Merge branch 'main' into pkar/esql-wrap-remote-errors

e32956a

Add test and handle missed subset of scenarios where an error can be thrown

cd3fefa

Remove unused import

6daa9e1

Merge branch 'main' into pkar/esql-wrap-remote-errors

bc166cc

pawankartik-elastic added 7 commits March 12, 2025 18:07

Introduce RemoteComputeException that forwards the cause's status code

db90b71

Adjust test to match the new exception type

21cfe8a

Fix license header

a7bf704

Merge branch 'main' into pkar/esql-wrap-remote-errors

e1e9592

Adjust test to match the new exception type

8e9976d

Rename test

6dd8421

Merge branch 'main' into pkar/esql-wrap-remote-errors

8f2a39d

pawankartik-elastic marked this pull request as ready for review March 13, 2025 10:05

javanna reviewed Mar 13, 2025

quux00 reviewed Mar 14, 2025

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/RemoteComputeException.java (outdated, resolved)

pawankartik-elastic added 5 commits March 17, 2025 10:36

Reword sentence as per review suggestion

57a133a

Merge branch 'main' into pkar/esql-wrap-remote-errors

eb27643

Register exception as serializable

ea67731

Import sort

a155983

Merge branch 'main' into pkar/esql-wrap-remote-errors

24b103d

pawankartik-elastic added 5 commits March 24, 2025 11:15

Merge branch 'main' into pkar/esql-wrap-remote-errors

a5d14a4

Relax assertions for the wrapping

99f9b45

Remove redundant getCause()

2ac89c7

Merge branch 'main' into pkar/esql-wrap-remote-errors

32e26f6

Merge branch 'main' into pkar/esql-wrap-remote-errors

764933f

pawankartik-elastic requested review from dnhatn, javanna and quux00 March 31, 2025 09:12

Ensure transport errors are unwrapped before sending response to the user and move `unwrapIfWrappedInRemoteComputeException` to `EsqlTestUtils`

8ef049f

quux00 reviewed Mar 31, 2025

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java (resolved)

pawankartik-elastic and others added 4 commits April 1, 2025 09:50

Address review comment: rename exception

04149a4

[CI] Auto commit changes from spotless

4e0baa3

Merge branch 'main' into pkar/esql-wrap-remote-errors

8a5d6a0

Rename leftover remnants

90b314a

quux00 reviewed Apr 1, 2025

server/src/main/java/org/elasticsearch/ElasticsearchException.java (outdated, resolved)

quux00 reviewed Apr 1, 2025

server/src/main/java/org/elasticsearch/TransportVersions.java (outdated, resolved)

Address review suggestions

0996521

pawankartik-elastic changed the title ~~Wrap remote errors with cluster name to provide more context~~ ES|QL: Wrap remote errors with cluster name to provide more context Apr 1, 2025

quux00 approved these changes Apr 2, 2025

pawankartik-elastic added the v8.19.0 label Apr 2, 2025

pawankartik-elastic merged commit e4fb22c into elastic:main Apr 2, 2025
17 checks passed

elasticsearchmachine added the backport pending label Apr 2, 2025

pawankartik-elastic removed the v9.0.1 label Apr 2, 2025

pawankartik-elastic mentioned this pull request Apr 2, 2025

[8.x] ES|QL: Wrap remote errors with cluster name to provide more context (#123156) #126165

Merged

pawankartik-elastic deleted the pkar/esql-wrap-remote-errors branch June 26, 2025 14:02
