Add REST & Transport layers section to GeneralArchitectureGuide.md #126377

Open: nicktindall wants to merge 11 commits into main

@nicktindall nicktindall commented Apr 7, 2025

As part of the effort to document onboarding topics, I tried to document the contents of the "REST vs Transport layer and coordinating nodes" topic.

There's a lot more that could be said; advice on how much more of that to include is appreciated. I was hoping to keep the section(s) roughly aligned with onboarding topics, but I think I've already said too much about TransportActions.

See rendered content at https://github.com/nicktindall/elasticsearch/blob/add_rest_and_transport_layers/docs/internal/GeneralArchitectureGuide.md

Closes: ES-7885

@nicktindall nicktindall added the >docs (General docs changes) and :Distributed Coordination/Network (Http and internode communication implementations) labels Apr 7, 2025
Member

@ywangd ywangd left a comment

LGTM


There are a few common patterns for [TransportAction] execution which are present in the codebase. Some prominent examples include...

- [TransportMasterNodeAction]: Executes an action on the master node. Typically used to perform cluster state updates, as these can only be performed on the master. The base class contains logic for locating the master node and delegating to it to execute the specified logic.
Member

I wonder whether we should list HandledTransportAction, which handles the action on the local node. It will usually fire another action, such as reading from an index, which may reach out to another node. TransportMasterNodeAction itself is a subclass of HandledTransportAction.

Contributor Author

I mentioned it in 45b5c7a when discussing the different places actions can be registered
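
For readers following this thread, the "locate the master and delegate to it" behaviour described for [TransportMasterNodeAction] can be pictured with a minimal, self-contained Java sketch. It does not use the real Elasticsearch classes: `MasterDelegationSketch`, `Node` and the `execute` signature are hypothetical names invented here, and the hop to the master is a plain method call rather than a transport request.

```java
import java.util.function.Function;

final class MasterDelegationSketch {
    /** Hypothetical stand-in for a cluster node; not an Elasticsearch class. */
    static final class Node {
        final String name;
        final boolean isMaster;
        Node master; // the node this node currently believes is the elected master

        Node(String name, boolean isMaster) {
            this.name = name;
            this.isMaster = isMaster;
        }

        /** Runs the operation here if this node is the master, otherwise forwards it to the master. */
        String execute(String request, Function<String, String> masterOperation) {
            if (isMaster) {
                return name + ":" + masterOperation.apply(request);
            }
            if (master == null) {
                throw new IllegalStateException("no known master");
            }
            // In a real cluster this hop would cross the transport layer; here it is a local call.
            return master.execute(request, masterOperation);
        }
    }
}
```

Calling `execute` on a non-master node in this sketch returns a result prefixed with the master's name, which shows that the operation ran on the master even though the request arrived elsewhere.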

Comment on lines 51 to 53
Elasticsearch contains many built-in [TransportAction]s, configured statically in [ActionModule], additional actions can be contributed by [ActionPlugin]s via the [getActions] method. [TransportAction]s define the request and response types used to invoke the action and the logic for performing the action. [TransportAction]s are registered against an [ActionType] which uniquely identifies the action.

The [NodeClient] executes all actions locally on the invoking node, the actions themselves contain logic for dispatching downstream actions to other nodes in the cluster via the transport layer.
Member

I think it is worth noting that an action must be registered via either ActionModule or the plugin method before it can be invoked with a client. In contrast, if an action is not meant to be invoked by clients, it does not need such registration and only needs to be registered with TransportService, for example, the node-specific action of a NodesAction.

Contributor Author

I attempted to explain this in 45b5c7a
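
The registration distinction raised in this thread can be illustrated with a small, self-contained Java sketch. It is only a model of the idea, not the Elasticsearch implementation: `ActionRegistrationSketch`, its two maps and the action names `example/nodes` and `example/nodes[n]` are assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ActionRegistrationSketch {
    // Stand-in for actions registered via ActionModule or a plugin: invocable through the client.
    private final Map<String, UnaryOperator<String>> clientActions = new HashMap<>();
    // Stand-in for handlers registered only with the transport service: reachable by node-to-node requests.
    private final Map<String, UnaryOperator<String>> transportHandlers = new HashMap<>();

    void registerClientAction(String actionName, UnaryOperator<String> action) {
        clientActions.put(actionName, action);
    }

    void registerTransportHandler(String actionName, UnaryOperator<String> handler) {
        transportHandlers.put(actionName, handler);
    }

    /** Models a client invocation: only actions in the client registry can be found. */
    String executeViaClient(String actionName, String request) {
        UnaryOperator<String> action = clientActions.get(actionName);
        if (action == null) {
            throw new IllegalStateException("no client-invocable action [" + actionName + "]");
        }
        return action.apply(request);
    }

    /** Models an incoming node-to-node request: only transport handlers can be found. */
    String handleTransportRequest(String actionName, String request) {
        UnaryOperator<String> handler = transportHandlers.get(actionName);
        if (handler == null) {
            throw new IllegalStateException("no transport handler [" + actionName + "]");
        }
        return handler.apply(request);
    }
}
```

In this model, a node-level handler registered only through `registerTransportHandler` can serve requests from other nodes but is rejected by `executeViaClient`, which mirrors the point that client invocation requires the extra registration.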

@nicktindall nicktindall marked this pull request as ready for review April 8, 2025 03:40
@elasticsearchmachine elasticsearchmachine added the Team:Docs (Meta label for docs team) and Team:Distributed Coordination (Meta label for Distributed Coordination team) labels Apr 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM, a few suggestions but nothing blocking


There are two main types of network communication used in Elasticsearch:
- External clients interact with the cluster via the public REST API over HTTP connections
- Cluster nodes communicate internally using a binary message format over TCP connections
Contributor

I'd suggest introducing the term "transport" here rather than calling it binary/TCP below.

Contributor

++ on mentioning this is what we refer to as the transport layer.


The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and when the requested action is completed, return the possibly aggregated response to the REST client via HTTP.

By default, all nodes will act as coordinating nodes, but by specifying `node.roles` to be empty you can create a [coordinating-only node](https://www.elastic.co/guide/en/elasticsearch/reference/current/node-roles-overview.html#coordinating-only-node-role).
Contributor

Maybe a little confusing to talk about node roles given we're considering coordinating REST requests here. All nodes handle REST requests, this is not optional.

[RestHandler]s are registered with the [RestController], which implements request routing and some cross-cutting concerns. [RestController] is a [HttpServerTransport.Dispatcher], which dispatches requests for a [HttpServerTransport]. [HttpServerTransport] is our HTTP abstraction, of which there is a single [Netty-based implementation][Netty4HttpServerTransport].

> [!NOTE]
> `Rest{action-name}Action` classes often have a corresponding `Transport{action-name}Action`, this naming convention makes it easy to locate the corresponding [RestHandler] for a [TransportAction]. (e.g. `RestGetAction` calls `TransportGetAction`)
Contributor

Maybe worth also spelling out how to determine the TransportAction from a RestAction for those cases where the naming convention doesn't hold.

Contributor

++
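
One way to picture the lookup when the naming convention does not hold, sketched under assumptions: the quoted text says a REST handler dispatches a request through the client and that each [TransportAction] is registered against an [ActionType], so the action type is the link between the two. The Java below models that chain with plain maps. All names in it (`RestToTransportLookupSketch`, `RestWidgetsAction`, `TransportListWidgetsAction`, `example:widgets/list`) are hypothetical and do not exist in Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RestToTransportLookupSketch {
    // Hypothetical registrations: action type name -> transport action class name.
    private final Map<String, String> transportActionByType = new LinkedHashMap<>();
    // Hypothetical record of which action type each REST handler dispatches through the client.
    private final Map<String, String> typeDispatchedByRestHandler = new LinkedHashMap<>();

    void registerAction(String actionType, String transportActionClass) {
        transportActionByType.put(actionType, transportActionClass);
    }

    void recordDispatch(String restHandlerClass, String actionType) {
        typeDispatchedByRestHandler.put(restHandlerClass, actionType);
    }

    /** Follows REST handler -> dispatched action type -> registered transport action. */
    String transportActionFor(String restHandlerClass) {
        String actionType = typeDispatchedByRestHandler.get(restHandlerClass);
        if (actionType == null) {
            throw new IllegalArgumentException("unknown REST handler [" + restHandlerClass + "]");
        }
        String transportAction = transportActionByType.get(actionType);
        if (transportAction == null) {
            throw new IllegalStateException("no action registered for type [" + actionType + "]");
        }
        return transportAction;
    }
}
```

The manual equivalent when reading code would be to find the action type a handler passes to the client and then search the registrations for that type.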

[HandledTransportAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/HandledTransportAction.java
[TransportService]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/transport/TransportService.java

When `Rest{action-name}Action` handlers receive a request, they typically translate the request into an [ActionRequest] and dispatch it via the provided [NodeClient]. The [NodeClient] is the entrypoint into the "transport layer" over which internal cluster actions are coordinated.
Contributor

Maybe worth highlighting that there's some confusion about the term "transport" in this sense. Most so-called transport actions are invoked only on the local node and do not involve the transport protocol or TCP connections. It's mentioned below, but I think saying explicitly that the terminology is confusing would help new starters.

The two key features of a TransportAction are:

  1. their constructor parameters are provided via dependency injection at runtime rather than by explicit instantiation, and
  2. they represent a security boundary, in the sense that we check the calling user is authorized to call the action they're calling
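
The second point above, the security boundary, can be sketched in a few lines of self-contained Java. This is an illustration only and says nothing about how Elasticsearch performs authorization: `SecurityBoundarySketch`, the set of permitted action names and the example action names are assumptions invented here, and dependency injection (the first point) is not modelled.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

final class SecurityBoundarySketch {
    // Hypothetical: the action names the calling user is allowed to execute.
    private final Set<String> permittedActions;

    SecurityBoundarySketch(Set<String> permittedActions) {
        this.permittedActions = Set.copyOf(permittedActions);
    }

    /** Checks authorization once, at the action boundary, before any action logic runs. */
    String execute(String actionName, String request, UnaryOperator<String> action) {
        if (!permittedActions.contains(actionName)) {
            throw new SecurityException("action [" + actionName + "] is unauthorized for this user");
        }
        return action.apply(request);
    }
}
```

The design point is that the check is keyed on the action name, so code inside an action does not repeat it.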

Contributor

@DiannaHohensee DiannaHohensee left a comment

I took a look at the intro and REST sections, but didn't get to the transport section.

I think you have a lot of great content, but I think it's missing high level explanations to A) make that content easier to understand/digest and B) tie the components together into a big picture.

[BaseRestHandler]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java
[RestHandler]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/RestHandler.java
[Route]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/rest/RestHandler.java#L134
[getRestHandlers]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76
Contributor

Probably not worth going back and updating the links, but I typically tie the link to a minor version of ES for readability -- if you're inclined in future.

E.g. https://github.com/elastic/elasticsearch/blob/v9.0.0-rc1/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76

[getRestHandlers]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76
[RestBulkAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/action/document/RestBulkAction.java

Each REST endpoint is defined by a [RestHandler] instance. [RestHandler] implementations define the list of [Route]s that they handle, the request handling logic, and some other runtime characteristics such as path/query parameters that are supported and the content type(s) it accepts. There are many built-in REST endpoints configured statically in [ActionModule], additional endpoints can be contributed by [ActionPlugin]s via the [getRestHandlers] method.
Contributor

For "[RestHandler] implementations", I'd link an implementation like https://github.com/elastic/elasticsearch/blob/v9.0.0-rc1/server/src/main/java/org/elasticsearch/rest/action/cat/RestSnapshotAction.java#L41-L44. Or leave it without a link since you already referenced the class.


Each REST endpoint is defined by a [RestHandler] instance. [RestHandler] implementations define the list of [Route]s that they handle, the request handling logic, and some other runtime characteristics such as path/query parameters that are supported and the content type(s) it accepts. There are many built-in REST endpoints configured statically in [ActionModule], additional endpoints can be contributed by [ActionPlugin]s via the [getRestHandlers] method.

[BaseRestHandler] is the base class for almost all REST endpoints in Elasticsearch. It validates the request parameters against those which are supported, delegates to its sub-classes to set up the execution of the requested action, then delivers the request content to the action either as a single parsed payload or a stream of binary chunks. Actions such as the [RestBulkAction] use the streaming capability to process large payloads incrementally and apply back-pressure when overloaded.
Contributor

Limiting lines to roughly 140 characters would help when reading this plainly in an IDE or text file (and when editing in future).


[TransportAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/TransportAction.java
[ActionPlugin]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java
[ActionModule]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionModule.java
Contributor

The ActionModule link is used in a later section, if you wanted to move it closer.

[getRestHandlers]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76
[RestBulkAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/action/document/RestBulkAction.java

Each REST endpoint is defined by a [RestHandler] instance. [RestHandler] implementations define the list of [Route]s that they handle, the request handling logic, and some other runtime characteristics such as path/query parameters that are supported and the content type(s) it accepts. There are many built-in REST endpoints configured statically in [ActionModule], additional endpoints can be contributed by [ActionPlugin]s via the [getRestHandlers] method.
Contributor

"There are many built-in REST endpoints configured statically in [ActionModule]..."

Linking directly to the code in ActionModule that you mean to reference would be helpful. Otherwise it's ambiguous for someone who doesn't already know.


The sub-classes of [BaseRestHandler], usually named `Rest{action-name}Action`, are the entry-points to the cluster, where HTTP requests from outside the cluster are translated into internal [TransportAction] invocations.

[RestHandler]s are registered with the [RestController], which implements request routing and some cross-cutting concerns. [RestController] is a [HttpServerTransport.Dispatcher], which dispatches requests for a [HttpServerTransport]. [HttpServerTransport] is our HTTP abstraction, of which there is a single [Netty-based implementation][Netty4HttpServerTransport].
Contributor

I'm not familiar with this code. It's intriguing information, and points out the code components where to learn more, but this doesn't currently tell me anything high level with which to orient myself in code spelunking. Could you add some more high level explanation here?
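
As a rough orientation aid for the request path described in the quoted paragraph (handlers registered with a controller, which dispatches incoming requests to them), here is a minimal, self-contained Java sketch. It is not how [RestController] is implemented: `RouteDispatchSketch` is a hypothetical name, routes are matched by exact method and path only, and the string "status" responses are an assumption made for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class RouteDispatchSketch {
    // Registered handlers keyed by "METHOD path"; exact-match only in this sketch.
    private final Map<String, UnaryOperator<String>> handlersByRoute = new HashMap<>();

    void registerHandler(String method, String path, UnaryOperator<String> handler) {
        String route = method + " " + path;
        if (handlersByRoute.putIfAbsent(route, handler) != null) {
            throw new IllegalArgumentException("duplicate route [" + route + "]");
        }
    }

    /** Finds the handler for the route and passes it the request body. */
    String dispatch(String method, String path, String body) {
        String route = method + " " + path;
        UnaryOperator<String> handler = handlersByRoute.get(route);
        if (handler == null) {
            return "404 no handler found for [" + route + "]";
        }
        return handler.apply(body);
    }
}
```

In this picture the HTTP server implementation only needs to know the dispatcher, and each handler only needs to know its own routes.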


The sub-classes of [BaseRestHandler], usually named `Rest{action-name}Action`, are the entry-points to the cluster, where HTTP requests from outside the cluster are translated into internal [TransportAction] invocations.

[RestHandler]s are registered with the [RestController], which implements request routing and some cross-cutting concerns. [RestController] is a [HttpServerTransport.Dispatcher], which dispatches requests for a [HttpServerTransport]. [HttpServerTransport] is our HTTP abstraction, of which there is a single [Netty-based implementation][Netty4HttpServerTransport].
Contributor

You mention "cross-cutting concerns", but there's no explanation of what they are -- please elaborate or delete.

[RestHandler]s are registered with the [RestController], which implements request routing and some cross-cutting concerns. [RestController] is a [HttpServerTransport.Dispatcher], which dispatches requests for a [HttpServerTransport]. [HttpServerTransport] is our HTTP abstraction, of which there is a single [Netty-based implementation][Netty4HttpServerTransport].

> [!NOTE]
> `Rest{action-name}Action` classes often have a corresponding `Transport{action-name}Action`, this naming convention makes it easy to locate the corresponding [RestHandler] for a [TransportAction]. (e.g. `RestGetAction` calls `TransportGetAction`)
Contributor

++


[BaseRestHandler] is the base class for almost all REST endpoints in Elasticsearch. It validates the request parameters against those which are supported, delegates to its sub-classes to set up the execution of the requested action, then delivers the request content to the action either as a single parsed payload or a stream of binary chunks. Actions such as the [RestBulkAction] use the streaming capability to process large payloads incrementally and apply back-pressure when overloaded.

The sub-classes of [BaseRestHandler], usually named `Rest{action-name}Action`, are the entry-points to the cluster, where HTTP requests from outside the cluster are translated into internal [TransportAction] invocations.
Contributor

This sounds like a good starting paragraph (minus BaseRestHandler): maybe move it earlier someplace? Tying rest and transport together, and file naming conventions, could be done before the REST and Transport sections, too. Then you dive into the details in each section, after having explained their relationship.

> [!NOTE]
> Cross-cluster [replication](https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html) and [search](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cross-cluster-search.html) use binary/TCP messaging for inter-cluster communication but this is out of scope for this section

The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and when the requested action is completed, return the possibly aggregated response to the REST client via HTTP.
Contributor

Suggested change
The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and when the requested action is completed, return the possibly aggregated response to the REST client via HTTP.
The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and to return the (possibly aggregated) response to the REST client via HTTP when the requested action is completed.
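
The coordinating role described in this suggestion (send work to other nodes, then return a possibly aggregated response) can be sketched in self-contained Java using `CompletableFuture`. This is a simplified model rather than Elasticsearch code: `CoordinatingNodeSketch`, the `sendToNode` function and the use of a summed count as the "aggregated response" are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CoordinatingNodeSketch {
    /**
     * Sends one sub-request per node and completes with the sum of the partial results
     * once every node has responded.
     */
    static CompletableFuture<Long> coordinate(
        List<String> nodes,
        Function<String, CompletableFuture<Long>> sendToNode
    ) {
        // Fan out: start all sub-requests before waiting on any of them.
        List<CompletableFuture<Long>> partials = nodes.stream().map(sendToNode).collect(Collectors.toList());
        // Aggregate: join() does not block here because allOf has already completed.
        return CompletableFuture.allOf(partials.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> partials.stream().mapToLong(CompletableFuture::join).sum());
    }
}
```

If any sub-request fails in this sketch, the combined future completes exceptionally; how partial failures are actually reported is outside what the quoted text covers.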

@DiannaHohensee
Contributor

DiannaHohensee commented Apr 10, 2025

Uhoh. I started feeling like I'd already dug around in this code: we have the REST layer explained over here hiding under networking https://github.com/elastic/elasticsearch/blob/main/docs/internal/DistributedArchitectureGuide.md#networking. You have some more REST bits, though, and the transport layer was not explained.

@DiannaHohensee
Contributor

> Uhoh. I started feeling like I'd already dug around in this code: we have the REST layer explained over here hiding under networking https://github.com/elastic/elasticsearch/blob/main/docs/internal/DistributedArchitectureGuide.md#networking. You have some more REST bits, though, and the transport layer was not explained.

I'm sorry I didn't notice this earlier. They'll need to be reconciled. Looks like I was connecting all those topics under networking because the ActionListeners tie responses back to Netty and get passed through the REST layer on netty threads to the Transport layer to then run on potentially different thread pools... That was a little too clever, in retrospect 😅

@DiannaHohensee
Copy link
Contributor

I opened #126643 to move the REST documentation into the General guide, so your final diff will be cleaner. But feel free to propose something different.

Labels
:Distributed Coordination/Network (Http and internode communication implementations), >docs (General docs changes), Team:Distributed Coordination (Meta label for Distributed Coordination team), Team:Docs (Meta label for docs team), v9.1.0
5 participants