Add REST & Transport layers section to GeneralArchitectureGuide.md #126377

Open
wants to merge 11 commits into
base: main
from
74 changes: 73 additions & 1 deletion docs/internal/GeneralArchitectureGuide.md
@@ -1,6 +1,78 @@
# General Architecture

## Transport Actions
## REST & Transport layers

[TransportAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/TransportAction.java
Contributor

Could you move the links to the end of each section, instead of the beginning? I read plain text 😅 The links don't have to be defined before use, if that was a concern.

[ActionPlugin]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java
[ActionModule]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionModule.java
Contributor

The ActionModule link is used in a later section, if you wanted to move it closer.


There are two main types of network communication used in Elasticsearch:
- External clients interact with the cluster via the public REST API over HTTP connections
- Cluster nodes communicate internally using a binary message format over TCP connections
Contributor

I'd suggest introducing the term "transport" here rather than calling it binary/TCP below.

Contributor

++ on mentioning this is what we refer to as the transport layer.


> [!NOTE]
Contributor

I'd push this down to the end of the transport layer. It's a little weird to start the conversation with a note about not covering something.

Actually, you could remove the note part entirely and just mention that CCR and replication use the transport layer for communication. I'm not sure what more there is to say, that you'd need to say it's out of scope? And the Distributed arch guide file has Replication and CCR sections, if you want to link to those -- granted the sections haven't been filled out yet, but someday a link like https://github.com/elastic/elasticsearch/blob/main/docs/internal/DistributedArchitectureGuide.md#replication will lead to goodness.

> Cross-cluster [replication](https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html) and [search](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cross-cluster-search.html) use binary/TCP messaging for inter-cluster communication, but this is out of scope for this section.

The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and when the requested action is completed, return the possibly aggregated response to the REST client via HTTP.
Contributor

Suggested change
The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and when the requested action is completed, return the possibly aggregated response to the REST client via HTTP.
The node that a REST request arrives at is called the "coordinating" node. Its job is to coordinate the execution of the request with the other nodes in the cluster and to return the (possibly aggregated) response to the REST client via HTTP when the requested action is completed.


By default, all nodes will act as coordinating nodes, but by specifying `node.roles` to be empty you can create a [coordinating-only node](https://www.elastic.co/guide/en/elasticsearch/reference/current/node-roles-overview.html#coordinating-only-node-role).
Contributor

Maybe a little confusing to talk about node roles given we're considering coordinating REST requests here. All nodes handle REST requests, this is not optional.


Contributor

@DiannaHohensee, Apr 10, 2025

Could you add some high level info to this section about rest and transport actions, how they are connected, before diving into the details in the sub-sections? Like

  • the file naming conventions
  • that all user accessible actions have a rest action that calls into a transport action (maybe even mention that not all transport actions have a corresponding rest action, to highlight our internode-only actions)
  • how transport actions are invoked from the rest action
  • how to find the relevant rest and transport actions starting from a server request like /_cluster/settings -- you mention routes in the REST layer section, but don't explain it.

### REST Layer
Contributor

In general in this section, you're diving into the implementation details, but not providing many high level words to tie everything together. You could start off with a high level explanation that ties multiple classes / concepts together, and then elaborate later in the section.

Some details like "the request handling logic, and some other runtime characteristics such as path/query parameters that are supported and the content type(s) it accepts" are briefly mentioned, but not explained, and so are difficult to digest.

For example, I might start the REST section like

"All user requests are sent to REST endpoints in Elasticsearch. Each endpoint is handled in a corresponding Rest*Action.java file. All of these endpoints are implementations of RestHandler, usually via BaseRestHandler. Each REST endpoint implementation is registered in the ActionModule and mapped to a REST endpoint. Plugins can also contribute mappings of REST endpoints to implementations, when they are activated, via the ActionPlugin#getRestHandlers interface: ActionModule#initRestHandlers iterates the plugins and registers their actions as well.

The RestHandler#routes() method implementation specifies what endpoints are handled by a particular REST endpoint implementation. <those other attributes you mentioned -- request handling, query params>. "

With appropriate code links and more elaboration. I didn't polish this, and it doesn't include everything, but it should give an idea of what I mean.


[BaseRestHandler]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/BaseRestHandler.java
[RestHandler]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/RestHandler.java
[Route]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/rest/RestHandler.java#L134
[getRestHandlers]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76
Contributor

Probably not worth going back and updating the links, but I typically tie the link to a minor version of ES for readability -- if you're inclined in future.

E.g. https://github.com/elastic/elasticsearch/blob/v9.0.0-rc1/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L76

[RestBulkAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/action/document/RestBulkAction.java
[RestController]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/rest/RestController.java
[HttpServerTransport.Dispatcher]:https://github.com/elastic/elasticsearch/blob/997a7b8fab6c0bcaacf963c28fe98024492960c5/server/src/main/java/org/elasticsearch/http/HttpServerTransport.java#L36
[HttpServerTransport]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/http/HttpServerTransport.java
[Netty4HttpServerTransport]:https://github.com/elastic/elasticsearch/blob/main/modules/transport-netty4/src/main/java/org/elasticsearch/http/netty4/Netty4HttpServerTransport.java

Each REST endpoint is defined by a [RestHandler] instance. [RestHandler] implementations define the list of [Route]s that they handle, the request handling logic, and some other runtime characteristics such as path/query parameters that are supported and the content type(s) it accepts. There are many built-in REST endpoints configured statically in [ActionModule]; additional endpoints can be contributed by [ActionPlugin]s via the [getRestHandlers] method.
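
To make the registration flow in this paragraph more concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch classes: `SketchRestHandler`, `SketchActionPlugin`, `SketchRestHandlerRegistry` and the `/_sketch/...` paths are all hypothetical names invented for illustration. It only models the idea that each handler declares its own routes, and that built-in handlers and plugin-contributed handlers end up in one combined list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only; these are NOT the Elasticsearch classes.
interface SketchRestHandler {
    // Each handler declares the routes it serves, written here as "METHOD /path".
    List<String> routes();
}

// Stand-in for a plugin that contributes extra REST handlers.
interface SketchActionPlugin {
    List<SketchRestHandler> getRestHandlers();
}

class SketchRestHandlerRegistry {
    // Built-in handlers are listed statically; plugin handlers are appended afterwards.
    static List<SketchRestHandler> collect(List<SketchRestHandler> builtIn, List<SketchActionPlugin> plugins) {
        List<SketchRestHandler> all = new ArrayList<>(builtIn);
        for (SketchActionPlugin plugin : plugins) {
            all.addAll(plugin.getRestHandlers());
        }
        return all;
    }

    // Flattens every handler's declared routes into one list, in registration order.
    static List<String> allRoutes(List<SketchRestHandler> handlers) {
        List<String> routes = new ArrayList<>();
        for (SketchRestHandler handler : handlers) {
            routes.addAll(handler.routes());
        }
        return routes;
    }

    // Small usage example: one built-in handler with two routes, one plugin handler with one route.
    static List<String> demo() {
        SketchRestHandler builtIn = () -> List.of("GET /_sketch/settings", "PUT /_sketch/settings");
        SketchRestHandler fromPlugin = () -> List.of("GET /_sketch/hello");
        SketchActionPlugin plugin = () -> List.of(fromPlugin);
        return allRoutes(collect(List.of(builtIn), List.of(plugin)));
    }
}
```

The real handlers also carry request handling logic and parameter validation, which this sketch deliberately leaves out.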
Contributor

For "[RestHandler] implementations", I'd link an implementation like https://github.com/elastic/elasticsearch/blob/v9.0.0-rc1/server/src/main/java/org/elasticsearch/rest/action/cat/RestSnapshotAction.java#L41-L44. Or leave it without a link since you already referenced the class.

Contributor

"There are many built-in REST endpoints configured statically in [ActionModule]..."

Linking directly to the code in ActionModule that you mean to reference would be helpful. Otherwise it's ambiguous for someone who doesn't already know.

Contributor

"request handling logic"

link to implementation example?


[BaseRestHandler] is the base class for almost all REST endpoints in Elasticsearch. It validates the request parameters against those which are supported, delegates to its sub-classes to set up the execution of the requested action, then delivers the request content to the action either as a single parsed payload or a stream of binary chunks. Actions such as the [RestBulkAction] use the streaming capability to process large payloads incrementally and apply back-pressure when overloaded.
Contributor

Limiting lines to roughly 140 characters would help when reading this plainly in an IDE or text file (and when editing in future).


The sub-classes of [BaseRestHandler], usually named `Rest{action-name}Action`, are the entry-points to the cluster, where HTTP requests from outside the cluster are translated into internal [TransportAction] invocations.
Contributor

This sounds like a good starting paragraph (minus BaseRestHandler): maybe move it earlier someplace? Tying rest and transport together, and file naming conventions, could be done before the REST and Transport sections, too. Then you dive into the details in each section, after having explained their relationship.


[RestHandler]s are registered with the [RestController], which implements request routing and some cross-cutting concerns. [RestController] is an [HttpServerTransport.Dispatcher], which dispatches requests for an [HttpServerTransport]. [HttpServerTransport] is our HTTP abstraction, of which there is a single [Netty-based implementation][Netty4HttpServerTransport].
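
As a rough orientation aid, the routing role described here can be sketched as a lookup from method and path to handler. This is a toy model under stated assumptions: `DemoHandler`, `DemoController`, `DemoControllerExample`, the `/_demo/ping` route and the string-based "status" results are hypothetical and do not reflect the real [RestController] signatures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified model; names are illustrative, not the Elasticsearch API.
interface DemoHandler {
    List<String> routes();          // each route written as "METHOD /path"
    String handle(String body);     // the request handling logic
}

// Plays the role of the dispatcher that the HTTP server hands incoming requests to.
class DemoController {
    private final Map<String, DemoHandler> handlersByRoute = new HashMap<>();

    // Registers the handler under every route it declares; duplicate routes are rejected.
    void registerHandler(DemoHandler handler) {
        for (String route : handler.routes()) {
            if (handlersByRoute.putIfAbsent(route, handler) != null) {
                throw new IllegalArgumentException("duplicate route: " + route);
            }
        }
    }

    // Routes a request to the matching handler, or reports that none exists.
    String dispatch(String method, String path, String body) {
        DemoHandler handler = handlersByRoute.get(method + " " + path);
        if (handler == null) {
            return "404 no handler for " + method + " " + path;
        }
        return "200 " + handler.handle(body);
    }
}

class DemoControllerExample {
    // Builds a controller with a single hypothetical endpoint.
    static DemoController build() {
        DemoController controller = new DemoController();
        controller.registerHandler(new DemoHandler() {
            @Override
            public List<String> routes() {
                return List.of("GET /_demo/ping");
            }

            @Override
            public String handle(String body) {
                return "pong";
            }
        });
        return controller;
    }
}
```

In this sketch the HTTP server would only ever call `dispatch`, which loosely mirrors the separation between the HTTP abstraction and the controller that owns the routing table.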
Contributor

I'm not familiar with this code. It's intriguing information, and points out the code components where to learn more, but this doesn't currently tell me anything high level with which to orient myself in code spelunking. Could you add some more high level explanation here?

Contributor

You mention "cross-cutting concerns", but no explanation of what they are -- please elaborate or delete.


> [!NOTE]
Contributor

I don't think you need to differentiate this with a note, rather it's useful information like the rest of the section.

> `Rest{action-name}Action` classes often have a corresponding `Transport{action-name}Action`; this naming convention makes it easy to locate the corresponding [RestHandler] for a [TransportAction] (e.g. `RestGetAction` calls `TransportGetAction`).
Contributor

Maybe worth also spelling out how to determine the TransportAction from a RestAction for those cases where the naming convention doesn't hold.

Contributor

++


### Transport Layer

[ActionRequest]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionRequest.java
[NodeClient]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/client/internal/node/NodeClient.java
[TransportMasterNodeAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeAction.java
[TransportLocalClusterStateAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/local/TransportLocalClusterStateAction.java
[TransportReplicationAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java
[TransportNodesAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/nodes/TransportNodesAction.java
[TransportSingleShardAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/single/shard/TransportSingleShardAction.java
[getActions]:https://github.com/elastic/elasticsearch/blob/0b09506b543231862570c7c1ee623c1af139bd5a/server/src/main/java/org/elasticsearch/plugins/ActionPlugin.java#L55
[ActionType]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/ActionType.java
[ActionModule#setupActions]:https://github.com/elastic/elasticsearch/blob/997a7b8fab6c0bcaacf963c28fe98024492960c5/server/src/main/java/org/elasticsearch/action/ActionModule.java#L612
[NodeClient#executeLocally]:https://github.com/elastic/elasticsearch/blob/997a7b8fab6c0bcaacf963c28fe98024492960c5/server/src/main/java/org/elasticsearch/client/internal/node/NodeClient.java#L101
[TransportService#sendRequest]:https://github.com/elastic/elasticsearch/blob/997a7b8fab6c0bcaacf963c28fe98024492960c5/server/src/main/java/org/elasticsearch/transport/TransportService.java#L767
[TransportService#registerRequestHandler]:https://github.com/elastic/elasticsearch/blob/997a7b8fab6c0bcaacf963c28fe98024492960c5/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1197
[HandledTransportAction]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/HandledTransportAction.java
[TransportService]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/transport/TransportService.java

When `Rest{action-name}Action` handlers receive a request, they typically translate the request into an [ActionRequest] and dispatch it via the provided [NodeClient]. The [NodeClient] is the entry point into the "transport layer" over which internal cluster actions are coordinated.
Contributor

Maybe worth highlighting that there's some confusion about the term "transport" in this sense. Most so-called transport actions are invoked only on the local node and do not involve the transport protocol or TCP connections. It's mentioned below, but I think saying explicitly that the terminology is confusing would help new starters.

The two key features of a TransportAction are:

  1. their constructor parameters are provided via dependency injection at runtime rather than by explicit instantiation, and
  2. they represent a security boundary, in the sense that we check the calling user is authorized to call the action they're calling


Elasticsearch contains many built-in [TransportAction]s, configured statically in [ActionModule]; additional actions can be contributed by [ActionPlugin]s via the [getActions] method. [TransportAction]s define the request and response types used to invoke the action and the logic for performing the action.

[TransportAction]s that are registered in [ActionModule#setupActions] (including those supplied by plugins) are locally bound to their [ActionType]. This map of `type -> action` bindings is what [NodeClient] instances use to locate actions in [NodeClient#executeLocally].
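
The `type -> action` lookup can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: `MiniActionType`, `MiniTransportAction`, `MiniNodeClient` and the `demo:echo` action name are hypothetical, and a plain `Consumer` stands in for the listener type that the real code uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical simplified model; not the Elasticsearch classes.
final class MiniActionType<Response> {
    final String name;

    MiniActionType(String name) {
        this.name = name;
    }
}

interface MiniTransportAction<Request, Response> {
    // Performs the action and hands the response to the listener.
    void execute(Request request, Consumer<Response> listener);
}

class MiniNodeClient {
    // The map of type -> action bindings consulted on every local execution.
    private final Map<MiniActionType<?>, MiniTransportAction<?, ?>> actions = new HashMap<>();

    <Request, Response> void register(MiniActionType<Response> type, MiniTransportAction<Request, Response> action) {
        actions.put(type, action);
    }

    // Looks up the action bound to the given type and runs it on this node.
    @SuppressWarnings("unchecked")
    <Request, Response> void executeLocally(MiniActionType<Response> type, Request request, Consumer<Response> listener) {
        MiniTransportAction<Request, Response> action = (MiniTransportAction<Request, Response>) actions.get(type);
        if (action == null) {
            throw new IllegalStateException("no action registered for [" + type.name + "]");
        }
        action.execute(request, listener);
    }
}

class MiniNodeClientExample {
    static final MiniActionType<String> ECHO = new MiniActionType<>("demo:echo");

    // Registers one action, executes it through the client, and returns what the listener received.
    static String run(String input) {
        MiniNodeClient client = new MiniNodeClient();
        client.<String, String>register(ECHO, (request, listener) -> listener.accept("echo:" + request));
        StringBuilder result = new StringBuilder();
        client.executeLocally(ECHO, input, (String response) -> result.append(response));
        return result.toString();
    }
}
```

The sketch keys the map on the type instance itself, which is one simple way to model a binding; it makes no claim about how the real bindings are stored.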

The [NodeClient] executes all actions locally on the invoking node; the actions themselves sometimes dispatch downstream actions to other nodes in the cluster via the transport layer (see [TransportService#sendRequest]). To be callable in this way, actions must register themselves with the [TransportService] by calling [TransportService#registerRequestHandler]. [HandledTransportAction] is a common parent class which registers an action with the [TransportService].
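
The register-then-send relationship can be illustrated with a toy model in which each node owns a table of named handlers. The class `ToyTransportService` and the action name `toy:shard/count` are hypothetical, the method names are borrowed only for readability, and their signatures are heavily simplified assumptions: there is no networking, serialization or asynchronous response handling here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: one instance per "node", with requests and responses reduced to strings.
class ToyTransportService {
    private final String nodeName;
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    ToyTransportService(String nodeName) {
        this.nodeName = nodeName;
    }

    // A node can only serve an action remotely once a handler is registered under its name.
    void registerRequestHandler(String actionName, Function<String, String> handler) {
        handlers.put(actionName, handler);
    }

    // In this toy, "sending" is a direct call into the target node's handler table.
    String sendRequest(ToyTransportService target, String actionName, String request) {
        Function<String, String> handler = target.handlers.get(actionName);
        if (handler == null) {
            throw new IllegalStateException("[" + target.nodeName + "] has no handler for [" + actionName + "]");
        }
        return handler.apply(request);
    }
}

class ToyTransportExample {
    // node-a sends a request to node-b, which has registered a handler for the action.
    static String run() {
        ToyTransportService local = new ToyTransportService("node-a");
        ToyTransportService remote = new ToyTransportService("node-b");
        remote.registerRequestHandler("toy:shard/count", request -> "node-b counted " + request);
        return local.sendRequest(remote, "toy:shard/count", "index-1");
    }
}
```

Sending to a node that never registered the action fails in this sketch, which is the point of the registration step described above.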

There are a few common patterns for [TransportAction] execution which are present in the codebase. Some prominent examples include:

- [TransportMasterNodeAction]: Executes an action on the master node. Typically used to perform cluster state updates, as these can only be performed on the master. The base class contains logic for locating the master node and delegating to it to execute the specified logic.
Member

I wonder whether we should list HandledTransportAction, which handles the action on the local node. It will usually fire another action, such as reading from an index, which may reach out to another node. TransportMasterNodeAction itself is a subclass of HandledTransportAction.

Contributor Author

I mentioned it in 45b5c7a when discussing the different places actions can be registered

- [TransportNodesAction]: Executes an action on many nodes then collates the responses.
- [TransportLocalClusterStateAction]: Waits for a cluster state that optionally meets some criteria and performs a read action on it on the coordinating node.
- [TransportReplicationAction]: Executes an action on a primary shard followed by all replicas that exist for that shard. The base class implements logic for locating the primary and replica shards in the cluster and delegating to the relevant nodes. Often used for index updates in stateful Elasticsearch.
- [TransportSingleShardAction]: Executes a read operation on a specific shard. The base class contains logic for locating an available copy of the nominated shard and delegating to the relevant node to execute the action. On a failure, the action is retried on a different copy.
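
As one illustration of these patterns, the "retry on a different copy" behaviour of the single-shard case can be sketched as a loop over candidate nodes. This is a simplified, synchronous sketch under stated assumptions: `ShardCopyRetrySketch`, `readFromAnyCopy` and the node names are hypothetical, and shard routing and asynchronous execution are left out entirely.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "try one copy, fall back to the next on failure".
class ShardCopyRetrySketch {
    // Tries each node holding a copy in order and returns the first successful result.
    static <T> T readFromAnyCopy(List<String> nodesWithCopy, Function<String, T> readOnNode) {
        RuntimeException lastFailure = null;
        for (String node : nodesWithCopy) {
            try {
                return readOnNode.apply(node);
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next copy.
                lastFailure = e;
            }
        }
        if (lastFailure == null) {
            throw new IllegalStateException("no shard copies available");
        }
        // Every copy failed, so surface the most recent failure.
        throw lastFailure;
    }

    // Usage example: the copy on node-a fails, so the read is served by node-b.
    static String demo() {
        return readFromAnyCopy(List.of("node-a", "node-b"), node -> {
            if (node.equals("node-a")) {
                throw new IllegalStateException("copy on node-a is unavailable");
            }
            return "doc from " + node;
        });
    }
}
```

The other patterns in the list differ mainly in how the target nodes are chosen: a single master, many nodes, or a primary followed by its replicas.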

## Serializations
