-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES-10063 Add multi-project support for more stats APIs #127650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES-10063 Add multi-project support for more stats APIs #127650
Conversation
This affects the following APIs: - `GET _nodes/stats`: - For `indices`, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name. - For `ingest`, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project. - `GET /_cluster/stats`: - For `ingest`, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project. - `GET /_info`: - For `ingest`, same as for `GET /_nodes/stats`. This is done by making `IndicesService.stats()` and `IngestService.stats()` include project IDs in the `NodeIndicesStats` and `IngestStats` objects they return, and making those stats objects incorporate the project IDs when converting to XContent. The transitive callers of these two methods are rather extensive (including all callers to `NodeService.stats()`, all callers of `TransportNodesStatsAction`, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases: - The behaviour change is one of the desired enhancements described above. - There is no behaviour change because it was getting node stats but neither `indices` nor `ingest` stats were requested. - There is no behaviour change because it was getting `indices` and/or `ingest` stats but only using aggregate values. - In `MachineLearningUsageTransportAction` and `TransportGetTrainedModelsStatsAction`, the `IngestStats` returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.) - `MonitoringService` will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this, Pete! I did a first round of review by just looking at the production files. I'll have a look at the tests later on. This feels like the right direction; my comments are mostly small and/or thinking out loud.
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/NodeIndicesStats.java
Outdated
Show resolved
Hide resolved
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) { | ||
return index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit on the fence about this approach. Thinking out loud: on the one hand, this maintains BwC with little effort. On the other hand, if we ever have indices in the default project this'll be a bit off. It's been mentioned a handful of times that we might want to consider (so, super hypothetically) having a shared cluster-level GeoIP index to avoid downloading the same DB over again. We shouldn't make this decision purely based on the GeoIP index, but the idea of the faint possibility of cluster-level indices is something to think about. At the same time, this approach doesn't "break" anything in that case either, it just makes the output inconsistent. So, we either try to account for that situation proactively now, or we decide to go with this approach and we solve that issue if/when it comes. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the options are:
- Include the
default/
prefix on every index in single-project mode. - Live with the fact that people will need to know that the absence of a prefix indicates the default project in multi-project mode.
- Somehow pass in information about whether we're in single- or multi-project mode. I guess that we could go back to making the map nullable and use the null value to indicate that we're in single-project mode and don't need the prefixes.
What I've currently implement is option 2. I think this is certainly better than option 1. But if we're seriously considering having indices in the default project when in MP mode (which I hadn't realized was a possibility) and if we think that people might not realize that no prefix means default (which I'm not sure about) then we could switch to option 3 easily enough. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I think I'm leaning towards option 3.
This would also mean that, when deserializing an instance from an older node, we could make the map be null and so not show project prefixes: I think this would be valid, because I assume that we will never have to support BWC with nodes older than the current main
in MP clusters (there's no way that could possibly work!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, ideally, I'd want to go with option 3 too (not showing the project ID in single-project mode and showing the project ID for all indices in multi-project mode), for consistency. But let's park this discussion for a bit until we resolve #127650 (comment) (or choose not to resolve it and live with it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a commit switching to option 3. I know you said to park it, but it's an easy change, and I think it might help the other discussion (esp. for folks who haven't seen the PR before we raise it) if we're looking at the closest we have to a proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ugh, it now occurs to me: IngestStats
still behaves the old way (prefix project ID if non-default). They should be consistent. Switching IngestStats
to behave the new way looks like a little more work because there's no obvious way to indicate to it whether it's in MP mode or not, we'd have to explicitly pass that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether XContent params might be an option (for both). I've not used those so I don't know what limitations there might be.
.../plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some YAML tests from the ingest-common
module are muted in this file because they depend on stats APIs:
'^test/ingest/15_info_ingest/*', // uses cluster info API
'^test/ingest/70_bulk/Test bulk request with default pipeline', // uses stats API
'^test/ingest/70_bulk/Test bulk request without default pipeline', // uses stats API
However, because we decided to prefix the project ID, we won't be able to unmute those tests... I don't immediately have a good suggestion, but I thought I'd mention it so we're aware. If we decide that that is OK, we should move these lines to the bottom so they're next to the comment explaining why some stats API tests are muted in MP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thanks. I've made myself a note to come back to this.
server/src/main/java/org/elasticsearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: by using Metadata#projectFor
, we throw an exception if the index is not found. I'm not familiar enough with the IndicesService
to know whether there is a possibility of the service being "behind" the state in ClusterService
. For instance, when an index is deleted from the cluster state, is the indices service updated atomically? Or if an index is added, is it first added to the cluster state or the IndicesService
? If you don't already have an answer to those questions (which is completely fine), I think it's worth having a look to be sure what kind of behavior we could see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. That's a good question.
Looking at the creation flow, I believe that creating the IndexService
and adding it to the indices
map in IndicesService
happens as part of an applyClusterState
method in a ClusterStateApplier
, and looking at ClusterApplierService
it seems that this will be invoked before the state
is updated. So it seems plausible that we could call IndicesService.stats
on another thread and observe it in a state where this fails for an index which is in the process of being created on this node.
At any rate, I think that good defensive coding principles dictate that we should make things work as well as possible in these circumstances.
I think the best option might be to switch to using lookupProject(Index)
. That means that we'll just omit the index from the map in that case. Then we have to figure out how to handle that. Right now, it would use null
for the project ID, which is probably confusing. We could make it use something like <unknown>
, which is sort of horrible but at least it's explicit. (We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
I don't like that, but I'm struggling to think of any better alternatives right now.
It would be simple if the IndexService
knew its project ID. I think you said it was a deliberate design decision not to do that... Or did was the deliberate decision just that Index
and/or IndexMetadata
wouldn't know their project ID? I've had a glance, and it would be a moderately messy change: there are dozens of code paths we'd have to wire the project ID through to get back to somewhere it's available (not counting tests).
The refactoring to ensure that things happen in a different order which guarantees it would be okay is waaaaay to large and risky to justify.
So I'm going with the <unknown>
option for now. But I'd be very interested to hear your thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
N.B. If we go with the thing I called option 3 in the other thread, this comment will not be true. The only way that I think this the index could be missing from the map is if we hit this rare timing issue. So I think the question becomes: Do we want to do the plumbing to make an IndexService
know its project ID in order to avoid this? Or is there a better solution that I'm not seeing?
(There is, incidentally, a risk that the stats call could fail if there are two indices with the same name in different projects both of which hit this rare timing issue at the same time, causing a key collision. I'm inclined not to worry about that, as it would be transient, although it could be avoided if we just add a counter to the prefix.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall! I haven't gone over the tests yet, will run another pass for the tests tomorrow.
public NodeIndicesStats( | ||
CommonStats oldStats, | ||
Map<Index, CommonStats> statsByIndex, | ||
Map<Index, List<IndexShardStats>> statsByShard, | ||
Map<Index, ProjectId> projectsByIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this approach of passing in the index to project map
Change suggested by Niels. Co-authored-by: Niels Bauman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just pushed a commit which responds to most of the comments here. There's some discussion below.
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/ClusterStatsNodes.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/IndicesService.java
Outdated
Show resolved
Hide resolved
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. That's a good question.
Looking at the creation flow, I believe that creating the IndexService
and adding it to the indices
map in IndicesService
happens as part of an applyClusterState
method in a ClusterStateApplier
, and looking at ClusterApplierService
it seems that this will be invoked before the state
is updated. So it seems plausible that we could call IndicesService.stats
on another thread and observe it in a state where this fails for an index which is in the process of being created on this node.
At any rate, I think that good defensive coding principles dictate that we should make things work as well as possible in these circumstances.
I think the best option might be to switch to using lookupProject(Index)
. That means that we'll just omit the index from the map in that case. Then we have to figure out how to handle that. Right now, it would use null
for the project ID, which is probably confusing. We could make it use something like <unknown>
, which is sort of horrible but at least it's explicit. (We already have the possibility that the map is empty because it came from an older node, although at least that situation is only temporary.)
I don't like that, but I'm struggling to think of any better alternatives right now.
It would be simple if the IndexService
knew its project ID. I think you said it was a deliberate design decision not to do that... Or did was the deliberate decision just that Index
and/or IndexMetadata
wouldn't know their project ID? I've had a glance, and it would be a moderately messy change: there are dozens of code paths we'd have to wire the project ID through to get back to somewhere it's available (not counting tests).
The refactoring to ensure that things happen in a different order which guarantees it would be okay is waaaaay to large and risky to justify.
So I'm going with the <unknown>
option for now. But I'd be very interested to hear your thoughts!
server/src/main/java/org/elasticsearch/indices/NodeIndicesStats.java
Outdated
Show resolved
Hide resolved
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) { | ||
return index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the options are:
- Include the
default/
prefix on every index in single-project mode. - Live with the fact that people will need to know that the absence of a prefix indicates the default project in multi-project mode.
- Somehow pass in information about whether we're in single- or multi-project mode. I guess that we could go back to making the map nullable and use the null value to indicate that we're in single-project mode and don't need the prefixes.
What I've currently implement is option 2. I think this is certainly better than option 1. But if we're seriously considering having indices in the default project when in MP mode (which I hadn't realized was a possibility) and if we think that people might not realize that no prefix means default (which I'm not sure about) then we could switch to option 3 easily enough. What do you think?
.../plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Outdated
Show resolved
Hide resolved
@@ -564,6 +571,15 @@ IndexShardStats indexShardStats(final IndicesService indicesService, final Index | |||
); | |||
} | |||
|
|||
private Map<Index, ProjectId> projectsByIndex() { | |||
Map<Index, ProjectId> map = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
Map<Index, ProjectId> map = new HashMap<>(); | |
Map<Index, ProjectId> map = new HashMap<>(indices.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason github won't let me commit your suggestion — it says "Sorry, the changes couldn't be applied. Refresh and try again." repeatedly — so I'm making this change myself.
@@ -66,6 +70,7 @@ public class NodeIndicesStats implements Writeable, ChunkedToXContent { | |||
private final CommonStats stats; | |||
private final Map<Index, List<IndexShardStats>> statsByShard; | |||
private final Map<Index, CommonStats> statsByIndex; | |||
private final Map<Index, ProjectId> projectsByIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame we have to pass in this map ourselves (and (de)serialize it), as it's pretty much just the Metadata.ProjectLookup
... It looks like we only need it for the XContent serialization, but I can't think of a way to pass the ProjectLookup
to that serialization method. I do think it's worth thinking about that some more (and maybe asking others for suggestions). This is going to result in a lot more serialization over the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a lot more. It's adding one ProjectId
where there's already a CommonStats
and a List<IndexShardStats>
. But I agree it's unfortunate to be sharing copies of this around everywhere. I'll start a thread on the channel, as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, the serialization works a bit different in this class, it doesn't just serialize a full map. Hm ok that makes it a lot less worse, indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just realized I'm mixing things up. I was thinking of the serialization of IngestStats
. In NodeIndicesStats
, this map will result in quite a bit more serialization: a whole map where we serialize every index (not sure if it's every index on the node or really all indices, but still) and every project ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it's every index on the node. I still think that ProjectId
is small compared with other stuff we're already storing for each index on the node...
.../plugin/core/src/test/java/org/elasticsearch/xpack/core/datatiers/DataTierUsageFixtures.java
Outdated
Show resolved
Hide resolved
…datatiers/DataTierUsageFixtures.java Co-authored-by: Niels Bauman <[email protected]>
…s include project prefix in XContent in multip-project, even if default; also incorporate one other review comment
I just pushed a commit to switch it to using |
…ialization case. This works out fine, for the reasons given in the comment. As it happens, I'd already forgotten to do the null check in the one place it's actively used.
Those last few commits make some MP REST YAML tests work in the cases where the response contains a project ID prefix. The way this works isn't the most elegant thing in the world, but I think it's okay. There are two parts to it: We populate an entry in the 'stash' for the Here's how the subject of YAML assertions were interpreted before:
This final case is surprising: it means that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, I have posted some minor comments, it should be good to merge very soon :)
.nodesStats(nodesStatsRequest, new RestRefCountedChunkedToXContentListener<>(channel, xContentParamsForRequest(request))); | ||
} | ||
|
||
private ToXContent.DelegatingMapParams xContentParamsForRequest(RestRequest request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, didn't know about ToXContent.Params
. Curious why use ToXContent.DelegatingMapParams
here but ToXContent.MapParams
in RestClusterInfoAction
? Do we expect the request
object to contain the MULTI_PROJECT_ENABLED_XCONTENT_PARAM_KEY
key here?
* Otherwise, it should return {@code true} to indicate the cluster can accommodate multiple projects regardless | ||
* how many project it current has. | ||
*/ | ||
default boolean supportsMultipleProjects() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason for this move so ActionModule
can access this method? I think ActionModule
is constructed with a ProjectResolver
instance in NodeConstruciton
, so changing the ActionModule.ProjectIdResolver
property to ActionModule.ProjectResolver
should make the method accessible. Just feel a little bit more fitting forsupportsMultipleProjects
to stay in ProjectResolver
.
- match: { "ingest.pipelines.${_project_id_prefix_}ingest_info_pipeline.current": 0 } | ||
- match: { "ingest.pipelines.${_project_id_prefix_}ingest_info_pipeline.failed": 0 } | ||
- gt: { "ingest.pipelines.${_project_id_prefix_}ingest_info_pipeline.ingested_as_first_pipeline_in_bytes": 0 } | ||
- gt: { "ingest.pipelines.${_project_id_prefix_}ingest_info_pipeline.produced_as_first_pipeline_in_bytes": 0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very familiar, could you help me understand how does the ${...}
part get parsed and resolved to the value in the stash?
'^indices.get_alias/10_basic/Get alias against closed indices', // Does NOT work with security enabled, see also core-rest-tests-with-security | ||
'^indices.recovery/*/*', | ||
'^indices.resolve_cluster/*/*', | ||
'^indices.resolve_cluster/*/*/*', | ||
'^indices.shard_stores/*/*', | ||
'^test/ingest/15_info_ingest/*', // uses cluster info API | ||
'^test/ingest/70_bulk/Test bulk request with default pipeline', // uses stats API | ||
'^test/ingest/70_bulk/Test bulk request without default pipeline', // uses stats API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for making these tests work in MP!
), | ||
// Adding a non inference processor that should be ignored | ||
new IngestStats.ProcessorStat("grok", "grok", new IngestStats.Stats(100, 100, 100, 100)) | ||
ProjectId.DEFAULT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use randomProjectIdOrDefault()
to make unit tests a little more generic to MP
This makes sense, but it does feel like a little different behaviour from the other 3 cases. I will probably confuse myself on this later, is this behaviour documented somewhere in the code base? |
This affects the following APIs:
GET _nodes/stats
:indices
, it now prefixes the index name with the project ID (in multi-project clusters). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.ingest
, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID (in multi-project clusters). Previously, it only got them for the default project.GET /_cluster/stats
:ingest
, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.GET /_info
:ingest
, same as forGET /_nodes/stats
.It does not add multi-project support for the
repositories
section on_nodes/stats
, since repositories aren't MP-ready yet AFAIK.This is done by making
IndicesService.stats()
andIngestService.stats()
include project IDs in theNodeIndicesStats
andIngestStats
objects they return, and making those stats objects incorporate the project IDs when converting to XContent.The transitive callers of these two methods are rather extensive (including all callers to
NodeService.stats()
, all callers ofTransportNodesStatsAction
, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:indices
noringest
stats were requested.indices
and/oringest
stats but only using aggregate values.MachineLearningUsageTransportAction
andTransportGetTrainedModelsStatsAction
, theIngestStats
returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.)MonitoringService
will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)