[FLINK-36932][metrics] Added resource-level metrics for different states/statuses #926
What is the purpose of the change
This PR adds new metrics that track the current value of different states/statuses at the resource level. Metrics already exist for some of these statuses/states, but they represent namespace-wide or system-wide counts, as opposed to per-resource gauges that indicate whether a deployment/session job is in a particular state.
Other statuses/states that weren't yet tracked through a dedicated metric (e.g., job status) now have a resource-level gauge and a namespace-level counter.
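To make the distinction concrete, here is a minimal sketch of a per-resource state gauge next to a namespace-level count. It is not the code in this PR: the class `ResourceStateGauges`, the enum `DeploymentState` and its values, and the method names are all hypothetical, and plain `Supplier<Integer>` stands in for a real metric gauge type.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; these names are not the operator's actual classes.
final class ResourceStateGauges {

    // Stand-in for a status enum such as JobManagerDeploymentStatus (values are illustrative).
    enum DeploymentState { READY, DEPLOYING, MISSING, ERROR }

    // Last observed state per resource name, e.g. refreshed on each reconciliation.
    private final Map<String, DeploymentState> current = new LinkedHashMap<>();

    void update(String resourceName, DeploymentState state) {
        current.put(resourceName, state);
    }

    // Resource-level gauge: reports 1 while the resource is in the given state, otherwise 0.
    Supplier<Integer> stateGauge(String resourceName, DeploymentState state) {
        return () -> current.get(resourceName) == state ? 1 : 0;
    }

    // Namespace-level view: how many tracked resources are currently in each state.
    Map<DeploymentState, Integer> countsByState() {
        Map<DeploymentState, Integer> counts = new EnumMap<>(DeploymentState.class);
        for (DeploymentState s : DeploymentState.values()) {
            counts.put(s, 0);
        }
        for (DeploymentState s : current.values()) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }
}
```

In this sketch, one gauge would be registered per (resource, state) pair, so a dashboard can alert on a single deployment being in an error state without having to infer it from an aggregate count.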
Brief change log
Summary of the changes for each state/status:
JobManagerDeploymentStatus: state gauge added at resource-level (FlinkDeployment only)
JobStatus: status gauge added at resource-level (FlinkDeployment only), status counter at namespace-level
ResourceLifecycleState: state gauge added at resource-level
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: no
Documentation
N.B. While these changes might not represent a full-on "feature", I'm planning to update the documentation that generates this page. However, I've held off doing this as part of this initial commit in order to settle the naming and implementation. Once this is done, I can update the documentation accordingly.