Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components #11051
e569be2
to
1180ee2
Compare
I left some comments after an initial pass. You should also update the names / description / CHANGELOG to make it clear what this really does. Kafka components are brokers/controllers, Connect and MM2. You add this only for brokers/controllers. It should be clear from the CHANGELOG and PR name / desc.
packaging/examples/metrics/grafana-dashboards/strimzi-kafka.json (outdated)
packaging/examples/metrics/grafana-dashboards/strimzi-kraft.json (outdated)
api/src/main/java/io/strimzi/api/kafka/model/kafka/KafkaClusterSpec.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/ConfigMapUtils.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/CruiseControl.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java (outdated)
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java (outdated)
1180ee2
to
f943007
Compare
@scholzj I am currently working on the changes I did not comment on yet.
api/src/main/java/io/strimzi/api/kafka/model/common/metrics/StrimziMetricsReporterValues.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/common/metrics/JmxPrometheusExporterMetrics.java
84fb016
to
8f5b28f
Compare
I left some more comments. There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/CruiseControl.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/CruiseControl.java (outdated)
...tor/src/main/java/io/strimzi/operator/cluster/model/metrics/StrimziMetricsReporterModel.java (outdated)
...ator/src/main/java/io/strimzi/operator/cluster/model/metrics/JmxPrometheusExporterModel.java
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/metrics/MetricsModel.java
@scholzj I'm supporting Owen with this feature. He is working on and will eventually address all comments.
@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.
No worries Owen, I was just unsure what the expected state is. I did not intend to put any pressure on you. Sorry.
ece3b1f
to
75a829f
Compare
The build failed with the following tests:
Hi @OwenCorrigan76, we still need to do some changes and address the remaining comments, but it works fine.
For the KafkaCrdIT.testKafkaWithNullMaintenance failure, I think we simply need to update the expected error messages removing the "invalid:" prefix. Now, you get "invalid: [", rather than "invalid:". With CEL validation rules, we get the additional error message "validation rules were not checked".
api/src/main/java/io/strimzi/api/kafka/model/kafka/KafkaClusterSpec.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/common/metrics/JmxPrometheusExporterMetrics.java (outdated)
api/src/main/java/io/strimzi/api/kafka/model/common/metrics/StrimziMetricsReporterValues.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/CruiseControl.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/CruiseControl.java (outdated)
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaConnectCluster.java
packaging/examples/metrics/strimzi-metrics-reporter/kafka-metrics.yaml (outdated)
@OwenCorrigan76 I was wrong on the "Unsupported metrics type", please correct.
api/src/main/java/io/strimzi/api/kafka/model/common/metrics/JmxPrometheusExporterMetrics.java
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java (outdated)
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaConnectCluster.java (outdated)
...perator/src/main/java/io/strimzi/operator/cluster/model/KafkaBrokerConfigurationBuilder.java (outdated)
4304607
to
f88ec2c
Compare
Changes look good to me for now, but we still have some comments to address.
3c94f43
to
3969049
Compare
Signed-off-by: ocorriga <[email protected]> Signed-off-by: OwenCorrigan76 <[email protected]>
Signed-off-by: OwenCorrigan76 <[email protected]>
Signed-off-by: Federico Valeri <[email protected]> Signed-off-by: OwenCorrigan76 <[email protected]>
b6ea886
to
7e16090
Compare
… listener.enabed=true Signed-off-by: OwenCorrigan76 <[email protected]>
e7e8dc5
to
b666550
Compare
/azp run regression
Azure Pipelines successfully started running 1 pipeline(s).
One tiny nit, but otherwise this looks good to me
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java (outdated)
@scholzj I see you have a requested change on this PR, has it been addressed?
…irectory Signed-off-by: OwenCorrigan76 <[email protected]>
private List<String> allowList;
private Map<String, Object> additionalProperties;

@Description("A comma separated list of regex patterns to specify the metrics to collect.")
It is a Java / YAML list ... so the description is probably wrong? (I.e. it is not a String with a comma separated list.)
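To illustrate the distinction raised here, the following is a minimal sketch, assuming the operator joins the list-typed field into the comma-separated string that a properties-style reporter configuration would carry. The class name, method name, and sample patterns are hypothetical and are not taken from the PR.

```java
import java.util.List;

// Hypothetical helper, for illustration only; not part of the PR.
final class AllowListExample {
    // The API field is a list of regex patterns (one YAML list item each).
    // Only the rendered property value is a comma-separated string.
    static String toPropertyValue(List<String> allowList) {
        return String.join(",", allowList);
    }

    public static void main(String[] args) {
        List<String> allowList = List.of("kafka_server.*", "kafka_network.*");
        System.out.println(toPropertyValue(allowList));
    }
}
```

Under that assumption, the field description would describe a list of patterns, and the comma-separated form would only be an internal rendering detail.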
if (model instanceof SupportsMetrics supportsMetrics && supportsMetrics.metrics() instanceof JmxPrometheusExporterModel) {
    String parseResult = ((JmxPrometheusExporterModel) supportsMetrics.metrics()).metricsJson(reconciliation, metricsAndLogging.metricsCm());
Can't you do something like this:
if (model instanceof SupportsMetrics supportsMetrics && supportsMetrics.metrics() instanceof JmxPrometheusExporterModel jmxMetrics) {
    String parseResult = jmxMetrics.metricsJson(reconciliation, metricsAndLogging.metricsCm());
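The suggestion relies on pattern matching for instanceof (Java 16 and later), which binds a typed variable and removes the second cast. A self-contained sketch of the before and after follows; the interface and classes here are simplified, hypothetical stand-ins and do not reproduce the Strimzi types or signatures.

```java
import java.util.Optional;

// Hypothetical stand-ins for the model types; names are illustrative only.
interface MetricsModelStub { }

final class JmxExporterModelStub implements MetricsModelStub {
    String metricsJson() {
        return "{\"rules\":[]}";
    }
}

final class ReporterModelStub implements MetricsModelStub { }

final class PatternBindingExample {
    // Before: check the type, then cast again to call the method.
    static Optional<String> withCast(MetricsModelStub metrics) {
        if (metrics instanceof JmxExporterModelStub) {
            return Optional.of(((JmxExporterModelStub) metrics).metricsJson());
        }
        return Optional.empty();
    }

    // After: instanceof binds jmxMetrics, so no explicit cast is needed.
    static Optional<String> withBinding(MetricsModelStub metrics) {
        if (metrics instanceof JmxExporterModelStub jmxMetrics) {
            return Optional.of(jmxMetrics.metricsJson());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(withCast(new JmxExporterModelStub()).equals(withBinding(new JmxExporterModelStub())));
        System.out.println(withBinding(new ReporterModelStub()).isPresent());
    }
}
```

Both methods behave the same; the binding form only shortens the code and avoids repeating the type.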
@@ -105,7 +109,7 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup

// Configuration defaults
protected static final boolean DEFAULT_CRUISE_CONTROL_METRICS_ENABLED = false;
Should it be now DEFAULT_CRUISE_CONTROL_JMX_EXPORTER_METRICS_ENABLED? You renamed one variable but not the other. Might make sense to keep them consistent, or?
Actually, on a second look ... this seems like something to be removed. It is used only in the test as the default value there. But it is not the default value anymore. So I guess it should be deleted and if needed the test can just use false.
I also think this is not needed anymore and can be removed.
createConfigIfMissing(userConfig, "config.providers.strimzienv.class", "org.apache.kafka.common.config.provider.EnvVarConfigProvider");
createConfigIfMissing(userConfig, "config.providers.strimzienv.param.allowlist.pattern", ".*");

if (node.broker()) {
    // File and Directory providers are used only on broker nodes
    writer.println("config.providers.strimzifile.class=org.apache.kafka.common.config.provider.FileConfigProvider");
    writer.println("config.providers.strimzifile.param.allowed.paths=/opt/kafka");
    writer.println("config.providers.strimzidir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider");
    writer.println("config.providers.strimzidir.param.allowed.paths=/opt/kafka");
    createConfigIfMissing(userConfig, "config.providers.strimzifile.class", "org.apache.kafka.common.config.provider.FileConfigProvider");
    createConfigIfMissing(userConfig, "config.providers.strimzifile.param.allowed.paths", "/opt/kafka");
    createConfigIfMissing(userConfig, "config.providers.strimzidir.class", "org.apache.kafka.common.config.provider.DirectoryConfigProvider");
    createConfigIfMissing(userConfig, "config.providers.strimzidir.param.allowed.paths", "/opt/kafka");
The options we use HAVE TO be configured our way. So these cannot be set only if they are missing. If you are afraid that users would decide to use the providers with the same names as we use them, that is a fair point, but you need to throw an InvalidConfiguration exception in such a case. (Or for example overwrite them with our values - but not keep the original values from the user)
This is just a refactoring, the behavior of configProviders doesn't change. The intention is to avoid code duplication, as this config manipulation logic is reused by the new methods.
Initially we went with the override approach, but after further inspection, we found that the original code wasn't producing any override when the user configuration had the same config providers' aliases. This is because the original user config is written last in the Kafka configuration file and wins over the Strimzi one.
Example:
$ kubectl get k my-cluster -o yaml | yq '.spec.kafka.config'
config.providers: strimzienv
config.providers.strimzienv.class: org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern: test.*
$ kubectl exec my-cluster-broker-0 -- cat /tmp/strimzi.properties
##########
# Config providers
##########
# Configuration providers configured by the user and by Strimzi
config.providers=strimzienv,strimzienv,strimzifile,strimzidir
config.providers.strimzienv.class=org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern=.*
config.providers.strimzifile.class=org.apache.kafka.common.config.provider.FileConfigProvider
config.providers.strimzifile.param.allowed.paths=/opt/kafka
config.providers.strimzidir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider
config.providers.strimzidir.param.allowed.paths=/opt/kafka
##########
# User provided configuration
##########
config.providers.strimzienv.class=org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern=test.*
If you are afraid that users would decide to use the providers with the same names as we use them, that is a fair point, but you need to throw an InvalidConfiguration exception in such a case.
I personally agree with that, but maybe we can track this issue and address with a dedicated PR.
I personally agree with that, but maybe we can track this issue and address with a dedicated PR.
Sorry, but no. This PR decided to touch it without any obvious reason to do so. So it should do it properly or not at all.
@OwenCorrigan76 please revert the configProviders refactoring. After that, we also don't need createConfigIfMissing and related tests. I opened #11349 to track the issue reported above.
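The behavior discussed in this thread (a key that appears twice in a properties file resolves to its last value) can be reproduced with plain java.util.Properties. The sketch below is illustrative only: the class name, method names, and the reserved prefix are hypothetical, and IllegalArgumentException stands in for whatever exception type the operator would actually throw when a user reuses a Strimzi provider alias.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch only; class and method names are hypothetical.
final class ConfigProviderOverrideExample {
    // Prefix treated as reserved in this sketch (an assumption, not the PR's rule).
    static final String RESERVED_PREFIX = "config.providers.strimzi";

    // Parses properties-file text; when a key repeats, the last value is kept.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Rejects user configuration that reuses a reserved provider alias.
    static void rejectReservedKeys(Properties userConfig) {
        for (String key : userConfig.stringPropertyNames()) {
            if (key.startsWith(RESERVED_PREFIX)) {
                throw new IllegalArgumentException("Reserved config provider option: " + key);
            }
        }
    }

    public static void main(String[] args) {
        String file = String.join("\n",
            "config.providers.strimzienv.param.allowlist.pattern=.*",
            "config.providers.strimzienv.param.allowlist.pattern=test.*");
        // The later (user) line wins over the earlier (operator) line.
        System.out.println(parse(file).getProperty("config.providers.strimzienv.param.allowlist.pattern"));
    }
}
```

A check along the lines of rejectReservedKeys is one way to surface the conflict instead of letting the user value silently win; whether to reject or overwrite is the design question left to the follow-up issue.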

// print user config with Strimzi injections
if (!userConfig.getConfiguration().isBlank()) {
    printSectionHeader("User provided configuration with Strimzi injections");
I think this was about the message in the config. Not about the condition.
}

@ParallelTest
public void testStrimziMetricsReporterNetworkPolicy() {
Can't this be simply part of the previous test? It seems you have the exact duplicate and just have different asserts.

For more information about setting up and deploying Prometheus and Grafana, see link:{BookURLDeploying}#assembly-metrics-str[Introducing Metrics to Kafka^].
NOTE: Using `strimziMetricsReporter` is only supported in the Kafka component at the moment.
You should explain what Kafka component is ...
NOTE: Using `strimziMetricsReporter` is only supported in the Kafka brokers and controllers at the moment.
When metrics are enabled, they are exposed on port 9404.
.Using Strimzi Metrics Reporter
The `metricsConfig` property contains configurations for the {StrimziMetricsReporter}.
When configured to use Strimzi Metrics Reporter, Strimzi exposes the metrics provided by Apache Kafka directly.
I do not think this means anything ... what exactly is direct here?
I would rather say something like:
Using the Strimzi Metrics Reporter, Kafka JMX reporter and Prometheus JMX Exporter Java agent are not required to expose metrics in Prometheus format via HTTP.
I think for user documentation, that is meaningless. If you want to explain the difference, you should focus on something that explains the value for them.
new attempt: The Strimzi Metrics Reporter offers a lightweight solution for exposing Kafka metrics in Prometheus format, eliminating the need for intermediate layers like JMX or Java agents, and avoiding complex mapping rules that can introduce latency.
What about leaving out the "eliminating the need for intermediate layers like JMX or Java agents" part? But yes, I think this looks reasonable.
This just makes things confusing ... as long as the Strimzi Metrics reporter is not ready for its prime time, you should keep the JMX Prometheus stuff as the primary and keep the Strimzi Metrics exporter in a subfolder. If you keep both in subfolder you just confuse everyone to not know what to use! Please revert all these changes!
public static KafkaBuilder kafkaWithStrimziMetricsReporter(String namespaceName, String kafkaClusterName, int kafkaReplicas) {
    KafkaBuilder kafkaBuilder = kafka(namespaceName, kafkaClusterName, kafkaReplicas)
        .editSpec()
            .editKafka()
                .withNewStrimziMetricsReporterConfig()
                    .withNewValues()
                        .withAllowList("kafka_server.*")
                    .endValues()
                .endStrimziMetricsReporterConfig()
            .endKafka()
        .endSpec();

    return kafkaBuilder;
}
Maybe more question on @strimzi/system-test-contributors ... but why do we need this method? It is called from one place and does not add much value. It can be easily inlined and make the readability of the whole test much better because you would see right away what the actual configuration is.
Type of change
Description
This patch adds support for the Strimzi Metrics Reporter to Kafka brokers/controllers components as described by the following proposal:
https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md
Related to #10753
Support for Kafka Connect and MirrorMaker2 and Strimzi Kafka Bridge will be added in subsequent PRs.
CEL Validation has been added to the MetricsConfig and CruiseControl API in this PR.
CEL Validation was introduced to Strimzi in this PR:
#11068
We won’t initially support the CruiseControl component. To make it work, CC should be changed to expose metrics through its HTTP endpoint.
Checklist