Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components #11051

Open
wants to merge 28 commits into
base: main
from

Conversation

OwenCorrigan76
Contributor

@OwenCorrigan76 OwenCorrigan76 commented Jan 16, 2025

Type of change

  • Enhancement / new feature

Description

This patch adds support for the Strimzi Metrics Reporter to Kafka brokers/controllers components as described by the following proposal:

https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md

Related to #10753

Support for Kafka Connect and MirrorMaker2 and Strimzi Kafka Bridge will be added in subsequent PRs.

CEL Validation has been added to the MetricsConfig and CruiseControl API in this PR.
CEL Validation was introduced to Strimzi in this PR:
#11068

We won’t initially support the CruiseControl component. To make it work, CC should be changed to expose metrics through its HTTP endpoint.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@OwenCorrigan76 OwenCorrigan76 requested review from a team and mimaison January 16, 2025 17:15
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from e569be2 to 1180ee2 Compare January 16, 2025 17:23
Member

@scholzj scholzj left a comment

I left some comments after an initial pass. You should also update the names / description / CHANGELOG to make it clear what this really does. Kafka components are brokers/controllers, Connect, and MM2. You add this only for brokers/controllers. That should be clear from the CHANGELOG and the PR name / description.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 1180ee2 to f943007 Compare January 17, 2025 12:00
@OwenCorrigan76
Contributor Author

@scholzj I am currently working on the changes I did not comment on yet.

@fvaleri fvaleri added this to the 0.46.0 milestone Jan 17, 2025
@OwenCorrigan76 OwenCorrigan76 changed the title Add support for the Strimzi Metrics Reporter to Kafka component Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components Jan 17, 2025
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from 84fb016 to 8f5b28f Compare February 13, 2025 11:16
Member

@scholzj scholzj left a comment

I left some more comments. There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

@fvaleri
Contributor

fvaleri commented Feb 17, 2025

There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

@scholzj I'm supporting Owen with this feature. He is working on and will eventually address all comments.

@OwenCorrigan76
Contributor Author

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

@scholzj
Member

scholzj commented Feb 17, 2025

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

No worries Owen, I was just unsure what the expected state is. I did not intend to put any pressure on you. Sorry.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 3 times, most recently from ece3b1f to 75a829f Compare February 24, 2025 13:23
@OwenCorrigan76
Contributor Author

The build failed with the following tests:
io.strimzi.api.kafka.model.kafka.KafkaCrdIT.testKafkaWithNullMaintenance
Currently investigating why.

Contributor

@fvaleri fvaleri left a comment

Hi @OwenCorrigan76, we still need to make some changes and address the remaining comments, but it works fine.

For the KafkaCrdIT.testKafkaWithNullMaintenance failure, I think we simply need to update the expected error messages by removing the invalid: prefix. Now you get invalid: [ rather than invalid: . With CEL validation rules, we also get the additional error message "validation rules were not checked".

Contributor

@fvaleri fvaleri left a comment

@OwenCorrigan76 I was wrong on the "Unsupported metrics type", please correct.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 4304607 to f88ec2c Compare February 26, 2025 10:51
Contributor

@fvaleri fvaleri left a comment

The changes look good to me for now, but we still have some comments to address.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from 3c94f43 to 3969049 Compare March 4, 2025 12:21
ocorriga and others added 21 commits April 10, 2025 10:31
Signed-off-by: ocorriga <[email protected]>
Signed-off-by: OwenCorrigan76 <[email protected]>
Signed-off-by: Federico Valeri <[email protected]>
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from b6ea886 to 7e16090 Compare April 10, 2025 09:32
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from e7e8dc5 to b666550 Compare April 10, 2025 10:40
@katheris
Contributor

/azp run regression


Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@katheris katheris left a comment

One tiny nit, but otherwise this looks good to me

@katheris
Contributor

@scholzj I see you have a requested change on this PR, has it been addressed?

private List<String> allowList;
private Map<String, Object> additionalProperties;

@Description("A comma separated list of regex patterns to specify the metrics to collect.")
Member

It is a Java / YAML list ... so the description is probably wrong? (I.e., it is not a String containing a comma-separated list.)
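To illustrate the distinction raised here, the following is a minimal sketch of how a list-typed allow list could be applied to metric names. The class name, the method, and the full-match semantics are assumptions made for illustration; the actual matching done by the Strimzi Metrics Reporter is not shown in this PR excerpt.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Strimzi: treats the allow list as a real
// list of regex patterns rather than one comma-separated String.
class AllowListSketch {
    // Returns true when the metric name fully matches at least one pattern.
    // Full-match semantics are an assumption for this sketch.
    static boolean isAllowed(List<String> allowList, String metricName) {
        return allowList.stream().anyMatch(p -> Pattern.matches(p, metricName));
    }

    public static void main(String[] args) {
        List<String> allowList = List.of("kafka_server.*", "kafka_log.*");
        System.out.println(isAllowed(allowList, "kafka_server_replica_count"));
        System.out.println(isAllowed(allowList, "jvm_memory_used"));
    }
}
```

With a list type, each pattern is its own element, so a pattern that itself contains a comma does not need any escaping.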

Comment on lines +70 to +71
if (model instanceof SupportsMetrics supportsMetrics && supportsMetrics.metrics() instanceof JmxPrometheusExporterModel) {
String parseResult = ((JmxPrometheusExporterModel) supportsMetrics.metrics()).metricsJson(reconciliation, metricsAndLogging.metricsCm());
Member

Can't you do something like this:

Suggested change
- if (model instanceof SupportsMetrics supportsMetrics && supportsMetrics.metrics() instanceof JmxPrometheusExporterModel) {
-     String parseResult = ((JmxPrometheusExporterModel) supportsMetrics.metrics()).metricsJson(reconciliation, metricsAndLogging.metricsCm());
+ if (model instanceof SupportsMetrics supportsMetrics && supportsMetrics.metrics() instanceof JmxPrometheusExporterModel jmxMetrics) {
+     String parseResult = jmxMetrics.metricsJson(reconciliation, metricsAndLogging.metricsCm());
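The suggestion relies on pattern matching for instanceof (Java 16 and later), which binds the narrowed value to a variable and removes the explicit cast. A minimal self-contained sketch follows; the types here are hypothetical stand-ins, not the real Strimzi classes.

```java
// Hypothetical stand-ins for the model types discussed in the review.
interface MetricsModel { }

record JmxExporterModel(String rules) implements MetricsModel {
    String metricsJson() { return "{\"rules\":\"" + rules + "\"}"; }
}

record ReporterModel() implements MetricsModel { }

class PatternMatchingSketch {
    // Before: type test followed by an explicit cast.
    static String withCast(MetricsModel metrics) {
        if (metrics instanceof JmxExporterModel) {
            return ((JmxExporterModel) metrics).metricsJson();
        }
        return null;
    }

    // After: the pattern variable jmxMetrics is already of the narrowed type.
    static String withPattern(MetricsModel metrics) {
        if (metrics instanceof JmxExporterModel jmxMetrics) {
            return jmxMetrics.metricsJson();
        }
        return null;
    }

    public static void main(String[] args) {
        MetricsModel jmx = new JmxExporterModel("demo");
        System.out.println(withCast(jmx).equals(withPattern(jmx)));
        System.out.println(withPattern(new ReporterModel()));
    }
}
```

Both methods behave the same; the second form only avoids repeating the type and the cast.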

@@ -105,7 +109,7 @@ public class CruiseControl extends AbstractModel implements SupportsMetrics, Sup

// Configuration defaults
protected static final boolean DEFAULT_CRUISE_CONTROL_METRICS_ENABLED = false;
Member

Should it now be DEFAULT_CRUISE_CONTROL_JMX_EXPORTER_METRICS_ENABLED? You renamed one variable but not the other. It might make sense to keep them consistent.

Member

Actually, on a second look ... this seems like something to be removed. It is used only in the test as the default value there. But it is not the default value anymore. So I guess it should be deleted and if needed the test can just use false.

Contributor

I also think this is not needed anymore and can be removed.

Comment on lines +780 to +788
+ createConfigIfMissing(userConfig, "config.providers.strimzienv.class", "org.apache.kafka.common.config.provider.EnvVarConfigProvider");
+ createConfigIfMissing(userConfig, "config.providers.strimzienv.param.allowlist.pattern", ".*");

  if (node.broker()) {
      // File and Directory providers are used only on broker nodes
-     writer.println("config.providers.strimzifile.class=org.apache.kafka.common.config.provider.FileConfigProvider");
-     writer.println("config.providers.strimzifile.param.allowed.paths=/opt/kafka");
-     writer.println("config.providers.strimzidir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider");
-     writer.println("config.providers.strimzidir.param.allowed.paths=/opt/kafka");
+     createConfigIfMissing(userConfig, "config.providers.strimzifile.class", "org.apache.kafka.common.config.provider.FileConfigProvider");
+     createConfigIfMissing(userConfig, "config.providers.strimzifile.param.allowed.paths", "/opt/kafka");
+     createConfigIfMissing(userConfig, "config.providers.strimzidir.class", "org.apache.kafka.common.config.provider.DirectoryConfigProvider");
+     createConfigIfMissing(userConfig, "config.providers.strimzidir.param.allowed.paths", "/opt/kafka");
Member

The options we use HAVE TO be configured our way. So these cannot be set only if they are missing. If you are afraid that users would decide to use providers with the same names as ours, that is a fair point, but you need to throw an InvalidConfiguration exception in such a case. (Or, for example, overwrite them with our values, but do not keep the original values from the user.)
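A minimal sketch of the rejection approach described in this comment, under stated assumptions: the exception class and the validate method below are hypothetical stand-ins written for illustration, not Strimzi's actual API. Only the three provider aliases are taken from the diff above.

```java
import java.util.List;
import java.util.Map;

class ReservedProviderCheckSketch {
    // Hypothetical stand-in for an invalid-configuration exception.
    static class InvalidConfigurationException extends RuntimeException {
        InvalidConfigurationException(String message) { super(message); }
    }

    // Aliases used by the operator-managed config providers (from the diff above).
    static final List<String> RESERVED_ALIASES = List.of("strimzienv", "strimzifile", "strimzidir");

    // Rejects any user-supplied option that configures a reserved provider alias.
    static void validate(Map<String, String> userConfig) {
        for (String key : userConfig.keySet()) {
            for (String alias : RESERVED_ALIASES) {
                if (key.startsWith("config.providers." + alias + ".")) {
                    throw new InvalidConfigurationException(
                            "Option " + key + " uses the reserved config provider alias " + alias);
                }
            }
        }
    }

    public static void main(String[] args) {
        validate(Map.of("config.providers.myenv.class", "com.example.MyProvider")); // accepted
        try {
            validate(Map.of("config.providers.strimzienv.param.allowlist.pattern", "test.*"));
        } catch (InvalidConfigurationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this makes the conflict visible to the user instead of letting one value silently win.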

Contributor

@fvaleri fvaleri Apr 11, 2025

This is just a refactoring; the behavior of configProviders doesn't change. The intention is to avoid code duplication, as this config manipulation logic is reused by the new methods.

Initially we went with the override approach, but after further inspection, we found that the original code wasn't producing any override when the user configuration had the same config providers' aliases. This is because the original user config is written last in the Kafka configuration file and wins over the Strimzi one.

Example:

$ kubectl get k my-cluster -o yaml | yq '.spec.kafka.config'
config.providers: strimzienv
config.providers.strimzienv.class: org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern: test.*

$ kubectl exec my-cluster-broker-0 -- cat /tmp/strimzi.properties
##########
# Config providers
##########
# Configuration providers configured by the user and by Strimzi
config.providers=strimzienv,strimzienv,strimzifile,strimzidir
config.providers.strimzienv.class=org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern=.*
config.providers.strimzifile.class=org.apache.kafka.common.config.provider.FileConfigProvider
config.providers.strimzifile.param.allowed.paths=/opt/kafka
config.providers.strimzidir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider
config.providers.strimzidir.param.allowed.paths=/opt/kafka

##########
# User provided configuration
##########
config.providers.strimzienv.class=org.apache.kafka.common.config.provider.EnvVarConfigProvider
config.providers.strimzienv.param.allowlist.pattern=test.*

If you are afraid that users would decide to use providers with the same names as ours, that is a fair point, but you need to throw an InvalidConfiguration exception in such a case.

I personally agree with that, but maybe we can track this issue and address with a dedicated PR.
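The ordering effect described in this comment can be reproduced with plain java.util.Properties, where a later duplicate key replaces an earlier one. This is a sketch assuming the generated file is parsed with Properties-style semantics; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class LastValueWinsSketch {
    // Parses properties-file content; later duplicate keys replace earlier ones.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "# Config providers",
                "config.providers.strimzienv.param.allowlist.pattern=.*",
                "# User provided configuration",
                "config.providers.strimzienv.param.allowlist.pattern=test.*");
        // prints "test.*"
        System.out.println(parse(file).getProperty("config.providers.strimzienv.param.allowlist.pattern"));
    }
}
```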

Member

I personally agree with that, but maybe we can track this issue and address with a dedicated PR.

Sorry, but no. This PR decided to touch it without any obvious reason to do so. So it should do it properly or not at all.

Contributor

@OwenCorrigan76 please revert the configProviders refactoring. After that, we also don't need createConfigIfMissing and related tests. I opened #11349 to track the issue reported above.


// print user config with Strimzi injections
if (!userConfig.getConfiguration().isBlank()) {
printSectionHeader("User provided configuration with Strimzi injections");
Member

I think this was about the message in the config. Not about the condition.

}

@ParallelTest
public void testStrimziMetricsReporterNetworkPolicy() {
Member

Can't this simply be part of the previous test? It seems you have an exact duplicate and just have different asserts.


For more information about setting up and deploying Prometheus and Grafana, see link:{BookURLDeploying}#assembly-metrics-str[Introducing Metrics to Kafka^].
NOTE: Using `strimziMetricsReporter` is only supported in the Kafka component at the moment.
Member

You should explain what Kafka component is ...

Suggested change
- NOTE: Using `strimziMetricsReporter` is only supported in the Kafka component at the moment.
+ NOTE: Using `strimziMetricsReporter` is only supported in the Kafka brokers and controllers at the moment.

When metrics are enabled, they are exposed on port 9404.
.Using Strimzi Metrics Reporter
The `metricsConfig` property contains configurations for the {StrimziMetricsReporter}.
When configured to use Strimzi Metrics Reporter, Strimzi exposes the metrics provided by Apache Kafka directly.
Member

I do not think this means anything ... what exactly is direct here?

Contributor

I would rather say something like:

Using the Strimzi Metrics Reporter, Kafka JMX reporter and Prometheus JMX Exporter Java agent are not required to expose metrics in Prometheus format via HTTP.

Member

I think for user documentation, that is meaningless. If you want to explain the difference, you should focus on something that explains the value for them.

Contributor

new attempt: The Strimzi Metrics Reporter offers a lightweight solution for exposing Kafka metrics in Prometheus format, eliminating the need for intermediate layers like JMX or Java agents, and avoiding complex mapping rules that can introduce latency.

Member

What about leaving out the "eliminating the need for intermediate layers like JMX or Java agents" part? But yes, I think this looks reasonable.

Member

This just makes things confusing ... as long as the Strimzi Metrics Reporter is not ready for prime time, you should keep the JMX Prometheus stuff as the primary and keep the Strimzi Metrics Reporter in a subfolder. If you keep both in subfolders, you just confuse everyone about what to use! Please revert all these changes!

Comment on lines +57 to +70
public static KafkaBuilder kafkaWithStrimziMetricsReporter(String namespaceName, String kafkaClusterName, int kafkaReplicas) {
KafkaBuilder kafkaBuilder = kafka(namespaceName, kafkaClusterName, kafkaReplicas)
.editSpec()
.editKafka()
.withNewStrimziMetricsReporterConfig()
.withNewValues()
.withAllowList("kafka_server.*")
.endValues()
.endStrimziMetricsReporterConfig()
.endKafka()
.endSpec();

return kafkaBuilder;
}
Member

Maybe this is more of a question for @strimzi/system-test-contributors ... but why do we need this method? It is called from one place and does not add much value. It can easily be inlined, which would make the whole test much more readable because you would see right away what the actual configuration is.

10 participants