Fetch Kafka Connect, MM2, Bridge trusted certificates Secret only once for each reference #12241

Open
venkatesh2090 wants to merge 15 commits into strimzi:main from venkatesh2090:fix-trusted-certs

Conversation

@venkatesh2090
Contributor

Type of change

Select the type of your PR

Enhancement / new feature

Description

Fixes #11972

Changes when the Secrets for trusted certificates are fetched and used in the KafkaConnect, KafkaMirrorMaker2, and KafkaBridge assembly operators. Previously, each Secret was fetched twice: once when the combined CA cert Secret is generated, and again when the hash is calculated from the certs and auth.

The reconcile loop is refactored so that each Secret is fetched only once per reference.

KafkaConnect
The certificates are fetched in tlsTrustedCertsSecret and returned by that method; the result is then passed to the generateAuthHash method.
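
The fetch-once flow described for KafkaConnect can be sketched roughly as follows. This is a minimal illustration and not the operator's code: it uses plain CompletableFuture rather than the Vert.x Future type used in Strimzi, and FetchOnceSketch, fetchCert, fetchAll, reconcile, and Result are hypothetical names introduced only for this example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class FetchOnceSketch {
    /** Counts how many times the (hypothetical) Secret lookup is invoked. */
    public static final AtomicInteger FETCHES = new AtomicInteger();

    /** Result of one reconciliation step: the combined bundle and its hash. */
    public record Result(String bundle, int hash) { }

    /** Hypothetical stand-in for reading one certificate out of a Kubernetes Secret. */
    public static CompletableFuture<String> fetchCert(String secretName) {
        FETCHES.incrementAndGet();
        return CompletableFuture.completedFuture("cert-from-" + secretName);
    }

    /** Fetches every referenced Secret exactly once and returns the certificates in order. */
    public static CompletableFuture<List<String>> fetchAll(List<String> secretNames) {
        List<CompletableFuture<String>> pending =
                secretNames.stream().map(FetchOnceSketch::fetchCert).toList();
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> pending.stream().map(CompletableFuture::join).toList());
    }

    /** A single fetch feeds both the combined bundle and the hash; neither step looks anything up. */
    public static CompletableFuture<Result> reconcile(List<String> secretNames) {
        return fetchAll(secretNames).thenApply(certs -> {
            String bundle = String.join("\n", certs); // would be written to the generated Secret
            int hash = bundle.hashCode();             // computed from the already-fetched data
            return new Result(bundle, hash);
        });
    }

    public static void main(String[] args) {
        Result result = reconcile(List.of("ca-a", "ca-b")).join();
        System.out.println("lookups performed: " + FETCHES.get());
        System.out.println("bundle hash: " + result.hash());
    }
}
```

The point of the shape is that the hash step receives data rather than Secret references, so it cannot trigger a second lookup.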

KafkaBridge
A similar pattern is used here - but since a new secret isn't generated, it is fetched directly in the loop and passed to the hash method.

KafkaMirrorMaker2
Since there are multiple certs for each mirror's target and source, all the certs are fetched and stored in a HashMap keyed by cluster alias. This relies on the fact that each cluster in MM2 has a unique alias. The map is then passed to both the Secret generation and auth hash methods when needed.
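
As a rough sketch of the alias-keyed map, assuming a simplified cluster model: Cluster, fetchCert, and certsByAlias below are hypothetical names for illustration, and the real operator works with asynchronous Secret lookups rather than the synchronous stub shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterCertMapSketch {
    /** Hypothetical, trimmed-down view of an MM2 cluster: an alias plus the Secret names it trusts. */
    public record Cluster(String alias, List<String> trustedSecretNames) { }

    /** Hypothetical stand-in for the Secret lookup. */
    public static String fetchCert(String secretName) {
        return "cert-from-" + secretName;
    }

    /** Fetches the certs of every cluster once and indexes them by alias. */
    public static Map<String, List<String>> certsByAlias(List<Cluster> clusters) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Cluster cluster : clusters) {
            List<String> certs = cluster.trustedSecretNames().stream()
                    .map(ClusterCertMapSketch::fetchCert)
                    .toList();
            // The map is only a safe index if aliases are unique, so reject duplicates.
            if (result.putIfAbsent(cluster.alias(), certs) != null) {
                throw new IllegalArgumentException("Duplicate cluster alias: " + cluster.alias());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> certs = certsByAlias(List.of(
                new Cluster("target", List.of("ca-target")),
                new Cluster("source", List.of("ca-source-1", "ca-source-2"))));
        // Later stages look certificates up by alias instead of fetching again.
        System.out.println(certs.get("target"));
        System.out.println(certs.getOrDefault("unknown", List.of()));
    }
}
```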

I have also removed 2 tests in ReconcilerUtilsTest which checked that the authTlsHash method fetched the Secrets correctly. That method no longer fetches trusted cert Secrets; it expects the certs to be passed in as a parameter.

Disclosure: The operator tests were generated using AI and then modified. I have reviewed them thoroughly before committing.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@codecov

codecov Bot commented Dec 13, 2025

Codecov Report

❌ Patch coverage is 92.30769% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.07%. Comparing base (1e0bbfd) to head (8ae6cd9).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...or/assembly/KafkaMirrorMaker2AssemblyOperator.java | 90.00% | 1 Missing and 1 partial ⚠️ |
| ...tor/cluster/operator/assembly/ReconcilerUtils.java | 50.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12241      +/-   ##
============================================
+ Coverage     75.01%   75.07%   +0.05%     
- Complexity     6397     6405       +8     
============================================
  Files           345      345              
  Lines         24155    24166      +11     
  Branches       3095     3093       -2     
============================================
+ Hits          18120    18142      +22     
+ Misses         4800     4791       -9     
+ Partials       1235     1233       -2     

| Files with missing lines | Coverage Δ |
|---|---|
| ...ter/operator/assembly/AbstractConnectOperator.java | 87.89% <100.00%> (+0.59%) ⬆️ |
| ...operator/assembly/KafkaBridgeAssemblyOperator.java | 89.41% <100.00%> (+9.17%) ⬆️ |
| ...perator/assembly/KafkaConnectAssemblyOperator.java | 86.93% <100.00%> (-0.08%) ⬇️ |
| ...tor/cluster/operator/assembly/ReconcilerUtils.java | 81.46% <50.00%> (+0.24%) ⬆️ |
| ...or/assembly/KafkaMirrorMaker2AssemblyOperator.java | 85.35% <90.00%> (+0.65%) ⬆️ |

... and 3 files with indirect coverage changes


@scholzj scholzj added this to the 0.50.0 milestone Dec 15, 2025
@scholzj
Member

scholzj commented Dec 15, 2025

/gha run pipeline=upgrade,regression

@github-actions

github-actions Bot commented Dec 15, 2025

⏳ System test verification started: link

The following 10 job(s) will be executed:

  • regression-brokers-and-security-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operators-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-operands-amd64 (oracle-vm-8cpu-32gb-x86-64)
  • regression-brokers-and-security-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operators-arm64 (oracle-vm-8cpu-32gb-arm64)
  • regression-operands-arm64 (oracle-vm-8cpu-32gb-arm64)
  • upgrade-azp_kraft_upgrade-amd64 (oracle-vm-4cpu-16gb-x86-64)
  • upgrade-azp_kafka_upgrade-amd64 (oracle-vm-4cpu-16gb-x86-64)
  • upgrade-azp_kraft_upgrade-arm64 (oracle-vm-4cpu-16gb-arm64)
  • upgrade-azp_kafka_upgrade-arm64 (oracle-vm-4cpu-16gb-arm64)

Tests will start after successful build completion.

Contributor

@tinaselenge tinaselenge left a comment

Thanks for the PR. I left some comments. I think we need to leave KafkaBridge related changes out of this PR as mentioned below.

* @return Future which completes when the reconciliation is done
*/
protected Future<Void> tlsTrustedCertsSecret(Reconciliation reconciliation, String namespace, KafkaConnectCluster connect) {
protected Future<List<String>> tlsTrustedCertsSecret(Reconciliation reconciliation, String namespace, KafkaConnectCluster connect) {
Contributor

The Javadoc for @return needs updating.

return Future.succeededFuture();
}
});
.compose(certificates -> tlsTrustedCertsSecret(reconciliation, namespace, connect, certificates));
Contributor

Do we really need this to be broken down to another method? The only changes here seem to be encoding a list of certificates instead of String and then mapping the result with the list instead of empty.

Contributor Author

})
.compose(i -> isPodDisruptionBudgetGeneration ? podDisruptionBudgetOperator.reconcile(reconciliation, namespace, bridge.getComponentName(), bridge.generatePodDisruptionBudget()) : Future.succeededFuture())
.compose(i -> ReconcilerUtils.authTlsHash(secretOperations, namespace, auth, trustedCertificates))
.compose(i -> ReconcilerUtils.trustedCertificates(reconciliation, secretOperations, trustedCertificates))
Contributor

The KafkaBridge resource currently doesn't use this internal secret, so I don't think we should create one yet. The primary reason for this secret is to configure the truststore directly from a secret instead of using generated certificates based on volume-mounted secrets. It should be done in a separate PR, like #11549, which is closed for now.

Contributor Author

I'll undo KafkaBridge changes as mentioned in the parent comment.

Contributor Author

Actually reverting KafkaBridge changes is not straightforward because I changed authTlsHash. I have changed the function parameters to require a list of certificates which I fetch here.

KafkaConnectBuild build;
KafkaConnectStatus kafkaConnectStatus = new KafkaConnectStatus();
List<String> tlsCertificates;
List<String> oauthCertificates;
Contributor

what are they for?

Contributor Author

Committed by mistake, will remove

.compose(i -> serviceOperations.reconcile(reconciliation, namespace, mirrorMaker2Cluster.getServiceName(), mirrorMaker2Cluster.generateService()))
.compose(i -> serviceOperations.reconcile(reconciliation, namespace, mirrorMaker2Cluster.getComponentName(), mirrorMaker2Cluster.generateHeadlessService()))
.compose(i -> tlsTrustedCertsSecret(reconciliation, namespace, mirrorMaker2Cluster))
.compose(i -> updateMM2ClusterCertificateMap(reconciliation, mirrorMaker2Cluster, clusterCerts))
Contributor

I'm not sure I understand these changes. Can you please provide some explanation?

Contributor Author

@venkatesh2090 venkatesh2090 Dec 21, 2025

The updateMM2ClusterCertificateMap method updates the clusterCerts map, which is scoped to a single reconciliation. The key is the alias of the cluster as specified in the CR, and the value is the list of certs required for that cluster.

I fetch all the clusters beforehand and use them in the next stage.

I use the target cluster's alias kafkaMirrorMaker2.getSpec().getTarget().getAlias() to get the target cluster's CA cert and use that to generate the connect's internal tls secret. This just replicates the behaviour that already existed.

The original tlsTrustedCertsSecret gets the certificate sources using connect.getTls().getTrustedCertificates() and fetches them. I didn't want it to fetch again in this context because they are already fetched in the updateMM2ClusterCertificateMap method. So I split tlsTrustedCertsSecret to be able to pass in the list of certificates from the map. The connect.getTls() method resolves to spec.target.tls from KafkaMirrorMaker2Cluster.fromCrd and eventually KafkaMirrorMaker2Cluster.buildKafkaConnectSpec using spec.getTarget().getTls(). That's why I'm using the target's alias for tlsTrustedCertsSecret. The alias remains unique at this stage because KafkaMirrorMaker2Cluster.validateAndUpdateToNewAPI has a check for it.

Later on, the map is used in generateAuthHash to generate the hash as well.

Contributor

Thank you for explaining, I understand it now. So if we are getting the target cluster's TLS config to reconcile the internal truststore secret, why do we need this map with source alias and its certs? Source cluster certs seem to be used for generating the auth hash but we previously just used target's clusters' certs to generate the hash. Source cluster's certs would only be used when configuring Connectors, rather than the Connect cluster (which is the target cluster), but this part is not implemented yet. I guess for that we would need both clusters' certs, but I'm not sure we need to do it for this PR's purpose.

Contributor Author

I think I implemented the auth hash method incorrectly. I meant to use the map like

clusterCert.get(cluster.getAlias())

and then pass it to ReconcilerUtils.authTlsHash. I meant to replicate this behaviour https://github.com/strimzi/strimzi-kafka-operator/pull/12241/files/63d40adaf074ca0d50be8e7b4bc294a58f8b3d73#diff-c6d9b371d64b172f4e174d61f4a6b81ed129010c41a016f0bbf29dddc62df009L204-L207

                        .map(cluster -> {
                            List<CertSecretSource> trustedCertificates = cluster.getTls() == null ? List.of() : cluster.getTls().getTrustedCertificates();
                            return ReconcilerUtils.authTlsHash(secretOperations, namespace, cluster.getAuthentication(), trustedCertificates);
                        }).collect(Collectors.toList())
                )

by doing this instead

-    private Future<Integer> generateAuthHash(String namespace, KafkaMirrorMaker2Cluster mirrorMaker2Cluster) {
+    private Future<Integer> generateAuthHash(String namespace, KafkaMirrorMaker2Cluster mirrorMaker2Cluster, Map<String, List<String>> clusterCert) {
         Promise<Integer> authHash = Promise.promise();
 
         Future.join(mirrorMaker2Cluster
                         .clusters()
                         .stream()
                         .map(cluster -> {
-                            List<CertSecretSource> trustedCertificates = cluster.getTls() == null ? List.of() : cluster.getTls().getTrustedCertificates();
-                            return ReconcilerUtils.authTlsHash(secretOperations, namespace, cluster.getAuthentication(), trustedCertificates);
-                        }).collect(Collectors.toList())
+                            KafkaClientAuthentication auth = cluster.getAuthentication();
+                            List<String> certificates = clusterCert.getOrDefault(cluster.getAlias(), Collections.emptyList());
+                            return ReconcilerUtils.authTlsHash(secretOperations, namespace, auth, certificates);
+                        })
+                        .collect(Collectors.toList())
                 )

but we previously just used target's clusters' certs to generate the hash

I'm not sure about this part. Didn't it use all the clusters, source and target, to generate the auth hash via the KafkaMirrorMaker2Cluster.clusters() method?

private static List<KafkaMirrorMaker2ClusterSpec> clusters(KafkaMirrorMaker2 kafkaMirrorMaker2) {
    // The resource is already converted to the new API, so we do not need to check both APIs
    List<KafkaMirrorMaker2ClusterSpec> clusters = new ArrayList<>();
    // We add the target cluster
    clusters.add(kafkaMirrorMaker2.getSpec().getTarget());
    clusters.addAll(kafkaMirrorMaker2.getSpec().getMirrors().stream().map(KafkaMirrorMaker2MirrorSpec::getSource).toList());
    return clusters;
}

Although I didn't implement that part correctly.

Contributor

I'm not sure about this part. Didn't it use all the clusters - source and target, to generate the auth hash via the KafkaMirrorMaker2Cluster.clusters() method

You are right, I misunderstood it. We do need both clusters to generate the auth hash.

* @return Certificates extracted from the Secrets
*/
public static Future<String> trustedCertificates(Reconciliation reconciliation, SecretOperator secretOperations, List<CertSecretSource> certificateSources) {
public static Future<List<String>> trustedCertificates(Reconciliation reconciliation, SecretOperator secretOperations, List<CertSecretSource> certificateSources) {
Contributor

@tinaselenge tinaselenge Dec 15, 2025

Does this really need to be changed to a List? Can the hash not be computed from the final String instead of a List?

Contributor Author

The auth hash method sums the hashes of the individual certs, but the trustedCertificates method returns the certs concatenated into one string with "\n" as the delimiter. I don't believe the string can be reliably split back into certs after it is concatenated, and I assume that just using hashCode of the concatenated string would not produce the same hash as before.

But if it is OK to use the hashCode of the concatenated string, the implementation becomes much simpler. I don't mind changing it if that is better in your opinion.
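
To make the difference between the two hashing schemes concrete, here is a small sketch. CertHashSketch, sumOfHashes, and hashOfBundle are hypothetical names and the inputs are toy strings rather than real PEM data; the only library behaviour relied on is the documented String.hashCode and String.join.

```java
import java.util.List;

public class CertHashSketch {
    /** Sums the per-certificate hashes, as the comment above describes the earlier behaviour. */
    public static int sumOfHashes(List<String> certs) {
        int hash = 0;
        for (String cert : certs) {
            hash += cert.hashCode();
        }
        return hash;
    }

    /** Hashes the newline-joined bundle as a single string. */
    public static int hashOfBundle(List<String> certs) {
        return String.join("\n", certs).hashCode();
    }

    public static void main(String[] args) {
        List<String> certs = List.of("a", "b");
        List<String> reversed = List.of("b", "a");

        // Addition is commutative, so the summed hash ignores the order of the certs.
        System.out.println(sumOfHashes(certs) == sumOfHashes(reversed));   // true
        // The joined string differs when the order differs, and so does its hash here.
        System.out.println(hashOfBundle(certs) == hashOfBundle(reversed)); // false
        // The two schemes give different values for the same input.
        System.out.println(sumOfHashes(certs) == hashOfBundle(certs));     // false
    }
}
```

So switching schemes changes the computed value for unchanged certificates, and the bundle hash becomes sensitive to the order in which the certs are concatenated.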

Member

If it doesn't bring any clear advantage I would leave the previous implementation. Unless as I said you need the List of certificates somewhere else.

Contributor Author

Cool. I will refactor to make this simpler. The only place this list of certs is used is for the auth hash calculation.

@github-actions

🎉 System test verification passed: link

@scholzj scholzj modified the milestones: 0.50.0, 0.51.0 Jan 14, 2026
@scholzj scholzj modified the milestones: 0.51.0, 0.52.0 Feb 24, 2026
@venkatesh2090 venkatesh2090 force-pushed the fix-trusted-certs branch 2 times, most recently from bb4973a to edce070 Compare March 14, 2026 00:09
@venkatesh2090
Contributor Author

@tinaselenge @ppatierno I have done some refactoring and rebased against latest main. Would appreciate another review

@scholzj scholzj removed this from the 1.0.0 milestone Apr 18, 2026
@tinaselenge
Contributor

Thanks @venkatesh2090. Just to let you know that there are changes that will probably conflict with your PR now:
#12735
#12728

Once both of these are merged, you might want to rebase and see if these changes would simplify your changes as truststore certificate handling will become more consistent across Bridge, Connect and MM2.

@katheris
Member

Discussed on community call on 14.5.2026: @venkatesh2090 apologies for us failing to get back to this. As you have likely seen there's a bunch of changes being made in this area by myself and @tinaselenge. If you are still up for continuing on this task then let us know and Tina can help you to rebase your changes to make sense with the changes she's making. If you don't have time for this task anymore no worries, just let us know and Tina has agreed to take your PR forwards and get it merged.

Sorry again for us dropping the ball here.

@venkatesh2090
Contributor Author

No problem. I haven't had the time to look into this recently either. I'll try to look over this weekend. If I don't get back in a week, feel free to take over.

Refactored tlsTrustedCertsSecret to return a list of secrets instead of the concatenated version.

Signed-off-by: Venkatesh Kannan <venkatesprasad512@gmail.com>
Also fixed auth hash in MM2.

Signed-off-by: Venkatesh Kannan <venkatesprasad512@gmail.com>
…ted certificates.

This simplifies authTlsHash implementation by using hashCode on the certBundle instead of adding all hashes.

Signed-off-by: Venkatesh Kannan <venkatesprasad512@gmail.com>
@snyk-io

snyk-io Bot commented May 25, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |
| | Code Security | 0 | 0 | 0 | 0 | 0 issues |


Signed-off-by: Venkatesh Kannan <venkatesprasad512@gmail.com>
@venkatesh2090
Contributor Author

@tinaselenge I have rebased onto main with your changes now. I didn't have many conflicts. The only one I had was in Bridge Assembly Operator, where I reused one of the newer methods.

Would appreciate another review.

@katheris
Member

The unit test failed in a UserOperator test so I have restarted the job to see if it is an intermittent failure

@katheris katheris added this to the 1.1.0 milestone May 28, 2026

Development

Successfully merging this pull request may close these issues.

Avoid getting the trusted certificates Secrets twice -> once for the certs and once for hash

5 participants