We are facing a connectivity issue between Cruise Control and the Kafka brokers when using Strimzi with Istio and Cilium in our Kubernetes cluster.

Environment
[…]

Symptom
Cruise Control itself can connect to the cluster correctly using the […]. The problem appears in the Cruise Control Metrics Reporter running inside each broker. Instead of using the same […]
In our environment, the metrics reporter produces continuous INFO/WARN logs like:
[…]
As a result, the […]

Behaviour in Strimzi code
Looking at […]
The comment in the code explains that the metrics reporter uses the brokers headless service instead of the bootstrap service because the Admin client cannot connect to pods behind the bootstrap service when they are not ready during startup. However, we are unsure whether this behaviour is still beneficial: if Cruise Control cannot collect metrics from the brokers reliably, it may lead to operational problems such as the […]

What seems to be happening
Because we are using KRaft, the […]
From our tests:
[…]
So the functional workaround (a brokers-only headless service) is not stable, because reconciliation reverts it.

Question / Request
Has anyone seen similar issues connecting Cruise Control Metrics Reporter through […]
One possible workaround we are considering is creating a Kubernetes NetworkPolicy that blocks port 9091 traffic to controller pods. This could prevent Cruise Control Metrics Reporter from attempting connections to controllers on that port, effectively forcing it to connect only to brokers. We have other clusters running Strimzi without Istio and Cilium, and it works well there. Any guidance on the "Strimzi-approved" way to deal with this (or whether this is considered a bug vs. expected behaviour) would be very helpful. Thank you.
Please keep in mind that Strimzi does not support Istio, so there is no Strimzi-approved way to handle anything related to it. While I think there are a few users running it in some way, the general advice would be to avoid it. I would be quite curious whether you can reproduce the problem without Istio. You suggest that the service pointing to the controllers is the issue, but I have never seen any problems with it in standard Strimzi deployments and I'm not aware of anyone reporting it. So I wonder if we missed it: you did not share a proper log, but the error you shared looks like something ephemeral that the client can easily recover from, so it might be easy to miss if it does not cause any real issues. Sadly, the PR that changed it from bootstrap to brokers a long time ago does not cover the details. But reading between the lines, it seems the issue was DNS resolution, which is harder to recover from than one node not responding.
Deploying Kafka with Istio enabled caused issues in service discovery and network resolution. Without the Istio sidecar, all broker IPs resolved correctly through Kubernetes DNS, and Cruise Control connected to all brokers normally. With the sidecar injected, DNS queries returned only controller node IPs, indicating that Istio filtered or limited DNS responses. Cruise Control also experienced misrouted traffic: requests to port 9091 were being redirected to controller pods, even though those pods don't expose that port. This suggests that Istio's service routing or DNS capture logic may be incorrectly mapping the service endpoints. Disabling the sidecar resolved both issues. These results show that Istio's sidecar proxy interferes with Kafka's DNS resolution and port routing, likely requiring adjustments such as disabling DNS capture or excluding Kafka workloads from Istio's control.
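For anyone who wants to keep the sidecar and try the narrower adjustments mentioned above, a rough sketch of what that could look like on a node pool follows. This is untested here and nobody in this thread has confirmed it fixes the problem. The pool name `broker` is only an example, and the choice to exclude port 9091 is an assumption based on the port discussed above. The annotations themselves are Istio's standard per-pod ones for DNS proxying and traffic exclusion.

```yaml
# Sketch only: keeps the Istio sidecar but narrows what it intercepts.
# Pool name and port selection are assumptions, not a verified fix.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
spec:
  template:
    pod:
      metadata:
        annotations:
          # Turn off Istio DNS proxying (DNS capture) for these pods only.
          proxy.istio.io/config: |
            proxyMetadata:
              ISTIO_META_DNS_CAPTURE: "false"
          # Let traffic on port 9091 bypass the sidecar in both directions.
          traffic.sidecar.istio.io/excludeInboundPorts: "9091"
          traffic.sidecar.istio.io/excludeOutboundPorts: "9091"
```

If something like this is tried, the same annotations would presumably be needed on the controller pool as well, since the misrouted requests were landing on controller pods.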
Thanks for the detailed writeup on this. What we've just done on our cluster to fix this is to apply the following:

    apiVersion: security.istio.io/v1
    kind: AuthorizationPolicy
    metadata:
      name: deny-all-to-controller-9091
      namespace: kafka
    spec:
      selector:
        matchLabels:
          app: kafka-controller
      action: DENY
      rules:
        - to:
            - operation:
                ports: ["9091"]

I'm pretty new to using Strimzi and Istio, but this seems to work for us.
Resolution: Disabling Istio sidecar injection on Kafka pods via KafkaNodePool
After testing multiple approaches, we decided to permanently disable Istio sidecar injection on Kafka pods using the podTemplate annotation directly in the KafkaNodePool CR:

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: broker
    spec:
      template:
        pod:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: controller
    spec:
      template:
        pod:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"

Root cause summary
The issue is not in Kafka, Cruise Control, or Strimzi configuration; it is caused by Istio intercepting Kafka's internal traffic. When sidecar injection is enabled on Kafka pods:
[…]

Why […]

| Istio feature lost | Kafka equivalent |
|---|---|
| Traffic metrics | JMX / Prometheus via Kafka's own exporters |
| mTLS | Kafka TLS + SASL (already configured) |
| Retries / circuit breaking | Handled natively by Kafka client protocol |
| AuthorizationPolicy | Kafka ACLs |

Kafka's protocol-level security and observability stack makes Istio's data-plane features redundant for broker pods specifically.
As a side note, we also tested the AuthorizationPolicy workaround posted by @kaeraali-flutterint (blocking port 9091 to controller pods), but it did not resolve the issue in our environment. The authentication failures persisted even with that policy applied. Disabling the sidecar entirely via KafkaNodePool podTemplate was the only approach that worked reliably for us.
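For clusters with several node pools, the same annotation could in principle be set once on the Kafka CR instead of on every pool. The following is a sketch under assumptions: the cluster name `my-cluster` is hypothetical, and it relies on node pools using the pod template from the Kafka CR when they do not define their own, which is worth verifying against the Strimzi version in use. We have only tested the per-pool variant.

```yaml
# Sketch only: cluster-wide variant of the per-pool annotation.
# "my-cluster" is a hypothetical name; inheritance by node pools is assumed.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
spec:
  kafka:
    template:
      pod:
        metadata:
          annotations:
            # Ask Istio not to inject the sidecar into Kafka pods.
            sidecar.istio.io/inject: "false"
    # Remaining Kafka settings (listeners, config, ...) omitted from this fragment.
```

A pool that sets its own pod template would still need the annotation there, so the per-pool form remains the more explicit option.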