Conversation
Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
Sorry for the late response.
That part made sense to me. Given the way the ServiceMonitor resource works, the metrics for the non-leader controllers would appear to be down (I guess they are down, technically).
I don't quite get how this fixes the issue. We only configure a second …
Apologies for the confusion. Let me provide more details on how this fix addresses the issue.

By replacing the PodMonitor with a ServiceMonitor, we change the way Prometheus discovers scrape targets. With a PodMonitor, Prometheus attempts to scrape metrics from every pod, which causes problems when multiple replicas of the LINSTOR controller are present. With a ServiceMonitor, Prometheus scrapes the endpoints of the service rather than each individual pod. This resolves the problem of having only one target up in Prometheus when multiple replicas are deployed.

Additionally, "secured" here refers to using an RBAC proxy to secure the metrics-collection traffic. When the RBAC proxy is in use, the metrics are served on a different port, which needs to be added to the endpoints of the LINSTOR controller service. By configuring the ServiceMonitor to scrape that additional port, the secured metrics are collected as well.

I hope this clarifies how the changes address the issue and the relevance of the "secured" aspect. Please let me know if you have any further questions or concerns.
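To make the PodMonitor-to-ServiceMonitor switch concrete, here is a minimal sketch of what such a ServiceMonitor could look like. All names (the `linstor-controller` label, the `secured-metrics` port name, the namespace) are illustrative assumptions, not the operator's actual manifests:

```yaml
# Hypothetical ServiceMonitor scraping the LINSTOR controller service
# on a secured (RBAC-proxied, HTTPS) metrics port.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: linstor-controller        # illustrative name
  namespace: monitoring           # illustrative namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: linstor-controller  # assumed service label
  endpoints:
    - port: secured-metrics       # assumed named port on the service
      scheme: https               # RBAC proxy terminates TLS
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
```

Because the ServiceMonitor targets the service's endpoints rather than every pod, only the pods actually backing the service are scraped.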
If a LINSTOR controller has more than one replica, only one target will be up in Prometheus. This is because of the leader election mechanism for LINSTOR controllers: once a leader is elected, the other controller pods do not serve any metrics.
This PR addresses the issue when using a ServiceMonitor. When the securedMetricsPort parameter is set, the operator will add a metrics port to the LINSTOR controller service and the LINSTOR controller endpoints.
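The service change described above could look roughly like the following sketch. The port name and number are assumptions for illustration; the actual values would come from the operator and the securedMetricsPort setting:

```yaml
# Hypothetical LINSTOR controller Service after the operator adds the
# secured metrics port (names and port number are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: linstor-controller        # illustrative name
spec:
  selector:
    app.kubernetes.io/name: linstor-controller  # assumed pod label
  ports:
    - name: api
      port: 3370                  # LINSTOR controller REST API port
    - name: secured-metrics       # port added when securedMetricsPort is set
      port: 9999                  # assumed value of securedMetricsPort
      targetPort: 9999
```

A ServiceMonitor can then reference the secured-metrics port by name, so Prometheus scrapes only the endpoints behind the service instead of every controller pod.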