Conversation

@paulfantom

paulfantom commented Jun 22, 2023

This PR is a different (IMHO proper) fix to the issue raised in #107. By using a PodMonitor instead of a ServiceMonitor we can simplify and fix a few things:

  1. Using a SVC of type LB won't accidentally expose the /metrics endpoint outside of Kubernetes.
  2. .Values.metrics.port is no longer needed as the PodMonitor attaches to the Pod instead of the SVC.
  3. The SVC object template is a bit less complicated.

The downside is that this is a breaking change. An alternative approach, which is not breaking but also does not fix all those issues, is in
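For illustration, a minimal sketch of what such a PodMonitor could look like (the name, labels, and port name are assumptions, not taken from this chart):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: docker-registry            # hypothetical name
  labels:
    app: docker-registry
spec:
  selector:
    matchLabels:
      app: docker-registry         # assumed pod labels
  podMetricsEndpoints:
    - port: metrics                # scrapes the named container port directly, no Service involved
      path: /metrics
```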

paulfantom changed the title from "podMonitor instead of serviceMonitor to prevent monitoring data leakage" to "Use podMonitor instead of serviceMonitor to prevent monitoring data leakage" Jun 22, 2023
@lucasfcnunes

lucasfcnunes commented Jun 20, 2024

Why not split the service?

  1. *-docker-registry -> 5000 (http) (now the Service type can be set to LB without metrics leakage; see the sketch below)
  2. *-docker-registry-metrics -> 5001 (http-metrics)
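A rough sketch of that split, assuming illustrative names and labels together with the ports above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-release-docker-registry            # hypothetical release name
spec:
  type: LoadBalancer                          # safe to expose: no metrics port here
  selector:
    app: docker-registry                      # assumed pod labels
  ports:
    - name: http
      port: 5000
      targetPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: my-release-docker-registry-metrics
spec:
  type: ClusterIP                             # metrics stay cluster-internal
  selector:
    app: docker-registry
  ports:
    - name: http-metrics
      port: 5001
      targetPort: 5001
```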

@paulfantom
Author

Why not split the service?

At that point, why add another Service?

@lucasfcnunes

lucasfcnunes commented Jan 8, 2025

At that point, why add another Service?

  1. No breaking changes
  2. "(...) to prevent monitoring data leakage"
  3. TargetDown Alert (https://runbooks.prometheus-operator.dev/runbooks/general/targetdown/)

@paulfantom
Author

TargetDown doesn't require a Service. It requires a Prometheus scrape job, which is created by using any of ServiceMonitor, PodMonitor, ScrapeConfig, Probe, etc. (P.S. I wrote that runbook.)

As for preventing data leakage, that's exactly the point of using PodMonitor. You have fewer moving parts and fewer points which can lead to data leakage.

@lucasfcnunes

TargetDown doesn't require a Service. It requires a Prometheus scrape job, which is created by using any of ServiceMonitor, PodMonitor, ScrapeConfig, Probe, etc. (P.S. I wrote that runbook.)

The language is all around Service and ServiceMonitor, so I assumed wrongly.

TargetDown
Meaning
The alert means that one or more prometheus scrape targets are down. It fires when at least 10% of scrape targets in a Service are unreachable.

Full context
Prometheus works by sending an HTTP GET request to all of its “targets” every few seconds. So TargetDown really means that Prometheus just can’t access your service, which may or may not mean it’s actually down. If your service appears to be running fine, a common cause could be a misconfigured ServiceMonitor (maybe the port or path is incorrect), a misconfigured NetworkPolicy, or Service with incorrect labelSelectors that isn’t selecting any Pods.

@joshsizer
Collaborator

Hi @paulfantom, please recommit with signed commits. Also, I would prefer backward-compatible behavior - i.e., if people are using a ServiceMonitor, they should continue to use ServiceMonitors on upgrade.

I see value in being able to decide, at the end-user level, whether to use a PodMonitor or a ServiceMonitor.

@paulfantom
Author

paulfantom commented Apr 12, 2025

I've added my signature.

As for backwards compatibility, IMHO making both objects (PodMonitor and ServiceMonitor) optional puts unnecessary load on maintainers while not providing any tangible value. From the Prometheus perspective, either object (PodMonitor or ServiceMonitor) will result in the same data being gathered, while from a maintenance perspective the ServiceMonitor requires additional logic to be present, e.g. a Service which exposes the metrics port, or defining the metrics port value. Having both doesn't give you any benefits while forcing maintainers to keep more code and presenting an unclear API to end users. However, since it is clearer to see in code, I've created a branch where this should be clearly visible - https://github.com/twuni/docker-registry.helm/compare/main...paulfantom:docker-registry.helm:pod-sm-together?expand=1 - especially check validate.yaml, which would now need to be added to prevent misconfiguration.
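As a rough illustration of that kind of guard (the file name comes from the comment above; the values paths are assumptions), a Helm template can fail the render when both monitors are enabled:

```yaml
# templates/validate.yaml (sketch) - reject ambiguous monitoring configuration
{{- if and .Values.metrics.serviceMonitor.enabled .Values.metrics.podMonitor.enabled }}
{{- fail "Enable either metrics.serviceMonitor or metrics.podMonitor, not both" }}
{{- end }}
```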

From my perspective, the API that should stay backwards compatible is not the set of installed objects (when they provide the same results), but values.yaml. For example, we could remove the .Values.serviceMonitor options but have the templates still react to them. This provides a way to easily deprecate a setting.
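A minimal sketch of that values-level compatibility, assuming a hypothetical .Values.metrics.podMonitor flag and illustrative names: the template keeps rendering a PodMonitor when either the new flag or the legacy serviceMonitor flag is set.

```yaml
# templates/podmonitor.yaml (sketch, not the actual chart template)
# Honour the deprecated serviceMonitor flag so existing values files keep scraping after an upgrade.
{{- if or .Values.metrics.podMonitor.enabled .Values.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ .Release.Name }}-docker-registry   # naming is an assumption
spec:
  selector:
    matchLabels:
      app: docker-registry                     # assumed pod labels
  podMetricsEndpoints:
    - port: metrics                            # assumed container port name
      path: /metrics
{{- end }}
```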


That said, I am not a maintainer and it is up to you to choose a path forward :)

@joshsizer
Collaborator

However, since it is clearer to see in code, I've created a branch where this should be clearly visible - https://github.com/twuni/docker-registry.helm/compare/main...paulfantom:docker-registry.helm:pod-sm-together?expand=1 - especially check validate.yaml, which would now need to be added to prevent misconfiguration.

@paulfantom I actually really like the approach you outlined above ^^

A major problem I see with removing the ServiceMonitor is that if I am setting .Values.metrics.serviceMonitor.enabled: true and then upgrade with the changes in this PR, I would no longer be scraping metrics. I think you are suggesting we can remedy that:

For example, we could remove the .Values.serviceMonitor options but have the templates still react to them. This provides a way to easily deprecate a setting.

But IMO that would be confusing. Why does setting .Values.metrics.serviceMonitor.enabled: true create a podMonitor?

joshsizer self-requested a review April 14, 2025 20:36
