Emit metric tracking empty responses from prometheus #7060
Thank you for your contribution! 🙏 Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected. While you are waiting, make sure to:
Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient. Learn more about our contribution guide.
Signed-off-by: Daniele Rolando <[email protected]>
Looks good to me. You should open a PR in the keda-docs repository to add this metric to the documentation. My only question is whether the maintainers would like this metric to be more generic, but like you said, the prometheus scaler is the only scaler with isNullOrEmpty as a configurable field.
@wozniakjan I know you're in charge of cutting this release. Wanted to put this on your radar.
Hello, tbh I think that this PR is a great starting point, and by just updating it a bit we can make it generic.
Agree
pkg/metricscollector/prommetrics.go
prometheus.CounterOpts{
	Namespace: DefaultPromMetricsNamespace,
	Subsystem: "prometheus",
	Name:      "metrics_empty_error_total",
@rickbrouwer @JorTurFer Thoughts on naming this empty_upstream_responses_total?
upstream is quite generic, what about scaler/trigger instead?
Co-authored-by: Jorge Turrado Ferrero <[email protected]> Signed-off-by: drolando-stripe <[email protected]>
Looking nice! Only one small thing inline about the implementation.
Could you include this new metric in the prometheus/otel e2e tests?
func RecordEmptyUpstreamResponse() {
	for _, element := range collectors {
		element.RecordEmptyUpstreamResponse()
	}
}
Let's enrich this metric with information about the ScaledJob|ScaledObject and the trigger that recorded the empty response. You can see how it's done with other metrics, like RecordCloudEventEmittedError.
I thought about it. The problem is that the ExecutePromQuery function doesn't have access to the trigger or scaledobject name afaict.
The ScaledObject name and the trigger name are passed to the scaler during the build process as part of the scalersconfig.ScalerConfig struct, so you will need to take them and store them in the scaler to have them available for the metric. I think that the info is quite valuable for knowing which scaler is failing.
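The suggestion above could be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the PR's actual code: the ScalerConfig struct here is a hypothetical stand-in holding only the fields discussed in the thread (its field names are assumptions), emptyResponseCounter is an in-memory substitute for a labelled Prometheus counter, and newPrometheusScaler and onEmptyResult are made-up names.

```go
package main

import "fmt"

// ScalerConfig is a hypothetical stand-in for the parts of
// scalersconfig.ScalerConfig mentioned in the review; field names are assumed.
type ScalerConfig struct {
	ScalableObjectNamespace string
	ScalableObjectName      string
	TriggerName             string
}

// labelKey identifies one series of the counter (one scaled resource + trigger).
type labelKey struct {
	Namespace      string
	ScaledResource string
	Trigger        string
}

// emptyResponseCounter is an in-memory substitute for a labelled counter.
// It is not safe for concurrent use; a real collector would be.
type emptyResponseCounter struct {
	counts map[labelKey]int
}

func newEmptyResponseCounter() *emptyResponseCounter {
	return &emptyResponseCounter{counts: make(map[labelKey]int)}
}

// Record increments the series identified by key.
func (c *emptyResponseCounter) Record(key labelKey) {
	c.counts[key]++
}

// prometheusScaler keeps the identifying labels it received at build time,
// so they are available later when a query comes back empty.
type prometheusScaler struct {
	labels  labelKey
	metrics *emptyResponseCounter
}

// newPrometheusScaler copies the names out of the config into the scaler.
func newPrometheusScaler(cfg ScalerConfig, metrics *emptyResponseCounter) *prometheusScaler {
	return &prometheusScaler{
		labels: labelKey{
			Namespace:      cfg.ScalableObjectNamespace,
			ScaledResource: cfg.ScalableObjectName,
			Trigger:        cfg.TriggerName,
		},
		metrics: metrics,
	}
}

// onEmptyResult would be called from the query path when the result is empty.
func (s *prometheusScaler) onEmptyResult() {
	s.metrics.Record(s.labels)
}

func main() {
	metrics := newEmptyResponseCounter()
	scaler := newPrometheusScaler(ScalerConfig{
		ScalableObjectNamespace: "default",
		ScalableObjectName:      "web",
		TriggerName:             "http-rps",
	}, metrics)

	scaler.onEmptyResult()
	scaler.onEmptyResult()

	for key, n := range metrics.counts {
		fmt.Printf("%s/%s trigger=%s empty_responses=%d\n",
			key.Namespace, key.ScaledResource, key.Trigger, n)
	}
}
```

Storing the labels on the scaler at construction time means the query function itself does not need access to the ScaledObject or trigger, which is the obstacle raised earlier in the thread.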
@JorTurFer I left a note about this in the PR description. The end-to-end tests don't actually run prometheus, so the scaler query always fails with
@drolando-stripe What do you mean by that, please? The existing prom e2e tests use this helper to set up prometheus: https://github.com/kedacore/keda/blob/main/tests/scalers/prometheus/prometheus_helper.go
I see there are other upstream error responses from this scaler. Is an empty response enough for tracking purposes?
We'd like to have a way to monitor the number of Keda errors due to empty responses from prometheus after enabling the ignoreNullValues flag for most of our prometheus triggers.
Right now this error gets logged, but the error metric that Keda emits is generic and doesn't differentiate by error type.
Tests
I tried adding an e2e test, but they don't actually run prometheus, so all queries fail with:
dial tcp: lookup keda-prometheus.keda.svc.cluster.local on 10.96.0.10:53: no such host
The only 2 tests in tests/sequential/prometheus_metrics/prometheus_metrics_test.go that use a prometheus trigger only look for errors and don't actually run the query.
I could add a test that shows that the metric exists and is zero, but that doesn't seem very useful.
Checklist
- When introducing a new scaler, I agree with the scaling governance policy: N/A
- Tests have been added
- A PR is opened to update our Helm chart (repo) (if applicable, i.e. when deployment manifests are modified): N/A
- A PR is opened to update the documentation on (repo) (if applicable): N/A

Fixes #7062