Skip to content

Conversation

julianguinard
Copy link

@julianguinard julianguinard commented Feb 24, 2025

TL:DR

Add the ability for metrics-api scaler to get metrics from all endpoint targets of a kubernetes API service (aggregateFromKubeServiceEndpoints: true metadata) and aggregate them using average, sum, min or max aggregation functions (aggregationType metadata), which is a handy feature in an environment where one didn't set up a metric aggregator/scraping stack (i.e prometheus), or simply doesn't want to use their monitoring stack to fetch and serve metrics from customers workload in their own kubernetes clusters, and leave the metrics API's reponsability up to the customer

This PR comes from an issue met when metrics-api scaler targets an internal kubernetes service, such as the one from metrics_api E2E tests but, unlike it, has more than 1 replica serving it, leading to inconsistent HPA average calculation

The more a kubernetes service used by metrics API scaler has pods as targets, the less likely it is for the metrics API scaler to get all metrics from all these pods, leading to inconsistent HPA average metric computation and eventually to scaling issues. Especially in configurations where kubernetes services have "sticky" configurations

Below are screenshots of the modified E2E metrics_api_test.go, commenting out aggregateFromKubeServiceEndpoints: true addition to metadata, being executed & failing because scale in never occurs as every metric from all 10 replicas behind metrics server deployment are not all well taken into account

Screenshot from 2025-02-24 17-06-35

logs from keda-operator show same value being returned to metrics-api scaler by the kubernetes service (16)
Screenshot from 2025-02-24 16-18-07

As a consequence, test eventually fails because scaled deployment is unable to scale out within 3 minutes timeframe
Screenshot from 2025-02-24 16-19-00

Uncommenting aggregateFromKubeServiceEndpoints: true addition to metadata results in metric aggregation mode and same test passes this time

Screenshot from 2025-02-24 16-22-14
Screenshot from 2025-02-24 16-22-07

Checklist

Helm PR : kedacore/charts#739
Docs PR : kedacore/keda-docs#1541

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch 2 times, most recently from 742e19d to be170fd Compare February 25, 2025 09:57
@julianguinard julianguinard marked this pull request as ready for review February 25, 2025 10:31
@julianguinard julianguinard requested a review from a team as a code owner February 25, 2025 10:31
@julianguinard julianguinard changed the title Add aggregation from kube service endpoints feature in metrics API scaler Feat : Add aggregation from kube service endpoints feature in metrics API scaler Feb 25, 2025
@zroubalik
Copy link
Member

zroubalik commented Mar 5, 2025

/run-e2e metrics_api
Update: You can check the progress here

@julianguinard
Copy link
Author

julianguinard commented Mar 6, 2025

/run-e2e metrics_api Update: You can check the progress here

@zroubalik thanks for triggering E2E, which ran fine here

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch 2 times, most recently from 172e504 to 61ebd1f Compare March 11, 2025 15:02
@julianguinard
Copy link
Author

@zroubalik FYI I added this commit to make E2E more resilient on clusters where launching all metrics servers replicas take a longer time than expected to start : we now sleep 1 second & try 4 more times to validate expected pods before failing

It also fixes formatting of each endpoint URL to reach for aggregating metrics (double "/" in path, did not trigger errors in E2E because server ran by ghcr.io/kedacore/tests-metrics-api seems to cope with it anyway)

@JorTurFer
Copy link
Member

JorTurFer commented Mar 14, 2025

/run-e2e metrics_api
Update: You can check the progress here

@julianguinard
Copy link
Author

/run-e2e metrics_api Update: You can check the progress here

@JorTurFer thanks for E2E trigger which all ran fine

waiting for reviews & opinions on this now :)

@julianguinard julianguinard requested a review from JorTurFer March 19, 2025 10:31
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
@JorTurFer
Copy link
Member

JorTurFer commented Mar 24, 2025

/run-e2e metrics_api
Update: You can check the progress here

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 1c33197 to 25f9b0d Compare March 24, 2025 10:37
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 215053a to 908f8c3 Compare March 24, 2025 10:43
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 908f8c3 to c6ca0e4 Compare March 24, 2025 10:44
julianguinard added a commit to julianguinard/keda that referenced this pull request Mar 24, 2025
…rics from all kubernetes service endpoints fail for whatever reason

see kedacore#6565 (comment)

Signed-off-by: julian GUINARD <[email protected]>
@JorTurFer
Copy link
Member

JorTurFer commented Mar 24, 2025

/run-e2e metrics_api
Update: You can check the progress here

Copy link
Member

@JorTurFer JorTurFer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I've kept some implementation ideas inline

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 4ab95f5 to 3bee98d Compare September 26, 2025 10:03
@keda-automation keda-automation requested a review from a team September 26, 2025 10:03
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch 3 times, most recently from e357f4b to e3925e0 Compare September 26, 2025 10:46
@zroubalik
Copy link
Member

zroubalik commented Sep 26, 2025

/run-e2e metrics
Update: You can check the progress here

@JorTurFer
Copy link
Member

@kedacore/keda-core-contributors @kedacore/keda-core-maintainers PTAL , this is a good addition for the release

@keda-automation keda-automation requested review from a team October 1, 2025 07:40
@julianguinard
Copy link
Author

Repushed for changelog conflict solving

@rickbrouwer
Copy link
Member

rickbrouwer commented Oct 1, 2025

/run-e2e metrics
Update: You can check the progress here

Copy link
Member

@wozniakjan wozniakjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks super usefull, few items regarding EndpointSlice usage below

julianguinard added a commit to julianguinard/keda that referenced this pull request Oct 8, 2025
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from d0497e5 to 5bc7805 Compare October 8, 2025 07:51
@keda-automation keda-automation requested a review from a team October 8, 2025 07:51
@julianguinard
Copy link
Author

this looks super usefull, few items regarding EndpointSlice usage below

Hello @wozniakjan and thanks for your feedback ❤️
I adapted EndpointSlice usage according to tour suggestions. In the iteration, we now discard any unready endpoint or endpoint adresses which have already been registered
feel free to review it again!

@keda-automation keda-automation requested a review from a team October 8, 2025 12:05
julianguinard and others added 3 commits October 8, 2025 14:26
…scaler

- make generate-scalers-schema

Signed-off-by: julian GUINARD <[email protected]>
Co-authored-by: Jan Wozniak <[email protected]>
Signed-off-by: julianguinard <[email protected]>
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 881798c to 0a5d42c Compare October 8, 2025 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants