feat: add action duration metric #196

Merged
Trojan295 merged 2 commits into main from kube-1330/add-action-duration-metric
Aug 25, 2025

Trojan295 commented Aug 6, 2025

This PR adds the following metrics:

  • action_started_total - count of started actions by type
  • action_executed_duration_seconds - summary metric of the action duration, reporting the 0.5, 0.9 and 0.99 quantiles

It also adds support for exporting histogram and summary metrics.

For action_executed_duration_seconds I decided to use a summary instead of a histogram to limit the number of series we send. Our action durations range from milliseconds to tens of seconds, so a histogram might need up to 16 buckets to give useful data, plus the _count and _sum series. With the action type as a dimension (we have 14 of those), that means:
3k clusters * 2 pods * (16 + 2) series * 14 action types = 1 512 000 series.

With a summary for 3 quantiles, we get:
3k clusters * 2 pods * (3 + 2) series * 14 action types = 420 000 series.
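
The two estimates above can be reproduced with a small sketch. The function name `seriesCount` and its parameters are made up for illustration and are not part of this PR; the inputs are the figures quoted above (3k clusters, 2 pods per cluster, 14 action types, plus the `_count` and `_sum` series).

```go
package main

import "fmt"

// seriesCount estimates the fleet-wide number of time series for one metric.
// Hypothetical helper for illustration only; it is not code from this PR.
func seriesCount(clusters, podsPerCluster, seriesPerLabelSet, actionTypes int) int {
	return clusters * podsPerCluster * seriesPerLabelSet * actionTypes
}

func main() {
	const (
		clusters    = 3000 // "3k clusters"
		pods        = 2    // pods per cluster
		actionTypes = 14   // values of the action type label
		extra       = 2    // the _count and _sum series
	)

	histogram := seriesCount(clusters, pods, 16+extra, actionTypes) // 16 buckets
	summary := seriesCount(clusters, pods, 3+extra, actionTypes)    // 3 quantiles

	fmt.Println("histogram series:", histogram)
	fmt.Println("summary series:", summary)
	fmt.Printf("reduction: %.1fx\n", float64(histogram)/float64(summary))
}
```

Under these assumptions the summary sends 3.6 times fewer series than the histogram.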

That is still a lot, so we might consider keeping the metric export disabled by default and enabling it via an env var (or maybe remotely from Cast AI).
Another drawback of summaries compared to histograms is that they cannot be aggregated across pods or clusters, because the quantiles are precomputed on the client side.
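
To illustrate the aggregation drawback, here is a minimal sketch with made-up observations for two pods. It does not use the Prometheus client; `quantile` and `cumulativeBuckets` are hypothetical helpers, and the bucket bounds are arbitrary. Averaging the per-pod medians gives a value far from the median of all observations, while cumulative bucket counts simply add up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the nearest-rank q-quantile (0 < q <= 1) of a non-empty slice.
func quantile(values []float64, q float64) float64 {
	sorted := append([]float64(nil), values...) // copy so the input is not reordered
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// cumulativeBuckets counts, for each upper bound, the observations <= that bound.
// There is no +Inf bucket here, so values above the last bound are not counted.
func cumulativeBuckets(values, upperBounds []float64) []int {
	counts := make([]int, len(upperBounds))
	for _, v := range values {
		for i, ub := range upperBounds {
			if v <= ub {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	// Made-up action durations in seconds: pod A is busy and fast, pod B is idle and slow.
	podA := []float64{0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1}
	podB := []float64{10, 10}
	merged := append(append([]float64(nil), podA...), podB...)

	// Summary-style: each pod precomputes its own quantile, and we try to combine them.
	averaged := (quantile(podA, 0.5) + quantile(podB, 0.5)) / 2
	fmt.Printf("average of per-pod p50: %.2fs\n", averaged)
	fmt.Printf("p50 of all observations: %.2fs\n", quantile(merged, 0.5))

	// Histogram-style: per-pod bucket counts can be summed without losing correctness.
	bounds := []float64{0.5, 5, 30}
	a, b := cumulativeBuckets(podA, bounds), cumulativeBuckets(podB, bounds)
	for i, ub := range bounds {
		fmt.Printf("le=%g: %d + %d = %d\n", ub, a[i], b[i], a[i]+b[i])
	}
	fmt.Println("merged buckets:", cumulativeBuckets(merged, bounds))
}
```

With this data the averaged median is 5.05s while the median over all ten observations is 0.1s, so combining summary quantiles across pods would be misleading. The summed bucket counts match the buckets computed over the merged data.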

Trojan295 changed the title from "Kube 1330/add action duration metric" to "feat: add action duration metric" Aug 6, 2025
Trojan295 force-pushed the kube-1330/add-action-duration-metric branch from 6b9a202 to 0157399 August 18, 2025 12:30
Trojan295 requested a review from furkhat August 18, 2025 12:45
Trojan295 marked this pull request as ready for review August 18, 2025 12:45
Trojan295 requested a review from a team as a code owner August 18, 2025 12:45
Trojan295 merged commit e3b15f1 into main Aug 25, 2025
6 checks passed
Trojan295 deleted the kube-1330/add-action-duration-metric branch August 25, 2025 08:10