feat: add fallback support for value metric type #6655

Open · y-rabie wants to merge 1 commit into main from support-fallback-for-value-type

Conversation

@y-rabie (Contributor) commented on Mar 27, 2025

  • Since the constraint that fallback is supported only for the AverageValue metric type seems somewhat unwarranted to me, it is relaxed a bit here, and it can be removed altogether if we opt for another implementation.

    The constraint now becomes that the scaleTargetRef object has a .status.readyReplicas field that its controller updates with the number of ready replicas, so we can use that directly. This is de facto the case for Deployments, StatefulSets, ReplicaSets, and Argo Rollouts.

    We can then generically fetch the object as unstructured and read that field to divide by it (see the sketch at the end of this item). Here is a brief math illustration, starting from the HPA's equation:

    desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue ) ]

    By passing currentMetricValue = desiredMetricValue * fallbackReplicas / currentReplicas,

    we end up with

    desiredReplicas = ceil[ currentReplicas * ( ( desiredMetricValue * fallbackReplicas / currentReplicas ) / desiredMetricValue ) ]

    desiredReplicas = ceil[ currentReplicas * ( fallbackReplicas / currentReplicas ) ] = ceil[ fallbackReplicas ] = fallbackReplicas

    Emphasis: currentReplicas in the HPA's equation is the number of ready replicas.

    I preferred this approach over the alternative, which would remove the .status.readyReplicas constraint by manually counting the number of ready pods (similar to what the HPA itself does), since that would be quite involved. If it seems like the better approach to you, I can implement it.
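
    A minimal sketch of this idea, assuming the scale target has already been fetched as an unstructured object (the package, function names, and error handling below are hypothetical, not this PR's actual code):

    package fallback // hypothetical package name, for illustration only

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    )

    // readyReplicasOf reads .status.readyReplicas from any scale target whose
    // controller maintains that field (Deployments, StatefulSets, ReplicaSets,
    // Argo Rollouts, ...).
    func readyReplicasOf(obj *unstructured.Unstructured) (int64, error) {
        ready, found, err := unstructured.NestedInt64(obj.Object, "status", "readyReplicas")
        if err != nil {
            return 0, err
        }
        if !found {
            return 0, fmt.Errorf("scale target %s/%s has no .status.readyReplicas field",
                obj.GetNamespace(), obj.GetName())
        }
        return ready, nil
    }

    // fallbackMetricValue is the Value-type metric to report while in fallback:
    // with currentMetricValue = desiredMetricValue * fallbackReplicas / readyReplicas,
    // the HPA computes ceil[ readyReplicas * (current / desired) ] = fallbackReplicas.
    func fallbackMetricValue(desiredMetricValue float64, fallbackReplicas, readyReplicas int64) float64 {
        return desiredMetricValue * float64(fallbackReplicas) / float64(readyReplicas)
    }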

  • For full clarity, one problematic nit with this approach is that we depend on the object's controller updating .status.readyReplicas in a timely manner.

    If there is a lag, the currentReplicas that the HPA multiplies by (which it obtains by counting pods itself) can deviate from the currentReplicas value we divide by.

    If our value is lower, we'd briefly scale higher than fallbackReplicas; if it is higher, we'd briefly scale lower than fallbackReplicas. Either way, we should eventually stabilize at exactly fallbackReplicas.

  • A final, unrelated small change for correctness' sake (which I think should be fixed regardless of this PR): the line

    Value: *resource.NewMilliQuantity(normalisationValue*1000*replicas, resource.DecimalSI),

    has been changed to

    Value: *resource.NewMilliQuantity(int64(normalisationValue*1000*replicas), resource.DecimalSI),

    with normalisationValue now being a float64 instead of an int.

    This prevents casting to int early, which would discard the fractional part of normalisationValue that the subsequent multiplication by 1000 would otherwise have preserved.

    Imagine normalisationValue = 2.5 and fallbackReplicas = 10. Previously, Value = 2*1000*10 = 20000.

    Now, Value = int64(2.5*1000*10) = int64(25000.0) = 25000.

    Obviously the former would cause the HPA to not scale to exactly fallbackReplicas. A small illustrative snippet follows this list.
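
    A standalone sketch of the difference (illustrative values and variable names, not the actual surrounding KEDA code):

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/resource"
    )

    func main() {
        normalisationValue := 2.5 // float64 now; previously an int
        replicas := int64(10)     // fallbackReplicas

        // Old behaviour: the early cast drops the fraction before scaling by 1000.
        oldValue := *resource.NewMilliQuantity(int64(normalisationValue)*1000*replicas, resource.DecimalSI)

        // New behaviour: multiply first, cast to int64 once at the end.
        newValue := *resource.NewMilliQuantity(int64(normalisationValue*1000*float64(replicas)), resource.DecimalSI)

        fmt.Println(oldValue.String(), newValue.String()) // prints "20 25"
    }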

Regarding the unchecked checklist item: if this looks good, I'll open a PR to the docs repo.

Checklist

@y-rabie requested a review from a team as a code owner on March 27, 2025 03:18
@y-rabie force-pushed the support-fallback-for-value-type branch 4 times, most recently from 5592961 to aeb2c2d on March 27, 2025 03:45
@y-rabie force-pushed the support-fallback-for-value-type branch from aeb2c2d to 3349924 on March 27, 2025 03:56