2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/vpa_release.md
@@ -14,4 +14,4 @@ completed step on this issue.
Please provide any information that is related to the release:

- When we plan to do the release?
- Are there any issues / PRs blocking the release?
- Are there any issues / PRs blocking the release?
4 changes: 2 additions & 2 deletions .github/workflows/ca-benchmark.yaml
@@ -56,12 +56,12 @@ jobs:
echo "### Cluster Autoscaler Benchmark Results" >> $GITHUB_STEP_SUMMARY
echo "Comparing PR branch against \`${{ github.event.pull_request.base.ref }}\`" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY

if [ -s base.txt ] && [ -s pr.txt ]; then
echo '```' >> $GITHUB_STEP_SUMMARY
$(go env GOPATH)/bin/benchstat base.txt pr.txt | tee benchstat.txt >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY

# Fail if any regression is > 10%
grep -qE '\+[1-9][0-9].*%' benchstat.txt && { echo "Regression detected > 10%"; exit 1; } || true
else
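The regression gate in this step can be exercised locally. A minimal sketch, assuming benchstat-style output (the benchmark names and numbers below are made up for illustration):

```shell
# Sample benchstat delta lines (hypothetical benchmarks and timings).
printf '%s\n' \
  'BenchmarkScaleUp-8    120ms  132ms  +10.00%' \
  'BenchmarkScaleDown-8   80ms   83ms   +3.75%' > benchstat.txt

# Same pattern the workflow greps for: a "+" delta whose integer part has
# two digits, i.e. any regression of 10% or more (it also matches +100%).
if grep -qE '\+[1-9][0-9].*%' benchstat.txt; then
  echo "Regression detected > 10%"
fi
```

Note the pattern deliberately ignores single-digit deltas such as `+9.99%`; it keys on the first two characters after the `+` sign.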
16 changes: 16 additions & 0 deletions .github/workflows/precommit.yaml
@@ -0,0 +1,16 @@
name: pre-commit

on:
- push
- pull_request

permissions:
contents: read

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
- uses: pre-commit/action@v3.0.1
36 changes: 26 additions & 10 deletions .pre-commit-config.yaml
@@ -1,22 +1,38 @@
# Generated folders and files are supposed to be excluded from the
# checks, since pre-commit would introduce inconsistencies by modifying
# generated content.
exclude: |
(?x)^(
addon-resizer/vendor/ |
cluster-autoscaler/cloudprovider/oci/vendor-internal/ |
vertical-pod-autoscaler/
)

repos:
- hooks:
- repo: https://github.com/pre-commit/pre-commit-hooks
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
repo: https://github.com/pre-commit/pre-commit-hooks
- id: mixed-line-ending
- id: check-added-large-files
rev: v3.1.0
- hooks:
- repo: https://github.com/gruntwork-io/pre-commit
hooks:
- id: helmlint
repo: https://github.com/gruntwork-io/pre-commit
rev: v0.1.9
- hooks:
- id: helm-docs
- repo: https://github.com/norwoodj/helm-docs
hooks:
- id: helm-docs-built
files: (README\.md\.gotmpl|(Chart|requirements|values)\.yaml)$
repo: https://github.com/norwoodj/helm-docs
rev: v1.3.0
- hooks:
rev: v1.14.2
- repo: local
hooks:
- id : update-flags
name: Update Cluster-Autoscaler Flags Table
entry: bash cluster-autoscaler/hack/update-faq-flags.sh
language: system
files: cluster-autoscaler/config/flags/flags\.go
repo: local
- repo: https://github.com/TekWizely/pre-commit-golang
rev: v1.0.0-rc.4
hooks:
- id: go-fmt
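The `exclude` pattern at the top of this config can be checked locally. A sketch using `grep`: pre-commit compiles the pattern with Python's verbose-regex mode, which `grep -E` lacks, so the `(?x)` alternation is written inline here (assumed equivalent):

```shell
# Inline form of the exclude pattern from .pre-commit-config.yaml above.
pattern='^(addon-resizer/vendor/|cluster-autoscaler/cloudprovider/oci/vendor-internal/|vertical-pod-autoscaler/)'

# Hypothetical file paths: the first should be skipped, the second checked.
for p in addon-resizer/vendor/generated.go cluster-autoscaler/main.go; do
  if printf '%s\n' "$p" | grep -qE "$pattern"; then
    echo "excluded: $p"
  else
    echo "checked:  $p"
  fi
done
```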
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -5,10 +5,10 @@
### Signing Contributor License Agreements (CLA)

We'd love to accept your patches! Before we can take them, we have to jump a couple of legal hurdles.

Please fill out either the individual or corporate Contributor License Agreement
(CLA).

* If you are an individual writing original source code and you're sure you
own the intellectual property, then you'll need to sign an
[individual CLA](https://identity.linuxfoundation.org/node/285/node/285/individual-signup).
@@ -21,15 +21,15 @@ We'd love to accept your patches! Before we can take them, we have to jump a cou
* Fork the desired repo, develop and test your code changes.
* Submit a pull request.

All changes must be code reviewed. Coding conventions and standards are explained in the official
[developer docs](https://github.com/kubernetes/community/tree/master/contributors/devel). Expect
All changes must be code reviewed. Coding conventions and standards are explained in the official
[developer docs](https://github.com/kubernetes/community/tree/master/contributors/devel). Expect
reviewers to request that you avoid common [go style mistakes](https://go.dev/wiki/CodeReviewComments)
in your PRs.

### Merge Approval

Autoscaler collaborators may add "LGTM" (Looks Good To Me) or an equivalent comment to indicate
that a PR is acceptable. Any change requires at least one LGTM. No pull requests can be merged
Autoscaler collaborators may add "LGTM" (Looks Good To Me) or an equivalent comment to indicate
that a PR is acceptable. Any change requires at least one LGTM. No pull requests can be merged
until at least one Autoscaler collaborator signs off with an LGTM.

### Support Channels
2 changes: 1 addition & 1 deletion addon-resizer/README.md
@@ -77,7 +77,7 @@ parameters:
*Note: Addon Resizer uses buckets of cluster sizes, so it will use n larger
than the cluster size by up to 50% for clusters larger than 16 nodes. For
smaller clusters, n = 16 will be used.*

2. Memory parameters:
```
--memory
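The bucketing note above can be sketched as follows. This is a hypothetical reading of "larger by up to 50%" — the exact buckets Addon Resizer uses may differ — in which buckets start at 16 nodes and each grows 50% over the last:

```shell
# Hypothetical bucket computation: start at n = 16 and grow by 50% until
# the bucket covers the cluster size, so small clusters map to 16 and
# large clusters get an n at most 50% above their actual size.
bucket_for() {
  n=16
  while [ "$n" -lt "$1" ]; do
    n=$(( (n * 3 + 1) / 2 ))   # n *= 1.5, rounded up
  done
  echo "$n"
}
bucket_for 10    # -> 16
bucket_for 100   # -> 122
```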
@@ -15,7 +15,7 @@

Currently Addon Resizer supports scaling based on the number of nodes. Some workloads use resources proportionally to
the number of containers in the cluster. Since number of containers per node is very different in different clusters
it's more resource-efficient to scale such workloads based directly on the container count.
it's more resource-efficient to scale such workloads based directly on the container count.

### Goals

@@ -46,7 +46,7 @@ Addon Resizer 1.8 assumes in multiple places that it's scaling based on the numb
to either node count or container count, depending on the value of the `--scaling-mode` flag.
- Many variable names in code which now refer to node count will refer to cluster size and should be renamed accordingly.

In addition to implementing the feature we should also clean up the code and documentation.
In addition to implementing the feature we should also clean up the code and documentation.
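The proposed `--scaling-mode` switch can be sketched as follows. The flag and its values are from this proposal; the base-plus-per-unit sizing rule follows the style of the existing `--memory`/`--extra-memory` parameters, and all numbers are illustrative:

```shell
# Hypothetical sketch: the nanny's "cluster size" is either the node
# count or the container count, depending on --scaling-mode, and the
# request grows linearly with that size.
scaling_mode=container-count        # proposed flag value; or node-count
node_count=50
container_count=600
base_mb=300                         # illustrative base request
extra_mb_per_unit=2                 # illustrative per-unit increment

if [ "$scaling_mode" = container-count ]; then
  cluster_size=$container_count
else
  cluster_size=$node_count
fi
echo "memory request: $(( base_mb + extra_mb_per_unit * cluster_size ))Mi"
```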

### Risks and Mitigations

@@ -59,7 +59,7 @@ all containers could result in higher load on the Cluster API server. Since Addo
I don't expect this effect to be noticeable.

Also I expect metrics-server to test for this before using the feature and any other users of Addon Resizer are likely
better off using metrics (which don't have this problem).
better off using metrics (which don't have this problem).

## Design Details

@@ -120,4 +120,4 @@ Both tests should be performed with metrics- and API- based scaling.
[`Status.Phase`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L4011
[selector excluding pods in terminal states in VPA]: https://github.com/kubernetes/autoscaler/blob/04e5bfc88363b4af9fdeb9dfd06c362ec5831f51/vertical-pod-autoscaler/e2e/v1beta2/common.go#L195
[`updateResources()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/nanny_lib.go#L126
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
@@ -14,7 +14,7 @@
Sure, here's the enhancement proposal in the requested format:

## Summary
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in the addon-resizer 1.8 when used with the metrics server. The proposed solution involves automatically reloading the nanny configuration whenever changes occur, eliminating the need for manual intervention and sidecar containers.
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in the addon-resizer 1.8 when used with the metrics server. The proposed solution involves automatically reloading the nanny configuration whenever changes occur, eliminating the need for manual intervention and sidecar containers.
- **Non-Goals:** This proposal does not aim to update the functional behavior of the addon-resizer.

## Proposal
1 change: 0 additions & 1 deletion balancer/Makefile
@@ -56,4 +56,3 @@ format:
test -z "$$(find . -path ./vendor -prune -type f -o -name '*.go' -exec gofmt -s -w {} + | tee /dev/stderr)"

.PHONY: all build test-unit clean format release

2 changes: 1 addition & 1 deletion balancer/examples/nginx-priority.yaml
@@ -1,4 +1,4 @@
#
#
# Balancer scaling 2 deployments using priority policy.
#
apiVersion: apps/v1
66 changes: 33 additions & 33 deletions balancer/proposals/balancer.md
@@ -1,50 +1,50 @@

# KEP - Balancer
# KEP - Balancer

## Introduction

One of the problems that the users are facing when running Kubernetes deployments is how to
deploy pods across several domains and keep them balanced and autoscaled at the same time.
One of the problems that the users are facing when running Kubernetes deployments is how to
deploy pods across several domains and keep them balanced and autoscaled at the same time.
These domains may include:

* Cloud provider zones inside a single region, to ensure that the application is still up and running, even if one of the zones has issues.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.

A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to something not entirely desired, like all pods going to a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).
A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to something not entirely desired, like all pods going to a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).

PodTopologySpreading solves the problem a bit, but not completely. It allows only even spreading
and once the deployment gets skewed it doesn’t do anything to rebalance. Pod topology spreading
(with skew and/or ScheduleAnyway flag) is also just a hint, if skewed placement is available and
allowed then Cluster Autoscaler is not triggered and the user ends up with a skewed deployment.
PodTopologySpreading solves the problem a bit, but not completely. It allows only even spreading
and once the deployment gets skewed it doesn’t do anything to rebalance. Pod topology spreading
(with skew and/or ScheduleAnyway flag) is also just a hint, if skewed placement is available and
allowed then Cluster Autoscaler is not triggered and the user ends up with a skewed deployment.
A user could specify a strict pod topology spreading but then, in case of problems the deployment
would not move its pods to the domains that are available. The growth of the deployment would also
would not move its pods to the domains that are available. The growth of the deployment would also
be totally blocked as the available domains would be too much skewed.

Thus, if full flexibility is needed, the only option is to have multiple deployments, targeting
different domains. This setup however creates one big problem. How to consistently autoscale multiple
deployments? The simplest idea - having multiple HPAs is not stable, due to different loads, race
conditions or so, some domains may grow while the others are shrunk. As HPAs and deployments are
not connected anyhow, the skewed setup will not fix itself automatically. It may eventually come to
a semi-balanced state but it is not guaranteed.
Thus, if full flexibility is needed, the only option is to have multiple deployments, targeting
different domains. This setup however creates one big problem. How to consistently autoscale multiple
deployments? The simplest idea - having multiple HPAs is not stable, due to different loads, race
conditions or so, some domains may grow while the others are shrunk. As HPAs and deployments are
not connected anyhow, the skewed setup will not fix itself automatically. It may eventually come to
a semi-balanced state but it is not guaranteed.


Thus there is a need for some component that will:

* Keep multiple deployments aligned. For example it may keep an equal ratio between the number of
pods in one deployment and the other. Or put everything to the first and overflow to the second and so on.
* React to individual deployment problems should it be zone outage or lack of spot/preemptible vms.
* React to individual deployment problems should it be zone outage or lack of spot/preemptible vms.
* Actively try to rebalance and get to the desired layout.
* Allow to autoscale all deployments with a single target, while maintaining the placement policy.

## Balancer
## Balancer

Balancer is a stand-alone controller, living in userspace (or in control plane, if needed) exposing
a CRD API object, also called Balancer. Each balancer object has pointers to multiple deployments
or other pod-controlling objects that expose the Scale subresource. Balancer periodically checks
the number of running and problematic pods inside each of the targets, compares it with the desired
number of replicas, constraints and policies and adjusts the number of replicas on the targets,
Balancer is a stand-alone controller, living in userspace (or in control plane, if needed) exposing
a CRD API object, also called Balancer. Each balancer object has pointers to multiple deployments
or other pod-controlling objects that expose the Scale subresource. Balancer periodically checks
the number of running and problematic pods inside each of the targets, compares it with the desired
number of replicas, constraints and policies and adjusts the number of replicas on the targets,
should some of them run too many or too few of them. To allow being an HPA target Balancer itself
exposes the Scale subresource.

@@ -66,7 +66,7 @@ type Balancer struct {
// +optional
Status BalancerStatus
}

// BalancerSpec is the specification of the Balancer behavior.
type BalancerSpec struct {
// Targets is a list of targets between which Balancer tries to distribute
@@ -84,7 +84,7 @@ type BalancerSpec struct {
// Policy defines how the balancer should distribute replicas among targets.
Policy BalancerPolicy
}

// BalancerTarget is the declaration of one of the targets between which the balancer
// tries to distribute replicas.
type BalancerTarget struct {
@@ -105,14 +105,14 @@ type BalancerTarget struct {
// +optional
MaxReplicas *int32
}

// BalancerPolicyName is the name of the balancer Policy.
type BalancerPolicyName string
const (
PriorityPolicyName BalancerPolicyName = "priority"
ProportionalPolicyName BalancerPolicyName = "proportional"
)

// BalancerPolicy defines Balancer policy for replica distribution.
type BalancerPolicy struct {
// PolicyName decides how to balance replicas across the targets.
@@ -131,7 +131,7 @@ type BalancerPolicy struct {
// +optional
Fallback *Fallback
}

// PriorityPolicy contains details for Priority-based policy for Balancer.
type PriorityPolicy struct {
// TargetOrder is the priority-based list of Balancer targets names. The first target
@@ -141,7 +141,7 @@ type PriorityPolicy struct {
// list, and/or total Balancer's replica count.
TargetOrder []string
}

// ProportionalPolicy contains details for Proportion-based policy for Balancer.
type ProportionalPolicy struct {
// TargetProportions is a map from Balancer targets names to rates. Replicas are
@@ -152,7 +152,7 @@ type ProportionalPolicy struct {
// of the total Balancer's replica count, proportions or the presence in the map.
TargetProportions map[string]int32
}

// Fallback contains information how to recognize and handle replicas
// that failed to start within the specified time period.
type Fallback struct {
@@ -162,7 +162,7 @@ type Fallback struct {
// may be stopped.
StartupTimeout metav1.Duration
}

// BalancerStatus describes the Balancer runtime state.
type BalancerStatus struct {
// Replicas is an actual number of observed pods matching Balancer selector.
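The API types in this proposal map to objects along the following lines. This is a sketch only: the `apiVersion` and any field not visible in the Go types above (`replicas`, `selector`, `scaleTargetRef`) are assumptions, not the repository's actual example manifest.

```yaml
# Hypothetical Balancer object using the priority policy: fill nginx-spot
# first, overflow to nginx-main (field names lower-camel-cased from the
# Go types above; apiVersion is assumed).
apiVersion: balancer.x-k8s.io/v1alpha1
kind: Balancer
metadata:
  name: nginx-balancer
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  targets:
    - name: nginx-spot
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-spot
      maxReplicas: 8
    - name: nginx-main
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-main
  policy:
    policyName: priority
    priorityPolicy:
      targetOrder:
        - nginx-spot
        - nginx-main
```

Because the Balancer itself exposes the Scale subresource, a single HPA can target this object while the priority ordering decides which underlying Deployment grows first.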
2 changes: 1 addition & 1 deletion builder/README.md
@@ -1 +1 @@
A Docker image that is used to build autoscaling-related binaries.
A Docker image that is used to build autoscaling-related binaries.