This guide explains how to configure Cluster Autoscaler to automatically scale nodes in an OKE Cluster Network. While Cluster Autoscaler does not directly support Cluster Networks, it can manage Instance Pools. Since each Cluster Network has an associated Instance Pool, you can configure autoscaling by targeting the Instance Pool.
Cluster Autoscaler automatically adjusts the number of nodes in your cluster based on workload demands. When configured with a Cluster Network's Instance Pool, it provides:
- Automatic scale-up when pods cannot be scheduled due to insufficient resources
- Automatic scale-down when nodes are underutilized
- Cost optimization by running only the required number of nodes
- Seamless integration with RDMA-enabled GPU workloads
- OKE cluster with a Cluster Network deployed
- kubectl configured with cluster-admin access
- OCI CLI installed (for retrieving Instance Pool ID)
- Understanding of Instance Principals or OCI API authentication
- Appropriate IAM policies for Cluster Autoscaler
Retrieve the Instance Pool ID associated with your Cluster Network.
You can retrieve the Instance Pool ID using either the OCI CLI or the OCI Console.
CLUSTER_NETWORK_ID=<your-cluster-network-id>
oci compute-management cluster-network get --cluster-network-id $CLUSTER_NETWORK_ID | jq -r '.data["instance-pools"][0].id'This command returns the OCID of the Instance Pool associated with your Cluster Network.
Example output:
ocid1.instancepool.oc1.phx.aaaaaaaxxxxxx
- Navigate to Menu > Compute > Cluster Networks
- Click on your Cluster Network name
- Under Instance Pools, click on the Instance Pool name
- Copy the OCID displayed on the Instance Pool details page
Configure Cluster Autoscaler to use the Instance Pool ID from Step 1. In the deployment manifest, update the --nodes parameter with your Instance Pool OCID.
The --nodes parameter format is: --nodes=<min>:<max>:<instance-pool-ocid>
- min: Minimum number of nodes (e.g., 0 or 1)
- max: Maximum number of nodes (e.g., 10)
- instance-pool-ocid: The OCID from Step 1
Example configuration snippet:
...
containers:
- image: iad.ocir.io/oracle/oci-cluster-autoscaler:{{ image tag }}
name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --cloud-provider=oci
- --nodes=1:10:ocid1.instancepool.oc1.phx.aaaaaaaaqdxy35acq32zjfvk55qkwwctxhsprmz633k62qDeploy Cluster Autoscaler using the complete manifest below. This example uses Instance Principals for authentication.
Important
- Replace
{{ image tag }}with the correct image tag for your region and Kubernetes version - Replace the Instance Pool OCID in the
--nodesparameter - Find available images in Step 4b of the OCI documentation
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
rules:
- apiGroups: ["storage.k8s.io"]
resources: ["csidrivers", "csistoragecapacities"]
verbs: ["watch", "list"]
- apiGroups: [""]
resources: ["events", "endpoints"]
verbs: ["create", "patch"]
- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
- apiGroups: [""]
resources: ["pods/status"]
verbs: ["update"]
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["endpoints"]
resourceNames: ["cluster-autoscaler"]
verbs: ["get", "update"]
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["watch", "list", "get"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["watch", "list", "get", "patch", "update"]
- apiGroups: [""]
resources:
- "pods"
- "services"
- "replicationcontrollers"
- "persistentvolumeclaims"
- "persistentvolumes"
verbs: ["watch", "list", "get"]
- apiGroups: ["extensions"]
resources: ["replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["policy"]
resources: ["poddisruptionbudgets"]
verbs: ["watch", "list"]
- apiGroups: ["apps"]
resources: ["statefulsets", "replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses", "csinodes", "csistoragecapacities", "csidrivers"]
verbs: ["watch", "list", "get"]
- apiGroups: ["batch", "extensions"]
resources: ["jobs"]
verbs: ["get", "list", "watch", "patch"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
resourceNames: ["cluster-autoscaler"]
resources: ["leases"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["create","list","watch"]
- apiGroups: [""]
resources: ["configmaps"]
resourceNames:
- "cluster-autoscaler-status"
- "cluster-autoscaler-priority-expander"
verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-autoscaler
subjects:
- kind: ServiceAccount
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: cluster-autoscaler
subjects:
- kind: ServiceAccount
name: cluster-autoscaler
namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
app: cluster-autoscaler
spec:
replicas: 1
selector:
matchLabels:
app: cluster-autoscaler
template:
metadata:
labels:
app: cluster-autoscaler
spec:
serviceAccountName: cluster-autoscaler
containers:
- image: iad.ocir.io/oracle/oci-cluster-autoscaler:{{ image tag }}
name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --v=5
- --logtostderr=true
- --cloud-provider=oci
- --nodes=0:10:ocid1.instancepool.oc1.phx.aaaaaaaaqdxy35acq32zjfvkybjmvlbdgj6q3m55qkwwctxhsprmz633k62q
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
- --namespace=kube-system
imagePullPolicy: "Always"
env:
- name: OCI_USE_INSTANCE_PRINCIPAL
value: "true"
- name: OCI_SDK_APPEND_USER_AGENT
value: "oci-oke-cluster-autoscaler"Save the manifest to a file (e.g., cluster-autoscaler.yaml) and apply it:
kubectl apply -f cluster-autoscaler.yamlCheck that Cluster Autoscaler is running:
kubectl get deployment cluster-autoscaler -n kube-systemExample output:
NAME READY UP-TO-DATE AVAILABLE AGE
cluster-autoscaler 1/1 1 1 2m
View the Cluster Autoscaler logs to confirm it's configured correctly:
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=50To temporarily disable Cluster Autoscaler, scale the deployment to zero replicas:
kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0To completely remove Cluster Autoscaler:
kubectl delete -f cluster-autoscaler.yamlWarning
Removing Cluster Autoscaler will stop automatic scaling. Manual intervention will be required to adjust node count.