fhpa-usage tutorial construction #28
Code Review
This pull request introduces a new tutorial for Karmada FederatedHPA, providing scripts and documentation to demonstrate multi-cluster autoscaling based on CPU metrics. The reviewer suggested several improvements to the setup script's robustness, including better error handling in generated scripts, safer JSON patching for resource limits, and ensuring idempotency for node operations. Additionally, feedback was provided to correct invalid kubeconfig merging logic and fix minor typos and redundancies in the documentation and verification scripts.
| @@ -0,0 +1,3 @@ | |||
| # Summary | |||
|
|
|||
| In this scenario, we installed metrics-server on member clusters and enabled the karmada-metrics-adapter on the control plane to provide CPU metrics for autoscaling,created a FederatedHPA resource to monitor CPU utilization across all member clusters and automatically adjust the number of replicas based on workload demand. To test the scaling behavior, we generated CPU load using the `williamyeh/hey` load-generator pod and observed the FederatedHPA trigger a scale-up event. After stopping the load generation, we also observed the FederatedHPA scale the workload back down automatically once CPU usage returned to normal. | |||
There is a missing space after the comma in "autoscaling,created".
| In this scenario, we installed metrics-server on member clusters and enabled the karmada-metrics-adapter on the control plane to provide CPU metrics for autoscaling,created a FederatedHPA resource to monitor CPU utilization across all member clusters and automatically adjust the number of replicas based on workload demand. To test the scaling behavior, we generated CPU load using the `williamyeh/hey` load-generator pod and observed the FederatedHPA trigger a scale-up event. After stopping the load generation, we also observed the FederatedHPA scale the workload back down automatically once CPU usage returned to normal. | |
| In this scenario, we installed metrics-server on member clusters and enabled the karmada-metrics-adapter on the control plane to provide CPU metrics for autoscaling, created a FederatedHPA resource to monitor CPU utilization across all member clusters and automatically adjust the number of replicas based on workload demand. To test the scaling behavior, we generated CPU load using the `williamyeh/hey` load-generator pod and observed the FederatedHPA trigger a scale-up event. After stopping the load generation, we also observed the FederatedHPA scale the workload back down automatically once CPU usage returned to normal. |
| KUBECONFIG_PATH=${KUBECONFIG_PATH:-"${HOME}/.kube"} | ||
|
|
||
| function installKind() { | ||
| cat << EOF > installKind.sh |
The generated script installKind.sh should include set -e to ensure it exits immediately if any command (like wget) fails. This applies to other generated scripts in this file as well (e.g., createCluster.sh, installMetricsServer.sh, etc.).
| cat << EOF > installKind.sh | |
| cat << EOF > installKind.sh | |
| set -e |
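A minimal sketch of the effect this suggestion aims for, using a hypothetical `demo.sh` in a temporary directory rather than the PR's real `installKind.sh`. The heredoc writes `set -e` as the first line of the generated script, so a failing command (here `false`, standing in for a failed `wget`) stops the script before later commands run. The heredoc delimiter is quoted in this sketch, so nothing inside is expanded; the PR uses an unquoted `EOF` with escaped `\$` instead.

```shell
#!/bin/bash
# Hypothetical demo: generate a script via heredoc with `set -e` as its first command.
workdir="$(mktemp -d)"

cat << 'EOF' > "${workdir}/demo.sh"
set -e
false               # stands in for a failing download
echo "unreachable"  # never printed, because set -e aborts the script above
EOF

# Run the generated script and capture both its output and its exit status.
output="$(bash "${workdir}/demo.sh")" && status=0 || status=$?
echo "exit status: ${status}, output: '${output}'"

rm -rf "${workdir}"
```

Without the `set -e` line, the same generated script would print `unreachable` and exit with status 0, hiding the failure.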
| } | ||
|
|
||
| function createCluster() { | ||
| cat << EOF > createCluster.sh |
| kind delete cluster --name=member1 || true | ||
| kind create cluster --name=member1 --config=cluster1.yaml | ||
| # Patch kindnet to use less CPU | ||
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "50m"}, {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "200m"}]' |
Using replace in a JSON patch will fail if the target path (e.g., resources) does not already exist in the object. Since kindnet typically does not have resource requests/limits defined by default in Kind, it is safer to use the add operation for the entire resources object.
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "50m"}, {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "200m"}]' | |
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/resources", "value": {"requests": {"cpu": "50m"}, "limits": {"cpu": "200m"}}}]' |
|
|
||
| kind delete cluster --name=member2 || true | ||
| kind create cluster --name=member2 --config=cluster2.yaml | ||
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "50m"}, {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "200m"}]' |
As with the previous patch, using add is safer than replace if the resources field is missing from the kindnet manifest.
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "50m"}, {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "200m"}]' | |
| kubectl --kubeconfig \$HOME/.kube/config patch daemonset kindnet -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/resources", "value": {"requests": {"cpu": "50m"}, "limits": {"cpu": "200m"}}}]' |
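Since the same patch is applied on both member clusters, one way to keep the two commands in sync is to build the payload once. The helper below is a hypothetical sketch (the name `build_resources_patch` is not part of the PR); it emits the `add` form suggested above. Under RFC 6902, `add` on an object member that already exists replaces its value, so this overwrites any `resources` already set on the container.

```shell
#!/bin/bash
# build_resources_patch REQUEST LIMIT
# Print a JSON Patch that adds the whole resources object to the first container.
# Hypothetical helper for illustration; not part of the PR's scripts.
build_resources_patch() {
  local request="$1" limit="$2"
  printf '[{"op": "add", "path": "/spec/template/spec/containers/0/resources", "value": {"requests": {"cpu": "%s"}, "limits": {"cpu": "%s"}}}]' \
    "$request" "$limit"
}

patch="$(build_resources_patch 50m 200m)"
echo "$patch"

# Usage against a cluster (requires kubectl and a reachable kind cluster):
# kubectl --kubeconfig "$HOME/.kube/config" patch daemonset kindnet -n kube-system \
#   --type='json' -p="$patch"
```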
| kubectl --kubeconfig \$HOME/.kube/config patch deployment coredns -n kube-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "30m"}, {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "100m"}]' | ||
| mv \$HOME/.kube/config config-member2 | ||
|
|
||
| KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten >> ${KUBECONFIG_PATH}/config |
Appending to a kubeconfig file using >> is incorrect as it creates an invalid YAML structure (multiple documents without separators). Furthermore, this merged config on the member node appears to be unused since the individual config files are scp-ed back to the host cluster and used explicitly via --kubeconfig in subsequent steps. This line should be removed to avoid confusion and potential corruption of the local config.
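A small illustration of why appending is a problem, using throwaway placeholder files instead of real kubeconfigs. After `>>`, the target holds two complete top-level mappings back to back, which a YAML parser sees as duplicate keys in one document rather than as a merged config.

```shell
#!/bin/bash
workdir="$(mktemp -d)"
target="${workdir}/config"

# A placeholder kubeconfig that already exists at the target path.
printf 'apiVersion: v1\nkind: Config\nclusters: []\n' > "${target}"
# Appending a second flattened config, as ">>" would do.
printf 'apiVersion: v1\nkind: Config\nclusters: []\n' >> "${target}"

# Count top-level apiVersion keys; a valid kubeconfig has exactly one.
dup_count="$(grep -c '^apiVersion:' "${target}")"
echo "top-level apiVersion keys: ${dup_count}"

rm -rf "${workdir}"
```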
| root@${member_cluster_ip}:~ | ||
| } | ||
|
|
||
| kubectl delete node node01 |
| #!/bin/bash | ||
| set -e | ||
|
|
||
| set -e |
zhzhuang-zju left a comment
Thanks @Krishiv-Mahajan
| 1. Join `kind-member1` and `kind-member2` to the host cluster. | ||
|
|
||
| RUN `MEMBER_CLUSTER_NAME=kind-member1`{{exec}} | ||
|
|
||
| This sets the variable `MEMBER_CLUSTER_NAME` to `kind-member1` for use in the join command. | ||
|
|
||
| RUN `karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config join ${MEMBER_CLUSTER_NAME} --cluster-kubeconfig=$HOME/.kube/config-member1 --cluster-context=kind-member1`{{exec}} | ||
|
|
||
| This joins the `kind-member1` cluster to the Karmada control plane using its kubeconfig file and context. | ||
|
|
||
| RUN `MEMBER_CLUSTER_NAME=kind-member2`{{exec}} | ||
|
|
||
| This sets the variable to `kind-member2` for the second cluster join. | ||
|
|
||
| RUN `karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config join ${MEMBER_CLUSTER_NAME} --cluster-kubeconfig=$HOME/.kube/config-member2 --cluster-context=kind-member2`{{exec}} |
You keep switching between different scenarios here, which is not ideal.
Just use the non-variables approach.
Yeah, I got a bit confused.
I will use the non-variable approach in all scenarios.
| The FederatedHPA (FHPA) relies on a two-layer metrics pipeline to gather the data needed for autoscaling. | ||
|
|
||
| First, the `metrics-server` component must be running on each member cluster. It is responsible for collecting per-pod resource utilization data (such as CPU and memory usage) at the local cluster level. |
| The FederatedHPA (FHPA) relies on a two-layer metrics pipeline to gather the data needed for autoscaling. | |
| First, the `metrics-server` component must be running on each member cluster. It is responsible for collecting per-pod resource utilization data (such as CPU and memory usage) at the local cluster level. | |
| We need to install metrics-server on the member clusters to provide the metrics API |
|
|
||
| RUN `bash ~/installMetricsServer.sh`{{exec}} | ||
|
|
||
| It automatically downloads the upstream metrics-server manifest, patches it with the `--kubelet-insecure-tls=true` flag for compatibility with our Kind environment, and applies it to both `kind-member1` and `kind-member2`. |
| It automatically downloads the upstream metrics-server manifest, patches it with the `--kubelet-insecure-tls=true` flag for compatibility with our Kind environment, and applies it to both `kind-member1` and `kind-member2`. | |
| It automatically downloads the upstream metrics-server manifest and applies it to both `kind-member1` and `kind-member2`. |
|
|
||
| **2. Register the Custom Metrics API:** | ||
| Next, we must register the custom metrics `APIService` on both member clusters so the adapter can securely access their local metric endpoints. | ||
|
|
The demo FHPA uses resource metrics, right? Why do we still need to register the Custom Metrics API?
You are absolutely right, we don't need it. I will just remove it.
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get deployment nginx`{{exec}} | ||
|
|
||
| This command confirms that the Nginx Deployment template has been successfully registered in the control plane. |
| This command confirms that the Nginx Deployment template has been successfully registered in the control plane. |
|
|
||
| > **Note:** It takes a brief moment for the scheduler to distribute the workload and for the clusters to pull the container image. If the command returns "No resources found", wait ~30 seconds and run it again. Since our initial replica count is 1, you should see exactly 1 pod running on one of the member clusters. | ||
| > | ||
| > *Troubleshooting:* If you see several lines of `Unhandled Error` regarding `metrics.k8s.io`, don't worry! This is normal and just indicates that the Karmada metrics adapter is still starting up in the background. You can safely ignore these warnings. |
Just a gentle correction. This is unrelated to Karmada metrics adapter, it actually depends on the metrics server within member clusters.
We can advance the verification step. After deploying metrics server, confirm its normal availability by running kubectl --kubeconfig $HOME/.kube/config-memberX top pods without errors.
Thanks for the correction
| **Verify the Multi-Cluster Service:** | ||
|
|
||
| RUN `karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config get svc --operation-scope members`{{exec}} | ||
|
|
||
| > *Note: If you see `Unhandled Error` warnings regarding metrics, you can safely ignore them.* | ||
|
|
||
| You should see the `nginx-service` running on the member clusters. This is the service we will use to generate load! |
| **Verify the Multi-Cluster Service:** | |
| RUN `karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config get svc --operation-scope members`{{exec}} | |
| > *Note: If you see `Unhandled Error` warnings regarding metrics, you can safely ignore them.* | |
| You should see the `nginx-service` running on the member clusters. This is the service we will use to generate load! |
It is improper to verify Multi-Cluster Service via the command karmadactl --kubeconfig /etc/karmada/karmada-apiserver.config get svc --operation-scope members. Services on member clusters are not created by Multi-Cluster Service, but distributed to member clusters in advance via PropagationPolicy.
We can remove this part for simplicity.
I've applied this change: I completely removed that misleading verification step from step10/text.md.
zhzhuang-zju left a comment
Thanks, you made a great improvement compared with the initial version.
After all the new scenarios are merged, you can submit a PR to update them in README.md
|
|
||
| To confirm the metrics server is running normally, wait a few moments and then check if it can successfully serve pod metrics: | ||
|
|
||
| RUN `kubectl --kubeconfig=$HOME/.kube/config-member1 top pods --all-namespaces`{{exec}} |
| RUN `kubectl --kubeconfig=$HOME/.kube/config-member1 top pods --all-namespaces`{{exec}} | |
| RUN `kubectl --kubeconfig=$HOME/.kube/config-member1 top pods --all-namespaces`{{exec}} | |
| RUN `kubectl --kubeconfig=$HOME/.kube/config-member2 top pods --all-namespaces`{{exec}} |
| set -e | ||
|
|
||
| kubectl --kubeconfig=$HOME/.kube/config-member1 -n kube-system get deployment metrics-server | ||
| kubectl --kubeconfig=$HOME/.kube/config-member2 -n kube-system get deployment metrics-server |
| kubectl --kubeconfig=$HOME/.kube/config-member2 -n kube-system get deployment metrics-server | |
| kubectl --kubeconfig=$HOME/.kube/config-member2 -n kube-system get deployment metrics-server | |
| kubectl --kubeconfig=$HOME/.kube/config-member1 top pods --all-namespaces | |
| kubectl --kubeconfig=$HOME/.kube/config-member2 top pods --all-namespaces |
We can add a metrics availability check here to avoid unhandled metrics.k8s.io error warnings in subsequent steps.
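One possible shape for such an availability check is a small retry loop, since metrics-server usually needs some time after deployment before `kubectl top` succeeds. The `retry` helper below is a hypothetical sketch, not part of the PR, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY CMD [ARGS...]
# Run CMD until it succeeds or ATTEMPTS runs out, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage in the verification script (requires kubectl and the member kubeconfigs):
# retry 30 5 kubectl --kubeconfig="$HOME/.kube/config-member1" top pods --all-namespaces
# retry 30 5 kubectl --kubeconfig="$HOME/.kube/config-member2" top pods --all-namespaces
```

Because the helper returns non-zero once the attempts are exhausted, it composes with the `set -e` already present at the top of the verification script.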
| > *Note: As before, you can safely ignore any `metrics.k8s.io` Unhandled Error warnings if they appear.* | ||
|
|
| > *Note: As before, you can safely ignore any `metrics.k8s.io` Unhandled Error warnings if they appear.* |
| With the local metrics servers running, we now need to bridge that data to the Karmada control plane so the FHPA controller can make global scaling decisions. | ||
|
|
||
| **Install the `karmada-metrics-adapter`:** | ||
| This add-on runs on the Karmada control plane and aggregates the metrics collected from the member clusters. It also automatically registers the `custom.metrics.k8s.io` APIService in the control plane, which the FederatedHPA controller uses to fetch metrics. |
| This add-on runs on the Karmada control plane and aggregates the metrics collected from the member clusters. It also automatically registers the `custom.metrics.k8s.io` APIService in the control plane, which the FederatedHPA controller uses to fetch metrics. | |
| This add-on runs on the Karmada control plane and aggregates the metrics collected from the member clusters. It also automatically registers the `metrics.k8s.io` and `custom.metrics.k8s.io` APIServices in the control plane, which the FederatedHPA controller uses to fetch metrics. |
| set -e | ||
|
|
||
| kubectl --kubeconfig $HOME/.kube/config -n karmada-system get deployment karmada-metrics-adapter | ||
| kubectl --kubeconfig $HOME/.kube/config get apiservice v1beta1.custom.metrics.k8s.io |
| kubectl --kubeconfig $HOME/.kube/config get apiservice v1beta1.custom.metrics.k8s.io | |
| kubectl --kubeconfig $HOME/.kube/config get apiservice v1beta1.metrics.k8s.io |
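Beyond confirming that the APIService object exists, a verification script could also check that it reports `Available`. This is a hedged sketch: `apiservice_available` and the `KUBECTL` override variable are hypothetical names introduced here, and the jsonpath filter assumes the standard APIService `status.conditions` layout.

```shell
#!/bin/bash
# apiservice_available NAME
# Succeed only when the named APIService has condition Available=True.
# KUBECTL is a hypothetical override so the check can be exercised without a cluster.
apiservice_available() {
  local name="$1" status
  status="$("${KUBECTL:-kubectl}" --kubeconfig "${HOME}/.kube/config" get apiservice "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')" || return 1
  [ "$status" = "True" ]
}

# Usage on the Karmada host (requires kubectl and the host kubeconfig):
# apiservice_available v1beta1.metrics.k8s.io || echo "metrics.k8s.io is not available yet"
```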
ecc8f88 to 013d297
OK, once all the PRs are merged, I will update the README.md.
Signed-off-by: Krishiv-Mahajan <mahajankrishiv10@gmail.com>
2f8d5a9 to 5ebe4f3
|
Thanks |
|
Part of karmada-io/karmada#7269
Description
This PR introduces a complete, interactive Killercoda tutorial demonstrating how to use the Karmada FederatedHPA (FHPA) to perform cross-cluster workload autoscaling.
The scenario guides the user through setting up a multi-cluster metrics pipeline, deploying a scalable workload, and triggering dynamic scale-up and scale-down events based on CPU utilization.
Testing
The scenario can be tested at: https://killercoda.com/karmada-demo/scenario/karmada-FHPA-example