multi-component-workload tutorial creation #29
/gemini review |
Code Review
This pull request introduces a comprehensive tutorial for scheduling multi-component workloads, such as Apache Flink and Volcano, across multiple clusters using Karmada. The changes include environment setup scripts, tutorial documentation, and verification logic. Feedback from the review identifies several critical issues, including a logic bug in the Flink Lua interpreter where JobManager memory was incorrectly referenced for TaskManagers, and a potential kubeconfig corruption issue during cluster creation. Additionally, the reviewer pointed out incorrect memory units in the Flink manifest, inconsistencies between the code and tutorial text, and missing verification checks for the scheduler component.
| local tm_cpu = get(observedObj, {"spec","taskManager","resource","cpu"}) | ||
| local tm_memory = get(observedObj, {"spec","taskManager","resource","memory"}) | ||
| if tm_cpu ~= nil then tm_requires.resourceRequest.cpu = tm_cpu end | ||
| if tm_memory ~= nil then tm_requires.resourceRequest.memory = kube.getResourceQuantity(tm_memory) end |
There is a bug here: jm_memory is used instead of tm_memory. Additionally, removing kube.getResourceQuantity() will keep the output consistent with the tutorial text and the Volcano example.
| if tm_memory ~= nil then tm_requires.resourceRequest.memory = kube.getResourceQuantity(tm_memory) end | |
| if tm_memory ~= nil then tm_requires.resourceRequest.memory = tm_memory end |
| mv $HOME/.kube/config ~/config-member1 | ||
| kind create cluster --name=member2 --config=cluster2.yaml | ||
| mv $HOME/.kube/config config-member2 | ||
| KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten >> ${KUBECONFIG_PATH}/config |
Using >> to append the output of kubectl config view --flatten to an existing kubeconfig file will result in an invalid YAML file if the destination already contains data (e.g., the default config in a Killercoda environment). It is better to overwrite the file or use a proper merge strategy.
| KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten >> ${KUBECONFIG_PATH}/config | |
| KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten > ${KUBECONFIG_PATH}/config |
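To illustrate the point about `>>` versus `>`, here is a minimal offline sketch. It does not call `kubectl`; the file contents are placeholders standing in for a pre-existing default kubeconfig and for the merged output, and the variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: why ">>" can corrupt a kubeconfig while ">" cannot.
# The file contents below are placeholders, not real kubeconfigs.
set -euo pipefail

workdir="$(mktemp -d)"
target="${workdir}/config"

# Pretend the environment already ships a default kubeconfig.
printf 'apiVersion: v1\nkind: Config\nclusters: []\n' > "${target}"

# Stand-in for the output of "kubectl config view --merge --flatten".
merged='apiVersion: v1
kind: Config
clusters: []'

# Appending leaves two top-level "apiVersion" keys in one YAML document.
printf '%s\n' "${merged}" >> "${target}"
appended_count="$(grep -c '^apiVersion:' "${target}")"

# Overwriting leaves exactly one.
printf '%s\n' "${merged}" > "${target}"
overwritten_count="$(grep -c '^apiVersion:' "${target}")"

echo "after >>: ${appended_count} apiVersion keys"
echo "after >:  ${overwritten_count} apiVersion key"
rm -rf "${workdir}"
```

A file with duplicated top-level keys is what the comment above describes as invalid, which is why overwriting is the safer choice when the destination may already contain data.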
| local jm_cpu = get(observedObj, {"spec","jobManager","resource","cpu"}) | ||
| local jm_memory = get(observedObj, {"spec","jobManager","resource","memory"}) | ||
| if jm_cpu ~= nil then jm_requires.resourceRequest.cpu = jm_cpu end | ||
| if jm_memory ~= nil then jm_requires.resourceRequest.memory = kube.getResourceQuantity(jm_memory) end |
For consistency with the Volcano example and to match the expected output in the tutorial steps, it is better to avoid using kube.getResourceQuantity() for memory. This ensures the original string (e.g., "100Mi") is preserved in the ResourceBinding.
| if jm_memory ~= nil then jm_requires.resourceRequest.memory = kube.getResourceQuantity(jm_memory) end | |
| if jm_memory ~= nil then jm_requires.resourceRequest.memory = jm_memory end |
| replicas: 1 | ||
| resource: | ||
| cpu: 1 | ||
| memory: 100m |
| taskManager: | ||
| resource: | ||
| cpu: 1 | ||
| memory: 100m |
| @@ -0,0 +1,3 @@ | |||
| #!/bin/bash | |||
|
|
|||
| kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" | |||
The verification script should also check the karmada-scheduler deployment, as it was also patched in the instructions to enable the MultiplePodTemplatesScheduling feature gate.
| kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" | |
| kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-scheduler -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" |
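The three near-identical pipelines could also be folded into a loop. The sketch below is one possible shape, not the script from the PR: `has_feature_gate` and `check_deployments` are hypothetical helper names, and the sample arguments at the bottom are made up so the check can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the container command read from stdin
# (one argument per line) enables the given feature gate.
has_feature_gate() {
  local gate="$1"
  grep -q -- "${gate}=true"
}

# Hypothetical wrapper around the commands from the verification script.
# It is only defined here, not called, because it needs a live cluster.
check_deployments() {
  local deploy
  for deploy in karmada-controller-manager karmada-scheduler karmada-webhook; do
    kubectl -n karmada-system get deployment "${deploy}" -o json \
      | jq -r '.spec.template.spec.containers[0].command[]' \
      | has_feature_gate MultiplePodTemplatesScheduling || return 1
  done
}

# Offline demonstration with made-up arguments, so no cluster is needed.
sample_args=$'/bin/karmada-scheduler\n--feature-gates=MultiplePodTemplatesScheduling=true'
if printf '%s\n' "${sample_args}" | has_feature_gate MultiplePodTemplatesScheduling; then
  echo "feature gate enabled"
fi
```

Listing the deployments in one place makes it harder to forget a component, which is the omission the comment above points out.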
| The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100m) requests: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}} | ||
|
|
||
| This outputs a JSON object with `"cpu": "1"` and `"memory": "100m"`. |
The expected memory value in the tutorial text should be updated to 100Mi to be consistent with the fix in the manifest and the actual resource requirements.
| The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100m) requests: | |
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}} | |
| This outputs a JSON object with `"cpu": "1"` and `"memory": "100m"`. | |
| The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100Mi) requests: |
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}} |
| This outputs a JSON object with "cpu": "1" and "memory": "100Mi". |
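For context on the `100m` versus `100Mi` discussion: in Kubernetes quantity notation a lowercase `m` is the milli suffix, while `Mi` denotes mebibytes. The sketch below only illustrates the binary-suffix arithmetic; `to_bytes` is a hypothetical helper written for this example and handles just `Ki`, `Mi`, and `Gi`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a quantity with a binary suffix (Ki, Mi, Gi) to bytes.
# Only these three suffixes are handled; anything else is rejected.
to_bytes() {
  local quantity="$1"
  local number="${quantity%[KMG]i}"
  case "${quantity}" in
    *Ki) echo $(( number * 1024 )) ;;
    *Mi) echo $(( number * 1024 * 1024 )) ;;
    *Gi) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)   echo "unsupported quantity: ${quantity}" >&2; return 1 ;;
  esac
}

to_bytes 100Mi
```

The helper deliberately refuses `100m`, since that suffix does not mean mebibytes in this notation.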
|
|
||
| Multi-component scheduling (`MultiplePodTemplatesScheduling`) is currently an **Alpha** feature in Karmada and is **disabled by default**. We need to explicitly enable it on the `karmada-controller-manager`, `karmada-scheduler`, and `karmada-webhook` components. | ||
|
|
||
| > **Note:** Because these components are running as native Pods on the underlying host cluster, we patch them using the default `kubectl` context, **not** the Karmada API server kubeconfig. We also temporarily change their deployment strategy to `Recreate` to prevent resource deadlocks during the rollout. |
> We also temporarily change their deployment strategy to `Recreate` to prevent resource deadlocks during the rollout.
Can you elaborate more on this?
I have added a bit more detail in the latest commit
Thanks for the explanation. However, these three components don't seem to have resource configurations declared. So there shouldn't be deadlocks during their restart, right?
| 1. Apply their **Custom Resource Definitions (CRDs)** to the Karmada control plane and propagate them to the member clusters. | ||
| 2. Apply **Resource Interpreter Customizations** to teach Karmada how to extract per-component resource requirements from these specific workload types. | ||
|
|
||
| > **Note:** We have pre-downloaded the necessary CRDs and placed them in `/root/examples/` for you. |
| > **Note:** We have pre-downloaded the necessary CRDs and placed them in `/root/examples/` for you. | |
| > **Note:** Karmada has built-in support for interpreting common third-party multi-component workload resources such as FlinkDeployment and VolcanoJob. They define rules for Karmada to parse these resources, covering extraction of replicas and resource requirements of each component, judgment of workload health status and identification of dependent resources. |
| **Apply the Resource Interpreter Customizations:** | ||
|
|
||
| Karmada uses a built-in "Resource Interpreter" to dynamically inspect unfamiliar custom resources. By applying these Lua-based configurations, we teach the interpreter exactly where to look in a `FlinkDeployment` and `VolcanoJob` to find their individual components, replicas, and CPU/Memory requests. | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}} | ||
|
|
||
| This applies the Flink Resource Interpreter Customization. | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/volcano-interpreter.yaml`{{exec}} | ||
|
|
||
| This applies the Volcano Resource Interpreter Customization. |
| **Apply the Resource Interpreter Customizations:** | |
| Karmada uses a built-in "Resource Interpreter" to dynamically inspect unfamiliar custom resources. By applying these Lua-based configurations, we teach the interpreter exactly where to look in a `FlinkDeployment` and `VolcanoJob` to find their individual components, replicas, and CPU/Memory requests. | |
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}} | |
| This applies the Flink Resource Interpreter Customization. | |
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/volcano-interpreter.yaml`{{exec}} | |
| This applies the Volcano Resource Interpreter Customization. |
| @@ -0,0 +1,68 @@ | |||
| ### Provide Workload Definitions to Karmada | |||
|
|
|||
| Before Karmada can schedule complex Flink and Volcano workloads, it needs to understand their structure. | |||
| Before Karmada can schedule complex Flink and Volcano workloads, it needs to understand their structure. | |
| Before Karmada can schedule complex FlinkDeployment workloads, it needs to understand their structure. |
Using FlinkDeployment for the demonstration is sufficient.
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}} | ||
|
|
||
| This applies the Flink Custom Resource. |
| This applies the Flink Custom Resource. | |
| This applies the FlinkDeployment Custom Resource. |
Please use the full name
|
|
||
| </details> | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}} |
A question: why do we need --validate=false here?
The FlinkDeployment CRD is not registered on the Karmada API server itself, and --validate=false bypasses schema validation on the Karmada API server.
What exactly are you referring to as "registered"? The --validate=false flag is unnecessary if the steps are followed properly.
|
|
||
| If a multi-cluster scheduler treats these complex jobs as a single generic workload, it may underestimate the total resources required, or accidentally scatter the tightly-coupled components across entirely different geographical clusters, destroying the low-latency communication required for the job to function. | ||
|
|
||
| In this scenario, we will deploy multi-component workloads (Flink and Volcano) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all components of a job are scheduled atomistically to the exact same target cluster. |
| In this scenario, we will deploy multi-component workloads (Flink and Volcano) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all components of a job are scheduled atomistically to the exact same target cluster. | |
| In this scenario, we will deploy multi-component workloads (FlinkDeployment) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all workload components are scheduled atomistically to the identical target cluster with sufficient resources. |
|
|
||
| When you apply a workload, Karmada uses its Resource Interpreter to analyze the custom resource, extract its requirements, and wrap it into a `ResourceBinding`. Let's inspect this binding to see what Karmada discovered. | ||
|
|
||
| First, extract the dynamic binding name into a variable: |
We can divide this page into several sections to improve readability:
#### Replicas
#### Resource Requirement
These two sections ensure Karmada can accurately perceive resource demands of multi-component workloads, serving as basis for filtering available clusters.
#### Scheduling Result
Only one cluster will be selected as the scheduling result.
|
|
||
| Multi-component scheduling (`MultiplePodTemplatesScheduling`) is currently an **Alpha** feature in Karmada and is **disabled by default**. We need to explicitly enable it on three core control plane components to ensure the entire scheduling pipeline can process multi-component workloads: | ||
|
|
||
| - **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects. |
| - **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects. | |
| - **`karmada-webhook`**: Needs the feature gate to successfully validate the multi-component fields within incoming `ResourceBinding` objects. |
|
|
||
| - **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects. | ||
| - **`karmada-controller-manager`**: Requires it to execute custom Resource Interpreters that extract the specific components, and to build the comprehensive `ResourceBinding` that contains them. | ||
| - **`karmada-scheduler`**: Uses it to compute the aggregate resource requirements of all extracted components, ensuring the selected target cluster has sufficient capacity to host the entire complex workload. |
| - **`karmada-scheduler`**: Uses it to compute the aggregate resource requirements of all extracted components, ensuring the selected target cluster has sufficient capacity to host the entire complex workload. | |
| - **`karmada-scheduler`**: Uses it to obtain the detailed resource requirements of the workload, ensuring the selected target cluster has sufficient capacity to host the entire complex workload. |
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployments.flink.apache.org-v1.yaml`{{exec}} | ||
|
|
||
| This applies the Flink CRD. |
| This applies the Flink CRD. | |
| This applies the FlinkDeployment CRD. |
Please use the full name
|
|
||
| Karmada needs to know exactly how to parse the custom resources to find their component definitions. We provide Lua scripts that teach Karmada how to do this. | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}} |
We have built-in interpreters, so why do we need to apply the interpreter here?
You're right, thanks for catching that. I will fix it.
| Let's check if Karmada successfully parsed the `spec.components` array. The array should contain exactly 2 distinct components: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components | length'`{{exec}} | ||
|
|
||
| This outputs `2`, confirming exactly two distinct components were extracted. | ||
|
|
||
| Check the specific names of the components Karmada identified: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[].name'`{{exec}} | ||
|
|
||
| This outputs `"jobmanager"` and `"taskmanager"`. |
The title is replicas, but the content inside has nothing to do with replicas at all!
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}} | ||
|
|
||
| This applies the FlinkDeployment Custom Resource. |
You can add a brief description of this FlinkDeployment CR, like:
This FlinkDeployment includes a JobManager (1 replica, 1 CPU, 100Mi memory) and a TaskManager.
The TaskManager replica count is automatically computed as 1 using ceil(parallelism/numberOfTaskSlots), with resources of 1 CPU and 100Mi memory.
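The `ceil(parallelism/numberOfTaskSlots)` rule mentioned above can be checked with plain integer arithmetic. This is only a sketch of the formula, not the interpreter's Lua code; `ceil_div` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Integer ceiling division: ceil(a / b) for positive integers.
ceil_div() {
  local numerator="$1" denominator="$2"
  echo $(( (numerator + denominator - 1) / denominator ))
}

# Values taken from the FlinkDeployment described in the tutorial.
parallelism=2
task_slots=2
echo "taskManager replicas: $(ceil_div "${parallelism}" "${task_slots}")"
```

With `parallelism: 3` and two task slots the same helper yields 2, since a partially filled TaskManager still counts as a whole replica.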
| #### Replicas | ||
|
|
||
| Let's check if Karmada successfully parsed the `spec.components` array. The array should contain exactly 2 distinct components: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components | length'`{{exec}} | ||
|
|
||
| This outputs `2`, confirming exactly two distinct components were extracted. | ||
|
|
||
| Check the specific names of the components Karmada identified: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[].name'`{{exec}} | ||
|
|
||
| This outputs `"jobmanager"` and `"taskmanager"`. | ||
|
|
||
| #### Resource Requirement | ||
|
|
||
| The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100Mi) requests: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}} | ||
|
|
||
| This outputs a JSON object with `"cpu": "1"` and `"memory": "100Mi"`. |
Since the FlinkDeployment CR has been described in Step 7, we can keep this part concise. Just print the components within bindingSpec. If the result matches the expectation from Step 7, it proves that Karmada can parse FlinkDeployment correctly.
| Check that the workload was successfully scheduled by the Karmada control plane: | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.status.conditions[] | select(.type=="Scheduled") | .status'`{{exec}} | ||
|
|
||
| This outputs `"True"`, indicating the workload was successfully scheduled. | ||
|
|
||
| Finally, let's see which cluster it landed on and verify that the Flink components actually exist there: | ||
|
|
||
| RUN `TARGET_CLUSTER=$(kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq -r '.spec.clusters[0].name')`{{exec}} | ||
|
|
||
| This extracts the scheduled target cluster into a variable. | ||
|
|
||
| Verify that the FlinkDeployment exists on the target cluster: | ||
|
|
||
| RUN `kubectl --kubeconfig=$HOME/.kube/config-${TARGET_CLUSTER#kind-} get flinkdeployment -n default`{{exec}} | ||
|
|
||
| This lists the `flinkdeployment-sample` resource, verifying it exists on the target cluster. |
- Print bindingSpec.clusters; it should contain only one target cluster.
- Use the command `karmadactl get flinkdeployment --operation-scope members` to verify that the FlinkDeployment exists on the target cluster.
|
|
||
| **Apply the Resource Interpreters:** | ||
|
|
||
| While Karmada's newer versions have built-in support for parsing FlinkDeployment workloads, the version installed in this environment requires us to explicitly provide a Lua script that teaches Karmada how to extract the component definitions. |
> the version installed in this environment requires us to explicitly provide a Lua script that teaches Karmada how to extract the component definitions.
The karmadactl version is v1.17.2, so it should already have the resource interpreter for FlinkDeployment. Why should we explicitly provide a Lua script again?
I ran into the null components issue during testing and reverted the changes, but it was actually caused by something else. I have fixed it now.
|
|
||
| **Apply the CRDs and PropagationPolicy:** | ||
|
|
||
| RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployments.flink.apache.org-v1.yaml`{{exec}} |
I still have the question: why do we need validate=false here?
| - **`karmada-controller-manager`**: Requires it to execute custom Resource Interpreters that extract the specific components, and to build the comprehensive `ResourceBinding` that contains them. | ||
| - **`karmada-scheduler`**: Uses it to obtain the detailed resource requirements of the workload, ensuring the selected target cluster has sufficient capacity to host the entire complex workload. | ||
|
|
||
| > **Note:** Because these components are running as native Pods on the underlying host cluster, we patch them using the default `kubectl` context, **not** the Karmada API server kubeconfig. We temporarily change their deployment strategy to `Recreate`. This isn't strictly for resource limitations, but to prevent a race condition in this tutorial where the old leader pod processes incoming workloads (with the feature gate disabled) while the new pod is starting up. |
> This isn't strictly for resource limitations, but to prevent a race condition in this tutorial where the old leader pod processes incoming workloads (with the feature gate disabled) while the new pod is starting up.
If you have concerns about this, you can set the corresponding feature gates via command-line arguments when running karmadactl init --xxxx, which avoids restarts later. This will reduce complex operations and extra explanations down the line.
Run karmadactl init --help to check the usage.
Thank you for the suggestion to use the karmadactl init flags.
I checked karmadactl init --help and have updated the initialization command in Step 3 to pass the feature gates directly at start time using:
--karmada-controller-manager-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"
--karmada-scheduler-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"
--karmada-webhook-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"
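Since the three flags differ only in the component name, they can be assembled in a loop. The sketch below prints the resulting command rather than running it; the flag names are the ones quoted in the reply above, while the variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: assemble the extra-args flags for the three components in a loop.
gate="MultiplePodTemplatesScheduling=true"
init_args=()
for component in karmada-controller-manager karmada-scheduler karmada-webhook; do
  init_args+=("--${component}-extra-args=--feature-gates=${gate}")
done

# Print the command instead of running it, so the sketch works without a cluster.
printf 'karmadactl init'
printf ' %s' "${init_args[@]}"
printf '\n'
```

Keeping the gate in one variable means all three components stay in step if the gate is ever changed.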
zhzhuang-zju left a comment
Thanks @Krishiv-Mahajan, much better
|
|
||
| To achieve this, we must apply their **Custom Resource Definitions (CRDs)** to the Karmada control plane and propagate them to the member clusters. | ||
|
|
||
| > **Note:** Karmada (v1.17+) has built-in support for interpreting FlinkDeployment workloads. It automatically handles extraction of each component's replicas and resource requirements, so no manual Resource Interpreter is needed. |
| > **Note:** Karmada (v1.17+) has built-in support for interpreting FlinkDeployment workloads. It automatically handles extraction of each component's replicas and resource requirements, so no manual Resource Interpreter is needed. | |
| > **Note:** Karmada has built-in support for interpreting FlinkDeployment workloads. It automatically handles extraction of each component's replicas and resource requirements, so no manual Resource Interpreter is needed. |
Actually, it starts from v1.15. But we can simply remove this info
|
|
||
| If a multi-cluster scheduler treats these complex jobs as a single generic workload, it may underestimate the total resources required, or accidentally scatter the tightly-coupled components across entirely different geographical clusters, destroying the low-latency communication required for the job to function. | ||
|
|
||
| In this scenario, we will deploy a multi-component workload (FlinkDeployment, though VolcanoJob is also fully supported) and use custom Resource Interpreters to teach Karmada how to extract its individual components. We will then use `SpreadConstraints` to ensure all workload components are scheduled atomistically to the identical target cluster with sufficient resources. |
| In this scenario, we will deploy a multi-component workload (FlinkDeployment, though VolcanoJob is also fully supported) and use custom Resource Interpreters to teach Karmada how to extract its individual components. We will then use `SpreadConstraints` to ensure all workload components are scheduled atomistically to the identical target cluster with sufficient resources. | |
| In this scenario, we will deploy a FlinkDeployment and use a dedicated PropagationPolicy to atomically propagate this multi-component workload to a member cluster with sufficient resources. |
|
|
||
| RUN `karmadactl init --karmada-controller-manager-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true" --karmada-scheduler-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true" --karmada-webhook-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"`{{exec}} | ||
|
|
||
| This sets up the Karmada control plane on the host cluster with multi-component scheduling enabled on the `karmada-controller-manager`, `karmada-scheduler`, and `karmada-webhook` from the start — no additional patches or restarts needed. |
The previous explanation of each component's role in MultiplePodTemplatesScheduling works well and can be retained.
| resource: | ||
| cpu: 1 | ||
| memory: 100m | ||
| serviceAccount: flink | ||
| taskManager: | ||
| resource: | ||
| cpu: 1 | ||
| memory: 100m |
| resource: | |
| cpu: 1 | |
| memory: 100m | |
| serviceAccount: flink | |
| taskManager: | |
| resource: | |
| cpu: 1 | |
| memory: 100m | |
| resource: | |
| cpu: 0.01 | |
| memory: 1m | |
| serviceAccount: flink | |
| taskManager: | |
| resource: | |
| cpu: 0.02 | |
| memory: 2m |
We just fixed a bug where multi-template workloads could still be scheduled even when cluster resources were insufficient, so we should lower the resource requests.
Please update the relevant content accordingly as well.
|
|
||
| This FlinkDeployment includes a JobManager (1 replica, 1 CPU, 100Mi memory) and a TaskManager. | ||
| The TaskManager replica count is automatically computed as 1 using `ceil(parallelism/numberOfTaskSlots)`, with resources of 1 CPU and 100Mi memory. | ||
|
|
We can add a note here:
During scheduling, karmada-scheduler selects only member clusters with sufficient resources, based on their node resources and quotas. So in this scenario, we set the resource requests of the FlinkDeployment as low as possible to ensure successful propagation.
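As a rough illustration of why low requests help, the sketch below sums per-component CPU requests and keeps only clusters that can host the whole workload. It is a simplification with made-up numbers, not Karmada's actual scheduling algorithm: the cluster capacities are hypothetical, and the component values assume the suggested requests of 0.01 and 0.02 CPU expressed in millicores.

```shell
#!/usr/bin/env bash
# Sketch with made-up numbers: total CPU request (millicores) across components,
# then keep only clusters whose free CPU covers the whole workload.
# Component format: name:replicas:cpu_millicores
components=("jobmanager:1:10" "taskmanager:1:20")

total_cpu=0
for entry in "${components[@]}"; do
  IFS=: read -r _name replicas cpu_milli <<< "${entry}"
  total_cpu=$(( total_cpu + replicas * cpu_milli ))
done

# Hypothetical free CPU per member cluster, in millicores.
clusters=("member1:5" "member2:500")
feasible=()
for entry in "${clusters[@]}"; do
  IFS=: read -r cluster free_cpu <<< "${entry}"
  if (( free_cpu >= total_cpu )); then
    feasible+=("${cluster}")
  fi
done

echo "total CPU request: ${total_cpu}m"
echo "feasible clusters: ${feasible[*]}"
```

In this toy setup only `member2` remains feasible; with the original 1 CPU per component, a small tutorial cluster could easily end up with no feasible target at all.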
zhzhuang-zju left a comment
Thanks, others LGTM
Please make sure to squash your commits after making changes so that the PR is ready to be merged.
|
|
||
| - **karmada-controller-manager**: Parses the workload using the Resource Interpreter framework and populates the `spec.components` array in the `ResourceBinding` to declare the resource requests of all sub-components. | ||
| - **karmada-scheduler**: Reads the `spec.components` array to calculate the total aggregated resources needed, ensuring the workload is only scheduled to member clusters with sufficient capacity to co-locate all components. | ||
| - **karmada-webhook**: Intercepts the scheduling policies and validates the multi-component configurations. |
| - **karmada-webhook**: Intercepts the scheduling policies and validates the multi-component configurations. | |
| - **karmada-webhook**: Validates the multi-component fields within incoming `ResourceBinding` objects. |
Signed-off-by: Krishiv-Mahajan <mahajankrishiv10@gmail.com>
part of karmada-io/karmada#7269
This PR introduces a new interactive 10-step Killercoda tutorial scenario designed to demonstrate how Karmada handles complex, multi-component workloads.
Unlike simple stateless deployments, advanced applications (such as Big Data frameworks like Apache Flink or batch schedulers like Volcano) consist of multiple tightly-coupled components with distinct resource profiles (e.g., JobManagers and TaskManagers). This tutorial guides users through the process of teaching Karmada how to accurately interpret these resources and schedule them atomically.
Testing:
This can be tested on: https://killercoda.com/testing-scenario/scenario/karmada-multi-component-workload-example