multi-component-workload tutorial creation #29

Open

Krishiv-Mahajan wants to merge 1 commit into karmada-io:main from Krishiv-Mahajan:big-data

@Krishiv-Mahajan (Contributor) commented May 23, 2026

part of karmada-io/karmada#7269

This PR introduces a new interactive 10-step Killercoda tutorial scenario designed to demonstrate how Karmada handles complex, multi-component workloads.
Unlike simple stateless deployments, advanced applications (such as Big Data frameworks like Apache Flink or batch schedulers like Volcano) consist of multiple tightly-coupled components with distinct resource profiles (e.g., JobManagers and TaskManagers). This tutorial guides users through the process of teaching Karmada how to accurately interpret these resources and schedule them atomically.

Testing:

This can be tested on: https://killercoda.com/testing-scenario/scenario/karmada-multi-component-workload-example

@karmada-bot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rainbowmango for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@karmada-bot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) May 23, 2026
@Krishiv-Mahajan (Contributor Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a comprehensive tutorial for scheduling multi-component workloads, such as Apache Flink and Volcano, across multiple clusters using Karmada. The changes include environment setup scripts, tutorial documentation, and verification logic. Feedback from the review identifies several critical issues, including a logic bug in the Flink Lua interpreter where JobManager memory was incorrectly referenced for TaskManagers, and a potential kubeconfig corruption issue during cluster creation. Additionally, the reviewer pointed out incorrect memory units in the Flink manifest, inconsistencies between the code and tutorial text, and missing verification checks for the scheduler component.

local tm_cpu = get(observedObj, {"spec","taskManager","resource","cpu"})
local tm_memory = get(observedObj, {"spec","taskManager","resource","memory"})
if tm_cpu ~= nil then tm_requires.resourceRequest.cpu = tm_cpu end
if tm_memory ~= nil then tm_requires.resourceRequest.memory = kube.getResourceQuantity(tm_memory) end

high

There is a bug here: jm_memory is used instead of tm_memory. Additionally, removing kube.getResourceQuantity() will keep the output consistent with the tutorial text and the Volcano example.

Suggested change
if tm_memory ~= nil then tm_requires.resourceRequest.memory = kube.getResourceQuantity(tm_memory) end
if tm_memory ~= nil then tm_requires.resourceRequest.memory = tm_memory end

mv $HOME/.kube/config ~/config-member1
kind create cluster --name=member2 --config=cluster2.yaml
mv $HOME/.kube/config config-member2
KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten >> ${KUBECONFIG_PATH}/config

medium

Using >> to append the output of kubectl config view --flatten to an existing kubeconfig file will result in an invalid YAML file if the destination already contains data (e.g., the default config in a Killercoda environment). It is better to overwrite the file or use a proper merge strategy.

Suggested change
KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten >> ${KUBECONFIG_PATH}/config
KUBECONFIG=~/config-member1:~/config-member2 kubectl config view --merge --flatten > ${KUBECONFIG_PATH}/config
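A minimal sketch of the overwrite-based merge as a hypothetical POSIX shell helper, `merge_kubeconfigs` (the name is illustrative and not part of the PR). It writes the merged output to a temporary file and only then moves it over the destination, so a failed `kubectl config view` leaves the existing kubeconfig intact. Pass absolute paths for every input: the quoted diff moves the second cluster's kubeconfig to `config-member2` in the current directory, which only matches `~/config-member2` when the script runs from `$HOME`.

```shell
#!/bin/sh
# Hypothetical helper: merge_kubeconfigs DEST INPUT...
# Merges the INPUT kubeconfigs with `kubectl config view --merge --flatten`
# and overwrites DEST instead of appending to it.
merge_kubeconfigs() {
  dest=$1
  shift
  # Join the remaining arguments with ':' to build the KUBECONFIG search list.
  list=$(IFS=:; printf '%s' "$*")
  tmp="${dest}.tmp.$$"
  # Write to a temporary file first; only replace DEST if the merge succeeded.
  if KUBECONFIG="$list" kubectl config view --merge --flatten > "$tmp"; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For the setup script above this would be called as `merge_kubeconfigs "${KUBECONFIG_PATH}/config" ~/config-member1 ~/config-member2`. Because the output goes to a temporary file, the destination itself can also be listed as an input if its existing contexts should be kept.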

local jm_cpu = get(observedObj, {"spec","jobManager","resource","cpu"})
local jm_memory = get(observedObj, {"spec","jobManager","resource","memory"})
if jm_cpu ~= nil then jm_requires.resourceRequest.cpu = jm_cpu end
if jm_memory ~= nil then jm_requires.resourceRequest.memory = kube.getResourceQuantity(jm_memory) end

medium

For consistency with the Volcano example and to match the expected output in the tutorial steps, it is better to avoid using kube.getResourceQuantity() for memory. This ensures the original string (e.g., "100Mi") is preserved in the ResourceBinding.

Suggested change
if jm_memory ~= nil then jm_requires.resourceRequest.memory = kube.getResourceQuantity(jm_memory) end
if jm_memory ~= nil then jm_requires.resourceRequest.memory = jm_memory end

replicas: 1
resource:
cpu: 1
memory: 100m

medium

The memory quantity 100m is likely a typo for 100Mi. In Kubernetes resource quantities, the m suffix means milli (one thousandth), so 100m of memory is 0.1 bytes. That is almost certainly not intended and will cause scheduling issues.

Suggested change
memory: 100m
memory: 100Mi
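To make the unit mistake easy to spot, here is a minimal shell sketch of a hypothetical `memory_to_bytes` helper (not part of Karmada or kubectl). It handles only plain integers with no suffix or one of the `k`/`M`/`G` and `Ki`/`Mi`/`Gi` suffixes, and it deliberately rejects the lowercase `m` (milli) suffix.

```shell
#!/bin/sh
# Hypothetical helper: memory_to_bytes QUANTITY
# Converts a Kubernetes memory quantity to whole bytes and rejects the
# lowercase "m" (milli) suffix, which almost always means a typo for "Mi".
memory_to_bytes() {
  q=$1
  case $q in
    *m)  echo "memory_to_bytes: '$q' uses the milli suffix (thousandths of a byte); did you mean ${q%m}Mi?" >&2
         return 1 ;;
    *Ki) n=${q%Ki}; mult=1024 ;;
    *Mi) n=${q%Mi}; mult=1048576 ;;
    *Gi) n=${q%Gi}; mult=1073741824 ;;
    *k)  n=${q%k}; mult=1000 ;;
    *M)  n=${q%M}; mult=1000000 ;;
    *G)  n=${q%G}; mult=1000000000 ;;
    *)   n=$q; mult=1 ;;
  esac
  # Require plain digits without leading zeros (shell arithmetic would read
  # them as octal); decimals and exponents are out of scope for this sketch.
  case $n in
    ''|*[!0-9]*|0?*) echo "memory_to_bytes: unsupported quantity '$q'" >&2; return 1 ;;
  esac
  echo $((n * mult))
}
```

`memory_to_bytes 100Mi` prints `104857600`, while `memory_to_bytes 100m` exits non-zero with a hint to use `100Mi`.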

taskManager:
resource:
cpu: 1
memory: 100m

medium

The memory quantity 100m should be corrected to 100Mi for consistency and correctness.

Suggested change
memory: 100m
memory: 100Mi

@@ -0,0 +1,3 @@
#!/bin/bash

kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true"

medium

The verification script should also check the karmada-scheduler deployment, as it was also patched in the instructions to enable the MultiplePodTemplatesScheduling feature gate.

Suggested change
kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true"
kubectl -n karmada-system get deployment karmada-controller-manager -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-scheduler -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true" && kubectl -n karmada-system get deployment karmada-webhook -o json | jq -r '.spec.template.spec.containers[0].command[]' | grep -q "MultiplePodTemplatesScheduling=true"
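A sketch of the verification script written as a loop, so the scheduler is checked the same way as the other two components. It keeps the original `kubectl`/`jq`/`grep` pipeline and still only inspects `containers[0].command`; the function names `gate_enabled` and `verify_feature_gates` are illustrative.

```shell
#!/bin/sh
# Feature gate that every patched control plane component must carry.
GATE="MultiplePodTemplatesScheduling=true"

# gate_enabled DEPLOYMENT: succeeds if the first container's command
# contains the feature gate (same kubectl/jq/grep pipeline as the original).
gate_enabled() {
  kubectl -n karmada-system get deployment "$1" -o json \
    | jq -r '.spec.template.spec.containers[0].command[]' \
    | grep -q "$GATE"
}

# verify_feature_gates: fails on the first deployment missing the gate.
verify_feature_gates() {
  for d in karmada-controller-manager karmada-scheduler karmada-webhook; do
    if ! gate_enabled "$d"; then
      echo "feature gate $GATE not found on $d" >&2
      return 1
    fi
  done
}
```

A verify script would define these functions and end with a call to `verify_feature_gates`, so the script's exit status reflects the check, as in the one-line original.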

Contributor

agree

Comment on lines +25 to +29
The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100m) requests:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}}

This outputs a JSON object with `"cpu": "1"` and `"memory": "100m"`.

medium

The expected memory value in the tutorial text should be updated to 100Mi to be consistent with the fix in the manifest and the actual resource requirements.

Suggested change
The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100m) requests:
RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}}
This outputs a JSON object with `"cpu": "1"` and `"memory": "100m"`.
The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100Mi) requests:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}}

This outputs a JSON object with `"cpu": "1"` and `"memory": "100Mi"`.


Multi-component scheduling (`MultiplePodTemplatesScheduling`) is currently an **Alpha** feature in Karmada and is **disabled by default**. We need to explicitly enable it on the `karmada-controller-manager`, `karmada-scheduler`, and `karmada-webhook` components.

> **Note:** Because these components are running as native Pods on the underlying host cluster, we patch them using the default `kubectl` context, **not** the Karmada API server kubeconfig. We also temporarily change their deployment strategy to `Recreate` to prevent resource deadlocks during the rollout.
Contributor

> We also temporarily change their deployment strategy to `Recreate` to prevent resource deadlocks during the rollout.

Can you elaborate more on this?

Contributor Author

I have added a bit more detail in the latest commit

Contributor

Thanks for the explanation. However, these three components don't seem to have resource configurations declared. So there shouldn't be deadlocks during their restart, right?

1. Apply their **Custom Resource Definitions (CRDs)** to the Karmada control plane and propagate them to the member clusters.
2. Apply **Resource Interpreter Customizations** to teach Karmada how to extract per-component resource requirements from these specific workload types.

> **Note:** We have pre-downloaded the necessary CRDs and placed them in `/root/examples/` for you.
Contributor

Suggested change
> **Note:** We have pre-downloaded the necessary CRDs and placed them in `/root/examples/` for you.
> **Note:** Karmada has built-in support for interpreting common third-party multi-component workload resources such as FlinkDeployment and VolcanoJob. These built-in interpreters define how Karmada parses such resources, covering extraction of each component's replicas and resource requirements, assessment of workload health status, and identification of dependent resources.

Comment on lines +50 to +60
**Apply the Resource Interpreter Customizations:**

Karmada uses a built-in "Resource Interpreter" to dynamically inspect unfamiliar custom resources. By applying these Lua-based configurations, we teach the interpreter exactly where to look in a `FlinkDeployment` and `VolcanoJob` to find their individual components, replicas, and CPU/Memory requests.

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}}

This applies the Flink Resource Interpreter Customization.

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/volcano-interpreter.yaml`{{exec}}

This applies the Volcano Resource Interpreter Customization.
Contributor

Suggested change (remove these lines):
**Apply the Resource Interpreter Customizations:**
Karmada uses a built-in "Resource Interpreter" to dynamically inspect unfamiliar custom resources. By applying these Lua-based configurations, we teach the interpreter exactly where to look in a `FlinkDeployment` and `VolcanoJob` to find their individual components, replicas, and CPU/Memory requests.
RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}}
This applies the Flink Resource Interpreter Customization.
RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/volcano-interpreter.yaml`{{exec}}
This applies the Volcano Resource Interpreter Customization.

@@ -0,0 +1,68 @@
### Provide Workload Definitions to Karmada

Before Karmada can schedule complex Flink and Volcano workloads, it needs to understand their structure.
Contributor

Suggested change
Before Karmada can schedule complex Flink and Volcano workloads, it needs to understand their structure.
Before Karmada can schedule complex FlinkDeployment workloads, it needs to understand their structure.

Using FlinkDeployment for the demonstration is sufficient.


RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}}

This applies the Flink Custom Resource.
Contributor

Suggested change
This applies the Flink Custom Resource.
This applies the FlinkDeployment Custom Resource.

Please use the full name


</details>

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}}
Contributor

A question: why do we need validate=false here?

Contributor Author

FlinkDeployment CRD is not registered on the Karmada API server itself and --validate=false bypasses schema validation on the Karmada API server.

Contributor

What exactly are you referring to as "registered"? The --validate=false flag is unnecessary if the steps are followed properly.


If a multi-cluster scheduler treats these complex jobs as a single generic workload, it may underestimate the total resources required, or accidentally scatter the tightly-coupled components across entirely different geographical clusters, destroying the low-latency communication required for the job to function.

In this scenario, we will deploy multi-component workloads (Flink and Volcano) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all components of a job are scheduled atomically to the exact same target cluster.
Contributor

Suggested change
In this scenario, we will deploy multi-component workloads (Flink and Volcano) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all components of a job are scheduled atomically to the exact same target cluster.
In this scenario, we will deploy multi-component workloads (FlinkDeployment) and use custom Resource Interpreters to teach Karmada how to extract their individual components. We will then use `SpreadConstraints` to ensure all workload components are scheduled atomically to the same target cluster with sufficient resources.


When you apply a workload, Karmada uses its Resource Interpreter to analyze the custom resource, extract its requirements, and wrap it into a `ResourceBinding`. Let's inspect this binding to see what Karmada discovered.

First, extract the dynamic binding name into a variable:
Contributor

We can divide this page into several sections to improve readability:

#### Replicas
#### Resource Requirement

These two sections ensure that Karmada accurately perceives the resource demands of multi-component workloads, which serve as the basis for filtering available clusters.

#### Scheduling Result
Only one cluster will be selected as the scheduling result.

@Krishiv-Mahajan force-pushed the big-data branch 2 times, most recently from 5f4c501 to c6ae27d, May 25, 2026 16:37

Multi-component scheduling (`MultiplePodTemplatesScheduling`) is currently an **Alpha** feature in Karmada and is **disabled by default**. We need to explicitly enable it on three core control plane components to ensure the entire scheduling pipeline can process multi-component workloads:

- **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects.
Contributor

Suggested change
- **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects.
- **`karmada-webhook`**: Needs the feature gate to successfully validate the multi-component fields within incoming `ResourceBinding` objects.


- **`karmada-webhook`**: Needs the feature gate to successfully validate and mutate the multi-component fields within incoming `ResourceBinding` and `PropagationPolicy` objects.
- **`karmada-controller-manager`**: Requires it to execute custom Resource Interpreters that extract the specific components, and to build the comprehensive `ResourceBinding` that contains them.
- **`karmada-scheduler`**: Uses it to compute the aggregate resource requirements of all extracted components, ensuring the selected target cluster has sufficient capacity to host the entire complex workload.
Contributor

Suggested change
- **`karmada-scheduler`**: Uses it to compute the aggregate resource requirements of all extracted components, ensuring the selected target cluster has sufficient capacity to host the entire complex workload.
- **`karmada-scheduler`**: Uses it to obtain the detailed resource requirements of the workload, ensuring the selected target cluster has sufficient capacity to host the entire complex workload.


RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployments.flink.apache.org-v1.yaml`{{exec}}

This applies the Flink CRD.
Contributor

Suggested change
This applies the Flink CRD.
This applies the FlinkDeployment CRD.

Please use the full name


Karmada needs to know exactly how to parse the custom resources to find their component definitions. We provide Lua scripts that teach Karmada how to do this.

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f /root/examples/flink-interpreter.yaml`{{exec}}
Contributor

We have built-in interpreters; why do we need to apply the interpreter here?

Contributor Author

You're right, thanks for catching that. I will fix it.

Comment on lines +11 to +21
Let's check if Karmada successfully parsed the `spec.components` array. The array should contain exactly 2 distinct components:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components | length'`{{exec}}

This outputs `2`, confirming exactly two distinct components were extracted.

Check the specific names of the components Karmada identified:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[].name'`{{exec}}

This outputs `"jobmanager"` and `"taskmanager"`.
Contributor

The title is replicas, but the content inside has nothing to do with replicas at all!


RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployment-cr.yaml`{{exec}}

This applies the FlinkDeployment Custom Resource.
Contributor

You can add a brief description of this FlinkDeployment CR, like:

This FlinkDeployment includes a JobManager (1 replica, 1 CPU, 100Mi memory) and a TaskManager.
The TaskManager replica count is automatically computed as 1 using ceil(parallelism/numberOfTaskSlots), with resources of 1 CPU and 100Mi memory.
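The `ceil(parallelism/numberOfTaskSlots)` rule can be checked with integer arithmetic. A minimal shell sketch with a hypothetical helper name, `taskmanager_replicas`; it only reproduces the arithmetic stated in the comment and is not Karmada's built-in FlinkDeployment interpreter.

```shell
#!/bin/sh
# Hypothetical helper: taskmanager_replicas PARALLELISM SLOTS_PER_TASKMANAGER
# Prints ceil(PARALLELISM / SLOTS) using integer arithmetic:
# adding (SLOTS - 1) before dividing rounds any remainder up.
taskmanager_replicas() {
  parallelism=$1
  slots=$2
  if [ "$slots" -le 0 ]; then
    echo "taskmanager_replicas: slots must be a positive integer" >&2
    return 1
  fi
  echo $(( (parallelism + slots - 1) / slots ))
}
```

With the tutorial's `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`, `taskmanager_replicas 2 2` prints `1`; with `parallelism: 3` it would print `2`.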

Comment on lines +9 to +29
#### Replicas

Let's check if Karmada successfully parsed the `spec.components` array. The array should contain exactly 2 distinct components:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components | length'`{{exec}}

This outputs `2`, confirming exactly two distinct components were extracted.

Check the specific names of the components Karmada identified:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[].name'`{{exec}}

This outputs `"jobmanager"` and `"taskmanager"`.

#### Resource Requirement

The Flink manifest specified `parallelism: 2` and `taskmanager.numberOfTaskSlots: "2"`. Using the Lua interpreter we applied earlier, Karmada correctly calculates that `ceil(2/2) = 1` taskManager replica is needed. Let's verify that Karmada captured this, along with the CPU (1) and memory (100Mi) requests:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.spec.components[] | select(.name=="taskmanager") | .replicaRequirements.resourceRequest'`{{exec}}

This outputs a JSON object with `"cpu": "1"` and `"memory": "100Mi"`.
Contributor

Since the FlinkDeployment CR has been described in Step 7, we can keep this part concise. Just print the components within bindingSpec. If the result matches the expectation from Step 7, it proves that Karmada can parse FlinkDeployment correctly.
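One way to make that comparison explicit is to pipe the component names from the binding into a small check. A sketch with a hypothetical helper, `check_components`, which compares the names read from stdin against an expected list regardless of order:

```shell
#!/bin/sh
# Hypothetical helper: check_components EXPECTED...
# Reads component names (one per line) from stdin and succeeds only if they
# match the EXPECTED arguments as a list, ignoring order.
check_components() {
  actual=$(sort)
  expected=$(printf '%s\n' "$@" | sort)
  if [ "$actual" != "$expected" ]; then
    echo "unexpected components: $(echo "$actual" | tr '\n' ' ')" >&2
    return 1
  fi
}
```

For example: `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq -r '.spec.components[].name' | check_components jobmanager taskmanager`.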

Comment on lines +33 to +49
Check that the workload was successfully scheduled by the Karmada control plane:

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq '.status.conditions[] | select(.type=="Scheduled") | .status'`{{exec}}

This outputs `"True"`, indicating the workload was successfully scheduled.

Finally, let's see which cluster it landed on and verify that the Flink components actually exist there:

RUN `TARGET_CLUSTER=$(kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq -r '.spec.clusters[0].name')`{{exec}}

This extracts the scheduled target cluster into a variable.

Verify that the FlinkDeployment exists on the target cluster:

RUN `kubectl --kubeconfig=$HOME/.kube/config-${TARGET_CLUSTER#kind-} get flinkdeployment -n default`{{exec}}

This lists the `flinkdeployment-sample` resource, verifying it exists on the target cluster.
Contributor

1. Print `bindingSpec.clusters`; it only has one target cluster.
2. Use the command `karmadactl get flinkdeployment --operation-scope members` to verify that the FlinkDeployment exists on the target cluster.
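For the first point, a sketch of a hypothetical `expect_single_cluster` helper that reads the cluster names from `bindingSpec.clusters` on stdin, fails unless exactly one is present, and echoes that name so it can stand in for the `.spec.clusters[0].name` extraction:

```shell
#!/bin/sh
# Hypothetical helper: expect_single_cluster
# Reads scheduled cluster names (one per line) from stdin and succeeds only
# if exactly one is present; on success it prints that cluster name.
expect_single_cluster() {
  # Drop blank lines, then count what is left.
  clusters=$(sed '/^$/d')
  if [ -z "$clusters" ]; then
    count=0
  else
    count=$(printf '%s\n' "$clusters" | wc -l | tr -d ' ')
  fi
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 target cluster, got $count" >&2
    return 1
  fi
  printf '%s\n' "$clusters"
}
```

It could be used as `TARGET_CLUSTER=$(kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get resourcebinding $BINDING_NAME -n default -o json | jq -r '.spec.clusters[].name' | expect_single_cluster)`, after which `karmadactl get flinkdeployment --operation-scope members` shows which member cluster holds the FlinkDeployment.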

@zhzhuang-zju (Contributor) left a comment

Thanks


**Apply the Resource Interpreters:**

While Karmada's newer versions have built-in support for parsing FlinkDeployment workloads, the version installed in this environment requires us to explicitly provide a Lua script that teaches Karmada how to extract the component definitions.
Contributor

> the version installed in this environment requires us to explicitly provide a Lua script that teaches Karmada how to extract the component definitions.

The karmadactl version is v1.17.2, which should already include the resource interpreter for FlinkDeployment. So why do we need to explicitly provide a Lua script again?

Contributor Author

I ran into a null-components issue during testing and reverted the changes, but it was actually caused by something else. I have fixed it now.


**Apply the CRDs and PropagationPolicy:**

RUN `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply --validate=false -f /root/examples/flinkdeployments.flink.apache.org-v1.yaml`{{exec}}
Contributor

I still have the same question: why do we need --validate=false here?

- **`karmada-controller-manager`**: Requires it to execute custom Resource Interpreters that extract the specific components, and to build the comprehensive `ResourceBinding` that contains them.
- **`karmada-scheduler`**: Uses it to obtain the detailed resource requirements of the workload, ensuring the selected target cluster has sufficient capacity to host the entire complex workload.

> **Note:** Because these components are running as native Pods on the underlying host cluster, we patch them using the default `kubectl` context, **not** the Karmada API server kubeconfig. We temporarily change their deployment strategy to `Recreate`. This isn't strictly for resource limitations, but to prevent a race condition in this tutorial where the old leader pod processes incoming workloads (with the feature gate disabled) while the new pod is starting up.
Contributor

> This isn't strictly for resource limitations, but to prevent a race condition in this tutorial where the old leader pod processes incoming workloads (with the feature gate disabled) while the new pod is starting up.

If you have concerns about this, you can set the corresponding feature gates via command-line arguments when running the command karmadactl init --xxxx to avoid restarts later. This will reduce complex operations and extra explanations down the line.

Run karmadactl init --help to check the usage.

Contributor Author

Thank you for the suggestion to use the `karmadactl init` flags.

I checked `karmadactl init --help` and have updated the initialization command in Step 3 to pass the feature gates directly at start time using:

- `--karmada-controller-manager-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"`
- `--karmada-scheduler-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"`
- `--karmada-webhook-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"`

Contributor

@zhzhuang-zju left a comment

Thanks @Krishiv-Mahajan, much better


To achieve this, we must apply their **Custom Resource Definitions (CRDs)** to the Karmada control plane and propagate them to the member clusters.
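One way to propagate the CRD is a Karmada `ClusterPropagationPolicy` that selects it by name. The sketch below only writes such a manifest to a file; the policy name `flinkdeployment-crd`, the output path, and the member cluster names `member1`/`member2` are hypothetical, and the CRD name `flinkdeployments.flink.apache.org` is assumed from the CRD file applied earlier.

```shell
# Sketch: write a ClusterPropagationPolicy that propagates the FlinkDeployment CRD
# from the Karmada control plane to member clusters.
# Assumptions: policy name, output path and cluster names are hypothetical.
CRD_POLICY_FILE="${CRD_POLICY_FILE:-/tmp/flink-crd-cpp.yaml}"

cat > "$CRD_POLICY_FILE" <<'EOF'
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: flinkdeployment-crd
spec:
  resourceSelectors:
    - apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: flinkdeployments.flink.apache.org
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
EOF

echo "wrote $CRD_POLICY_FILE"
# Apply it against the Karmada API server, not the host cluster:
# kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f "$CRD_POLICY_FILE"
```

The apply step is left commented out so the script can be inspected without a running control plane.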

> **Note:** Karmada (v1.17+) has built-in support for interpreting FlinkDeployment workloads. It automatically handles extraction of each component's replicas and resource requirements, so no manual Resource Interpreter is needed.
Contributor

Suggested change
> **Note:** Karmada has built-in support for interpreting FlinkDeployment workloads. It automatically handles extraction of each component's replicas and resource requirements, so no manual Resource Interpreter is needed.

Actually, built-in support starts from v1.15, but we can simply remove the version information.


If a multi-cluster scheduler treats these complex jobs as a single generic workload, it may underestimate the total resources required, or accidentally scatter the tightly-coupled components across entirely different geographical clusters, destroying the low-latency communication required for the job to function.

In this scenario, we will deploy a multi-component workload (FlinkDeployment, though VolcanoJob is also fully supported) and use custom Resource Interpreters to teach Karmada how to extract its individual components. We will then use `SpreadConstraints` to ensure all workload components are scheduled atomically to the same target cluster with sufficient resources.
Contributor

Suggested change
In this scenario, we will deploy a FlinkDeployment and use a dedicated PropagationPolicy to atomically propagate this multi-component workload to a member cluster with sufficient resources.
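A minimal sketch of such a dedicated PropagationPolicy, assuming the FlinkDeployment is named `basic-example` in the `default` namespace and the candidate clusters are `member1` and `member2` (all hypothetical). The `spreadConstraints` entry with `spreadByField: cluster`, `maxGroups: 1` and `minGroups: 1` asks Karmada to place the whole workload in exactly one cluster. The script only writes the manifest.

```shell
# Sketch: PropagationPolicy that keeps all FlinkDeployment components in one cluster.
# Assumptions: workload name, namespace, output path and cluster names are hypothetical.
POLICY_FILE="${POLICY_FILE:-/tmp/flink-propagationpolicy.yaml}"

cat > "$POLICY_FILE" <<'EOF'
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: flink-propagation
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: flink.apache.org/v1beta1
      kind: FlinkDeployment
      name: basic-example
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    spreadConstraints:
      # Select exactly one cluster for the whole multi-component workload
      - spreadByField: cluster
        maxGroups: 1
        minGroups: 1
EOF

echo "wrote $POLICY_FILE"
# kubectl --kubeconfig /etc/karmada/karmada-apiserver.config apply -f "$POLICY_FILE"
```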


RUN `karmadactl init --karmada-controller-manager-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true" --karmada-scheduler-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true" --karmada-webhook-extra-args="--feature-gates=MultiplePodTemplatesScheduling=true"`{{exec}}

This sets up the Karmada control plane on the host cluster with multi-component scheduling enabled on the `karmada-controller-manager`, `karmada-scheduler`, and `karmada-webhook` from the start — no additional patches or restarts needed.
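The three extra-args flags in the command above follow one pattern: `--<component>-extra-args=<flag>`. The sketch below generates them in a loop so the feature gate is written once; the helper name `build_init_args` is hypothetical, and the generated arguments are intended to match the command shown above.

```shell
# Sketch: build the repeated --<component>-extra-args flags for `karmadactl init`.
FEATURE_GATE="--feature-gates=MultiplePodTemplatesScheduling=true"

build_init_args() {
  for component in karmada-controller-manager karmada-scheduler karmada-webhook; do
    # One argument per line; the format string itself does not start with '-'
    printf '%s\n' "--${component}-extra-args=${FEATURE_GATE}"
  done
}

build_init_args
# To run it (word splitting is safe because no generated flag contains spaces):
# karmadactl init $(build_init_args)
```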
Contributor

The previous explanation of each component's role in MultiplePodTemplatesScheduling works well and can be retained.

Comment on lines +28 to +35
```yaml
    resource:
      cpu: 1
      memory: 100m
  serviceAccount: flink
  taskManager:
    resource:
      cpu: 1
      memory: 100m
```
Contributor

Suggested change
```yaml
    resource:
      cpu: 0.01
      memory: 1m
  serviceAccount: flink
  taskManager:
    resource:
      cpu: 0.02
      memory: 2m
```

We just fixed a bug where multi-template resources could still be scheduled even when cluster resources were insufficient, so we should lower the resource requests.

Contributor

Please update the relevant content accordingly as well.


This FlinkDeployment includes a JobManager (1 replica, 1 CPU, 100Mi memory) and a TaskManager.
The TaskManager replica count is automatically computed as 1 using `ceil(parallelism/numberOfTaskSlots)`, with resources of 1 CPU and 100Mi memory.
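The replica formula above can be checked with integer arithmetic, using the identity ceil(a / b) = (a + b - 1) / b for positive integers. The excerpt does not show the manifest's `parallelism` and `numberOfTaskSlots` values, so the inputs below are hypothetical, and the function name is illustrative rather than part of Karmada or the Flink operator.

```shell
# Sketch: TaskManager replicas = ceil(parallelism / numberOfTaskSlots).
# Integer ceiling: (a + b - 1) / b for positive a and b.
taskmanager_replicas() {
  parallelism=$1
  slots=$2
  echo $(( (parallelism + slots - 1) / slots ))
}

# Hypothetical inputs (not taken from the tutorial's manifest):
taskmanager_replicas 2 2   # prints 1
taskmanager_replicas 5 2   # prints 3
```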

Contributor

We can add a note here:

During scheduling, karmada-scheduler filters member clusters by their node resources and quotas, keeping only those with sufficient resources. So in this scenario, we set the resource requests of the FlinkDeployment as low as possible to ensure successful propagation.
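To illustrate why lower requests help, here is a deliberately simplified sketch of the filtering step. It assumes each cluster is summarised by a single number, its free CPU in millicores, and keeps the clusters whose free CPU covers the workload's total request. The real karmada-scheduler also accounts for memory, node-level fit and quotas. The function name, input format and cluster figures are hypothetical; the 30m total is the sum of the lowered JobManager (0.01 CPU) and TaskManager (0.02 CPU) requests suggested above.

```shell
# Simplified sketch of "keep only clusters with sufficient resources".
# Assumption: each cluster is given as name=freeMillicores; only CPU is compared.
fitting_clusters() {
  required_mcpu=$1
  shift
  for entry in "$@"; do
    name=${entry%%=*}     # text before '='
    free=${entry#*=}      # text after '='
    if [ "$free" -ge "$required_mcpu" ]; then
      echo "$name"
    fi
  done
}

# JobManager 10m + TaskManager 20m = 30m total (hypothetical free capacity per cluster)
fitting_clusters 30 member1=20 member2=500   # prints member2
```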

Contributor

@zhzhuang-zju left a comment

Thanks, others LGTM

Please make sure to squash your commits after making the changes so that the PR is ready to be merged.


- **karmada-controller-manager**: Parses the workload using the Resource Interpreter framework and populates the `spec.components` array in the `ResourceBinding` to declare the resource requests of all sub-components.
- **karmada-scheduler**: Reads the `spec.components` array to calculate the total aggregated resources needed, ensuring the workload is only scheduled to member clusters with sufficient capacity to co-locate all components.
- **karmada-webhook**: Intercepts the scheduling policies and validates the multi-component configurations.
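The "total aggregated resources" that the scheduler derives from `spec.components` amounts to summing replicas times the per-replica request across components. The sketch below shows that arithmetic for CPU only. It is not karmada-scheduler code: the `name:replicas:millicores` input format and the function name are hypothetical, and real `spec.components` entries carry richer data than a single CPU figure.

```shell
# Sketch: total CPU request = sum over components of (replicas * per-replica millicores).
# Input format (hypothetical): name:replicas:millicores
total_mcpu() {
  total=0
  for component in "$@"; do
    rest=${component#*:}    # drop the component name
    replicas=${rest%%:*}    # replicas field
    mcpu=${rest#*:}         # per-replica CPU in millicores
    total=$(( total + replicas * mcpu ))
  done
  echo "$total"
}

# JobManager: 1 replica x 1 CPU; TaskManager: 1 replica x 1 CPU (values from the tutorial text)
total_mcpu jobmanager:1:1000 taskmanager:1:1000   # prints 2000
```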
Contributor

Suggested change
- **karmada-webhook**: Validates the multi-component fields within incoming `ResourceBinding` objects.

Signed-off-by: Krishiv-Mahajan <mahajankrishiv10@gmail.com>