Commit 1b800b0

Merge pull request #4917 from Azure/pavneeta-patch-3
Update 2025-04-02-Scaling-Kubernetes-for-AI-and-Data-intensive-Worklo…
2 parents 5916966 + 96287be commit 1b800b0

1 file changed: +1 -1 lines changed

blog/_posts/2025-04-02-Scaling-Kubernetes-for-AI-and-Data-intensive-Workloads.md

@@ -17,7 +17,7 @@ With the fast-paced advancement of AI workloads, building and fine-tuning of mul

While you can opt to scale your node pools out within a single cluster, there are some challenges that you might encounter, including but not limited to Kubernetes control plane scaling limits (e.g., kube-apiserver bottlenecks, etcd performance, pod and container limits) and even cloud providers' subscription, region, and/or resource limits.

-That is why here at AKS, we believe taking a different approach might be worth exploring. Rather than scaling out to tens of thousands of nodes within a single cluster, we think scaling out to tens or even hundreds of clusters may be a more efficient approach, especially when leveraging the [AKS Fleet Manager feature](https://learn.microsoft.com/azure/kubernetes-fleet/overview) which was first announced in October 2022 and is powered by the [KubeFleet](https://kfleet.io/) project recently donated to the CNCF Sandbox.
+That is why here at AKS, we believe taking a different approach might be worth exploring. Rather than scaling out to tens of thousands of nodes within a single cluster, we think scaling out to tens or even hundreds of clusters may be a more efficient approach, especially when leveraging the [AKS Fleet Manager feature](https://learn.microsoft.com/azure/kubernetes-fleet/overview) which was first announced in October 2022 and is powered by the [KubeFleet](https://kfleet.io/) project recently donated to the CNCF Sandbox. The reason people often didn't want to approach it this (multi-cluster) way is because they saw more clusters as more operational burden, with Azure Kubernetes Fleet Manager and AKS, that's no different than more nodes in a cluster.

With AKS Fleet Manager, you can unlock true limitless scalability by leveraging its ability to aggregate numerous AKS clusters for vast node provisioning tailored to your extensive AI training/serving and Data processing needs.
- **Limitless Scalability**: By grouping numerous AKS clusters into a single fleet, we enable practically limitless scalability. Need 100,000 nodes for your AI training tasks? Azure Kubernetes Fleet Manager makes this achievable.
