Modern AI inference workloads need capabilities that Kubernetes doesn't provide out of the box:
- Scaling for Multi-Node/Multi-Pod Units - Large models may be sharded across multiple nodes, meaning a single model instance spans multiple pods. In this case, the fundamental scaling unit is no longer an individual pod, but an entire group of pods that together form one model instance.
- Hierarchical gang scheduling - Multi-node model instances require their pods to be scheduled together; if only some of the required pods are placed, the model is unusable, resources sit idle, and the system can deadlock waiting for the remaining pods. Disaggregated inference has similar constraints: at least one prefill instance and one decode instance must be scheduled to form a functional pipeline. Gang scheduling must therefore occur at multiple levels, ensuring required components start together as an all-or-nothing unit.
- Startup ordering - Even when components must be scheduled together (e.g., leader and worker pods in a multi-node model instance), there are cases where they must start in a specific order. For example, MPI workloads require all worker pods to be ready before the leader pod launches the application. Explicit startup ordering ensures correct initialization and avoids failures caused by components starting out of order.
- Topology-aware placement - Components in an inference system often communicate heavily with each other. Network-optimized placement, e.g. within NVLink domains, is crucial to minimize communication overhead and maximize performance.
Grove is a Kubernetes API that provides a single declarative interface for orchestrating any AI inference workload, from simple single-pod deployments to complex multi-node, disaggregated systems. Grove lets you scale your multi-node inference deployment from a single replica to data center scale, supporting tens of thousands of GPUs. It allows you to describe your whole inference serving system in Kubernetes - e.g. prefill, decode, routing, or any other component - as a single Custom Resource (CR). From that one spec, the platform coordinates hierarchical gang scheduling, topology-aware placement, multi-level autoscaling, and explicit startup ordering. You get precise control over how the system behaves without stitching together scripts, YAML files, or custom controllers.
One API. Any inference architecture.
Get Grove running in 5 minutes:

```shell
# 1. Create a local kind cluster
cd operator && make kind-up

# 2. Deploy Grove
make deploy

# 3. Deploy your first workload
kubectl apply -f samples/simple/simple1.yaml

# 4. Fetch the resources created by Grove
kubectl get pcs,pclq,pcsg,pg,pod -owide
```

For a hands-on tour of Grove concepts that you can run entirely on your local machine, see → Core Concepts Overview
To install in a remote Kubernetes cluster, see → Installation Docs
Grove introduces four simple concepts:
| Concept | Description |
|---|---|
| PodClique | A group of pods representing a specific role (e.g., leader, worker, frontend). Each clique has an independent configuration and supports custom scaling logic. |
| PodCliqueScalingGroup | A set of PodCliques that scale and are scheduled together as a gang. Ideal for tightly coupled roles like prefill leader and worker. |
| PodCliqueSet | The top-level Grove object that defines a group of components managed and colocated together. Also supports autoscaling with topology-aware spread of PodCliqueSet replicas for availability. |
| PodGang | The scheduler API that defines a unit of gang-scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang-scheduling. |
→ Hands On Intro to Core Concepts
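To make these concepts concrete, here is an illustrative sketch of how a disaggregated serving system might be expressed as a single PodCliqueSet. This is not the actual Grove schema: the apiVersion, field names, and values below are assumptions for illustration only; consult samples/simple/simple1.yaml and the Core Concepts docs for the real spec.

```yaml
# Illustrative sketch only -- field names and apiVersion are assumptions,
# not the real Grove schema. See samples/ in the repo for working specs.
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: disagg-llm
spec:
  replicas: 1                    # top-level replicas, spread topology-aware
  template:
    cliques:
      - name: frontend           # PodClique: routing role, scales independently
        replicas: 2
      - name: prefill-leader     # these cliques form a scaling group below,
      - name: prefill-worker     # so they are gang-scheduled together
      - name: decode
    scalingGroups:
      - name: prefill            # PodCliqueScalingGroup: leader + workers
        cliques: [prefill-leader, prefill-worker]
    startupOrder: [prefill-worker, prefill-leader]  # workers ready before leader
```

The shape mirrors the table above: each role is a PodClique, tightly coupled roles are grouped into a PodCliqueScalingGroup that scales and gang-schedules as a unit, and the PodCliqueSet ties the whole system together; from such a spec, Grove derives the PodGang units handed to the scheduler.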
- Multi-Node, Disaggregated Inference for large models (DeepSeek-R1, Llama-4-Maverick) : Visualization
- Single-Node, Disaggregated Inference : Visualization
- Agentic Pipeline of Models : Visualization
- Standard Aggregated Single Node or Single GPU Inference : Visualization
Q4 2025
- Topology-Aware Scheduling
- Multi-Level Horizontal Auto-Scaling ✅
- Hierarchical Gang Scheduling ✅
- Startup Ordering ✅
- Rolling Updates ✅
Q1 2026
- Resource-Optimized Rolling Updates
- Topology Spread Constraints
- Automatic Topology Detection
- And More!
Please read the contribution guide before creating your first PR!
Grove is an open-source project and we welcome community engagement!
Please feel free to start a discussion thread on any topic of interest.
If you have run into an issue or would like to request a feature enhancement, please create a GitHub Issue with the appropriate tag.
To reach the Grove user and developer community directly, please join the NVIDIA Dynamo Discord server or the Grove mailing list.