
Grove


Modern AI inference workloads need capabilities that Kubernetes doesn't provide out of the box:

  • Scaling for Multi-Node/Multi-Pod Units - Large models may be sharded across multiple nodes, meaning a single model instance spans multiple pods. In this case, the fundamental scaling unit is no longer an individual pod, but an entire group of pods that together form one model instance.
  • Hierarchical gang scheduling - Multi-node model instances require pods to be scheduled together; if only some of the required pods are placed, the model is unusable, resources remain idle, and the system can deadlock waiting for the remaining pods. Disaggregated inference has similar constraints: at least one prefill instance and one decode instance must be scheduled to form a functional pipeline. Gang scheduling must therefore occur at multiple levels, ensuring required components start together as an all-or-nothing unit.
  • Startup ordering - Even when components must be scheduled together (e.g., leader and worker pods in a multi-node model instance), there are cases where they must start in a specific order. For example, MPI workloads require all worker pods to be ready before the leader pod launches the application. Explicit startup ordering ensures correct initialization and avoids failures caused by components starting out of order.
  • Topology-aware placement - Components in an inference system often communicate heavily with one another. Network-optimized placement, e.g., within NVLink domains, is crucial to minimizing communication overhead and maximizing performance.

Grove is a Kubernetes API that provides a single declarative interface for orchestrating any AI inference workload, from simple single-pod deployments to complex multi-node, disaggregated systems. Grove lets you scale a multi-node inference deployment from a single replica to data-center scale, supporting tens of thousands of GPUs. It allows you to describe your whole inference serving system in Kubernetes (prefill, decode, routing, or any other component) as a single Custom Resource (CR). From that one spec, the platform coordinates hierarchical gang scheduling, topology-aware placement, multi-level autoscaling, and explicit startup ordering. You get precise control over how the system behaves without stitching together scripts, YAML files, or custom controllers.

One API. Any inference architecture.
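To make this concrete, here is a minimal sketch of what a single-resource system description could look like. It assumes an illustrative grove.io/v1alpha1 group/version and illustrative field names, so treat every field below as hypothetical and consult the API Reference and the bundled samples for the authoritative schema.

# Illustrative sketch only: the group/version and all field names are
# assumptions, not the authoritative PodCliqueSet schema.
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: inference-system          # hypothetical name
spec:
  replicas: 1                     # replicas of the whole serving system
  template:
    cliques:                      # one clique per role/component
    - name: frontend              # request routing
      spec:
        replicas: 1
        podSpec: {}               # a standard Kubernetes PodSpec goes here
    - name: prefill
      spec:
        replicas: 2
        podSpec: {}
    - name: decode
      spec:
        replicas: 4
        podSpec: {}

Scaling the whole system is then a matter of changing spec.replicas (or attaching an autoscaler), rather than coordinating several independent Deployments.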

Quick Start on Local Kind Cluster

Get Grove running in 5 minutes:

# 1. Create a local kind cluster
cd operator && make kind-up

# 2. Deploy Grove
make deploy

# 3. Deploy your first workload
kubectl apply -f samples/simple/simple1.yaml

# 4. Fetch the resources created by Grove (PodCliqueSets, PodCliques, PodCliqueScalingGroups, PodGangs, and Pods)
kubectl get pcs,pclq,pcsg,pg,pod -owide

For a hands-on tour of Grove concepts that you can run entirely on your local machine, see the Core Concepts Overview.

To install Grove in a remote Kubernetes cluster, see the Installation Docs.

How It Works

Grove introduces four simple concepts:

  • PodClique: A group of pods representing a specific role (e.g., leader, worker, frontend). Each clique has an independent configuration and supports custom scaling logic.
  • PodCliqueScalingGroup: A set of PodCliques that scale and are scheduled together as a gang. Ideal for tightly coupled roles like prefill leader and worker.
  • PodCliqueSet: The top-level Grove object that defines a group of components managed and colocated together. Also supports autoscaling, with topology-aware spread of PodCliqueSet replicas for availability.
  • PodGang: The scheduler API that defines a unit of gang scheduling. A PodGang is a collection of groups of similar pods, where each pod group defines a minimum number of replicas guaranteed for gang scheduling.
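To connect these concepts, the fragment below extends the earlier sketch: a scaling-group config gangs a prefill leader clique with its workers, and a startup-ordering hint makes the leader wait for the workers to be ready, as MPI-style workloads require. As before, the field names (podCliqueScalingGroupConfigs, startsAfter) are illustrative assumptions rather than the exact schema.

# Fragment of a PodCliqueSet template; field names remain illustrative.
template:
  cliques:
  - name: prefill-leader
    spec:
      replicas: 1
      startsAfter: ["prefill-worker"]  # assumed ordering hook: leader starts after workers
      podSpec: {}
  - name: prefill-worker
    spec:
      replicas: 4
      podSpec: {}
  podCliqueScalingGroupConfigs:        # assumed name: scales and schedules cliques as one gang
  - name: prefill
    cliqueNames: ["prefill-leader", "prefill-worker"]

Note that you never write PodGangs by hand: Grove generates them from the PodCliqueSet and its scaling groups for the scheduler to consume, which is why they appear among the resources fetched in the quick start.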

Hands-On Intro to Core Concepts

API Reference

Example Use Cases

  • Multi-Node, Disaggregated Inference for large models (DeepSeek-R1, Llama-4-Maverick): Visualization
  • Single-Node, Disaggregated Inference: Visualization
  • Agentic Pipeline of Models: Visualization
  • Standard Aggregated Single-Node or Single-GPU Inference: Visualization

Roadmap

2025–2026 Priorities

Q4 2025

  • Topology-Aware Scheduling
  • Multi-Level Horizontal Auto-Scaling ✅
  • Hierarchical Gang Scheduling ✅
  • Startup Ordering ✅
  • Rolling Updates ✅

Q1 2026

  • Resource-Optimized Rolling Updates
  • Topology Spread Constraints
  • Automatic Topology Detection
  • And More!

Contributions

Please read the contribution guide before creating your first PR!

Community, Discussion, and Support

Grove is an open-source project and we welcome community engagement!

Please feel free to start a discussion thread if you want to discuss a topic of interest.

If you have run into an issue or would like a feature enhancement, please create a GitHub Issue with the appropriate tag.

To reach the Grove user and developer community directly, join the NVIDIA Dynamo Discord server or the Grove mailing list.