The Weight Propagation Interface (WPI) is a Kubernetes-native orchestration framework designed to enable high-speed, zero-copy movement of large ML model weights between AI accelerators (GPUs, TPUs) across nodes in a cluster.
As models grow to hundreds of billions of parameters, the traditional path of saving weights to shared storage and independently downloading them into GPU RAM becomes a severe bottleneck. WPI solves this by treating Model Weights as first-class scheduling and hardware resources, leveraging native hardware interconnects (like NVLink and InfiniBand via NCCL) to securely and efficiently distribute weights directly into accelerator memory.
Check out the WPI Demo Video to see the system in action!
WPI consists of four main architectural components:
- Custom Resource Definitions (CRDs): Define logical blocks of weights (`WeightBuffer`) and how workloads bind to them (`WeightClaim`). Supports automatic model sharding for tensor, pipeline, and expert parallelism.
- WPI Operator (The Brain): A Kubernetes controller that reconciles the desired distribution of weights with the cluster's physical topology, including shard discovery and per-claim shard assignment.
- WPI Driver / Node Agent (The Mover): A privileged daemonset running on accelerator nodes that executes hardware-specific commands (CUDA IPC, NCCL) to allocate, share, and transmit memory. Supports both broadcast (1-to-N identical) and scatter (1-to-N sharded) propagation modes.
- Consumer (The Workload): The ML framework (e.g., PyTorch, vLLM) that natively binds to the shared weight memory without allocating a duplicate copy.
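As an illustrative sketch of how the two CRDs fit together, a `WeightBuffer`/`WeightClaim` pair might look like the manifests below. All field names here are hypothetical; the authoritative schemas live in `crds/`.

```yaml
# Hypothetical example manifests -- field names are illustrative,
# not taken from the actual CRD schemas in crds/.
apiVersion: wpi.example.io/v1alpha1
kind: WeightBuffer
metadata:
  name: llama-70b-weights
spec:
  modelRef: meta-llama/Llama-3-70B   # source model (illustrative)
  sharding:
    strategy: tensor-parallel        # tensor | pipeline | expert
    shards: 8
---
apiVersion: wpi.example.io/v1alpha1
kind: WeightClaim
metadata:
  name: rollout-worker-claim
spec:
  bufferRef: llama-70b-weights       # binds the workload to the buffer
  mode: scatter                      # broadcast (1-to-N identical) | scatter (1-to-N sharded)
```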
- `operator/`: Kubernetes controller for WPI.
- `driver/`: Node agent (Python controller and Go-based DRA plugin).
- `proto/`: gRPC service definitions (`wpi.proto`).
- `crds/`: Kubernetes Custom Resource manifests.
- `consumer/`: Example workloads and pod specifications demonstrating WPI integration.
WPI seamlessly integrates with distributed ML training frameworks to eliminate storage bottlenecks during off-policy training weight synchronization:
- verl: WPI is fully integrated as a `CheckpointEngine` backend for `verl`, enabling high-throughput, zero-copy weight propagation from RL trainers to rollout workers over RDMA.
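During sharded (scatter-mode) weight sync, each rollout worker receives only a subset of the shards, so some shard-to-worker assignment step is required. The helper below is a toy round-robin sketch of that idea, not the actual assignment logic used by the WPI operator (which also accounts for cluster topology); all names are illustrative.

```python
def assign_shards(num_shards: int, workers: list[str]) -> dict[str, list[int]]:
    """Toy round-robin assignment of shard indices to workers.

    Illustrative only: the real WPI operator performs per-claim shard
    assignment against the cluster's physical topology.
    """
    assignment: dict[str, list[int]] = {w: [] for w in workers}
    for shard in range(num_shards):
        assignment[workers[shard % len(workers)]].append(shard)
    return assignment

# Example: 8 tensor-parallel shards scattered across 3 rollout workers.
print(assign_shards(8, ["worker-a", "worker-b", "worker-c"]))
# {'worker-a': [0, 3, 6], 'worker-b': [1, 4, 7], 'worker-c': [2, 5]}
```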
Check out the following documents for more details:
- WPI Design Document: Detailed architectural design.
- WPI User Guide: Setup and usage instructions.
Use `setup.sh` to initialize the environment and `teardown.sh` to clean up.