The Weight Propagation Interface (WPI) is a Kubernetes-native orchestration framework designed to enable high-speed, zero-copy movement of large ML model weights between AI accelerators (GPUs, TPUs) across nodes in a cluster.
As models grow to hundreds of billions of parameters, the traditional path of saving weights to shared storage and having each worker independently download them into GPU memory becomes a severe bottleneck. WPI solves this by treating model weights as first-class scheduling and hardware resources, leveraging native hardware interconnects (such as NVLink and InfiniBand via NCCL) to securely and efficiently distribute weights directly into accelerator memory.
Check out the WPI Demo Video to see the system in action!
WPI consists of three main architectural layers, coordinated to move weights with zero-copy efficiency:
```mermaid
graph TD
    User[Developer/User] -->|Creates| CRD[WeightClaim / WeightBuffer]
    Operator[WPI Operator] -->|Watches| CRD
    Operator -->|Assigns Shards| Driver[WPI Driver / DaemonSet]
    Trainer[Trainer Pod] -->|Writes to| FlatMemory[(CUDA Shared Memory)]
    Driver -->|Maps via CUDA IPC| FlatMemory
    Driver -->|Propagates via NCCL| RemoteDriver[Remote WPI Driver]
    RemoteDriver -->|Maps to| Worker[vLLM / Inference Worker Pod]
```
- 🧬 Custom Resource Definitions (CRDs): Define logical blocks of weights (`WeightBuffer`) and how workloads bind to them (`WeightClaim`). Supports automatic model sharding for tensor, pipeline, and expert parallelism.
- 🧠 WPI Operator (The Brain): A Kubernetes controller that reconciles the desired distribution of weights with the cluster's physical topology, including shard discovery and per-claim shard assignment.
- 🚚 WPI Driver / Node Agent (The Mover): A privileged DaemonSet running on accelerator nodes that executes hardware-specific commands (CUDA IPC, NCCL) to allocate, share, and transmit memory. Supports both broadcast (1-to-N identical) and scatter (1-to-N sharded) propagation modes.
- 🤖 Consumer (The Workload): The ML framework (e.g., PyTorch, vLLM) that natively binds to the shared weight memory without allocating a duplicate copy; see the sketch after this list.
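To make the consumer role concrete, here is a minimal sketch of how a process could bind to a driver-exported buffer. It assumes the WPI driver exposes a CUDA IPC handle at a node-local path; the path, element count, and dtype below are invented for illustration and are not WPI's actual API. The mapping itself uses standard CuPy/PyTorch zero-copy primitives.

```python
# Minimal consumer-side sketch: map a CUDA IPC handle exported by the
# node-local WPI driver and expose it to PyTorch without copying.
# HANDLE_PATH, NUM_ELEMENTS, and DTYPE are assumptions for illustration only.
import cupy as cp
import torch

HANDLE_PATH = "/var/run/wpi/weight-buffer.ipc"  # hypothetical handle location
NUM_ELEMENTS = 3_000_000_000                    # e.g. a ~3B-parameter model
DTYPE = cp.float16

# 1. Read the cudaIpcMemHandle_t the driver exported for this buffer.
with open(HANDLE_PATH, "rb") as f:
    ipc_handle = f.read()

# 2. Map the driver's allocation into this process's CUDA address space.
dev_ptr = cp.cuda.runtime.ipcOpenMemHandle(ipc_handle)

# 3. Wrap the raw device pointer as a flat CuPy array (still zero-copy).
nbytes = NUM_ELEMENTS * cp.dtype(DTYPE).itemsize
mem = cp.cuda.UnownedMemory(dev_ptr, nbytes, None)
flat_weights = cp.ndarray((NUM_ELEMENTS,), dtype=DTYPE,
                          memptr=cp.cuda.MemoryPointer(mem, 0))

# 4. Hand the same memory to the framework via DLPack; no duplicate copy
#    of the weights is allocated in accelerator memory.
weights = torch.from_dlpack(flat_weights)
print(weights.shape, weights.device)
```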
You can install the WPI client library directly from GitHub:
```bash
pip install git+https://github.com/llm-d-incubation/weight-propagation-interface.git#subdirectory=consumer/wpi_client
```

The repository is laid out as follows:

- `operator/`: Kubernetes controller for WPI.
- `driver/`: Node agent (Python controller and Go-based DRA plugin).
- `proto/`: gRPC service definitions (`wpi.proto`); see the snippet below for generating stubs.
- `crds/`: Kubernetes Custom Resource manifests.
- `consumer/`: Example workloads and pod specifications demonstrating WPI integration.
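If you are developing against the driver's gRPC service, Python stubs can be generated from `proto/wpi.proto` with the standard grpcio-tools workflow. This is a generic protoc invocation, not a documented WPI build step, and the output paths are arbitrary.

```python
# Generic grpcio-tools invocation (not a documented WPI build step): generate
# Python message classes and service stubs from proto/wpi.proto.
from grpc_tools import protoc  # pip install grpcio-tools

protoc.main([
    "protoc",
    "-Iproto",              # proto include path (repo-relative)
    "--python_out=.",       # generated message classes
    "--grpc_python_out=.",  # generated service stubs
    "proto/wpi.proto",
])
```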
WPI seamlessly integrates with distributed ML training frameworks to eliminate storage bottlenecks during frequent weight synchronization between training and rollout/inference workers:
- `verl`: WPI is fully integrated as a `CheckpointEngine` backend for `verl`. This enables high-throughput, zero-copy weight propagation from RL trainers to rollout workers over RDMA (see the conceptual sketch below).
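The snippet below is a conceptual sketch of that pattern only; every class and method name in it is hypothetical and does not reflect verl's or WPI's real interfaces. It illustrates the key idea: after each training step the updated state dict is published into the shared weight buffer instead of being checkpointed to shared storage and re-downloaded by the rollout workers.

```python
# Conceptual sketch only: all names here are hypothetical and do not reflect
# verl's or WPI's actual interfaces.
import torch


class WeightPublisher:
    """Stand-in for a checkpoint-engine backend that publishes weights via WPI."""

    def __init__(self, weight_claim: str):
        self.weight_claim = weight_claim  # WeightClaim the rollout workers bind to

    def publish(self, state_dict: dict) -> None:
        # A real backend would copy each tensor into the WPI-managed CUDA
        # buffer and let the driver broadcast/scatter it over RDMA/NCCL,
        # instead of writing a checkpoint file to shared storage.
        for name, tensor in state_dict.items():
            _ = (name, tensor.detach())  # placeholder for the zero-copy publish


publisher = WeightPublisher(weight_claim="vllm-weight-claim")
# After each optimizer step in the RL loop:
# publisher.publish(policy_model.state_dict())
```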
WPI delivers near-line-rate weight propagation by eliminating storage overheads and avoiding CPU staging.
| Scenario | Payload Size | Hardware / Model | Throughput |
|---|---|---|---|
| Multi-Node Broadcast | ~75 GB | A3 Ultra (InfiniBand) | 37.42 GB/s |
| Multi-Node Broadcast | ~14.2 GB | Qwen2-7B (RoCE/NCCL) | ~20.4 GB/s |
| Multi-Node Broadcast | ~6 GB | Qwen2.5-3B (RoCE/NCCL) | ~15.97 GB/s |
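To put these throughput figures in wall-clock terms, a quick back-of-the-envelope calculation (using the sizes and rates from the table above) gives the propagation time for each payload:

```python
# Propagation time implied by the table above (size in GB, rate in GB/s).
results = {
    "A3 Ultra (InfiniBand), ~75 GB": (75.0, 37.42),
    "Qwen2-7B (RoCE/NCCL), ~14.2 GB": (14.2, 20.4),
    "Qwen2.5-3B (RoCE/NCCL), ~6 GB": (6.0, 15.97),
}
for label, (size_gb, rate_gbps) in results.items():
    print(f"{label}: ~{size_gb / rate_gbps:.2f} s")
# ~2.00 s, ~0.70 s, and ~0.38 s respectively
```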
Here is a minimal example of a `WeightBuffer` and a `WeightClaim` to bind an inference workload to a shared weight buffer:

```yaml
apiVersion: wpi.sig.k8s.io/v1alpha1
kind: WeightBuffer
metadata:
  name: vllm-weight-buffer
  namespace: wpi-system
spec:
  capacity: "75Gi"  # 75 GiB; adjust to your model size
---
apiVersion: wpi.sig.k8s.io/v1alpha1
kind: WeightClaim
metadata:
  name: vllm-weight-claim
  namespace: wpi-system
spec:
  weightBufferName: vllm-weight-buffer
```
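The same objects can also be created programmatically. The sketch below uses the official Kubernetes Python client; the group and version come from the manifest above, while the resource plural (`weightclaims`) is an assumption inferred from the kind, so check the installed CRD.

```python
# Create the WeightClaim above with the official Kubernetes Python client.
# The plural "weightclaims" is assumed from the kind; verify against the CRD.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in a pod
api = client.CustomObjectsApi()

weight_claim = {
    "apiVersion": "wpi.sig.k8s.io/v1alpha1",
    "kind": "WeightClaim",
    "metadata": {"name": "vllm-weight-claim", "namespace": "wpi-system"},
    "spec": {"weightBufferName": "vllm-weight-buffer"},
}

api.create_namespaced_custom_object(
    group="wpi.sig.k8s.io",
    version="v1alpha1",
    namespace="wpi-system",
    plural="weightclaims",  # assumed plural for the WeightClaim kind
    body=weight_claim,
)
```

Check out the following documents for more details: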
- WPI Design Document: Detailed architectural design.
- WPI User Guide: Setup and usage instructions.
Use `setup.sh` to initialize the environment and `teardown.sh` to clean up.