
Conversation

@KavinKrishnan
Contributor


Add complete P2P weight transfer system for GPU-to-GPU model transfers:

## Python Client
- `vllm_loader.py`: Custom vLLM model loaders (MxSourceModelLoader, MxTargetModelLoader)
  that transfer FP8 model weights before vLLM's weight-processing step
- `nixl_transfer.py`: NIXL transfer manager with contiguous region support
- `vllm_extension.py`: vLLM worker extension for ZMQ-based transfers
- `client.py`: CLI client for P2P operations
- `types.py`: TensorDescriptor and WorkerMetadata types
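A rough sketch of what the `types.py` descriptors could look like; the field names and layouts below are illustrative assumptions, not the actual definitions in this PR:

```python
# Hypothetical sketch of the TensorDescriptor / WorkerMetadata types.
# All field names here are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class TensorDescriptor:
    name: str          # fully qualified parameter name, e.g. "model.layers.0.mlp.gate_proj.weight"
    device_ptr: int    # CUDA device pointer registered with NIXL
    size_bytes: int    # length of the GPU allocation backing this tensor
    dtype: str         # e.g. "float8_e4m3fn" for FP8 checkpoints


@dataclass
class WorkerMetadata:
    worker_id: str
    nixl_agent_metadata: bytes  # serialized NIXL agent handle the peer needs to connect
    tensors: list[TensorDescriptor] = field(default_factory=list)
```

The idea is that each source worker registers its GPU allocations with NIXL, describes them with one `TensorDescriptor` per parameter, and publishes the bundle as `WorkerMetadata` for the target side to consume.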

## Rust Server
- `p2p_service.rs`: gRPC service for PublishMetadata/GetMetadata
- `state.rs`: Redis-backed P2P state manager with worker merging
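To make the "worker merging" behavior concrete, here is a Python analogue of what the `state.rs` logic plausibly does: metadata published by a worker is merged into any record already stored for that worker rather than replacing it wholesale. The Redis key layout and merge rules below are assumptions, not the actual Rust implementation:

```python
# Illustrative Python analogue of a Redis-backed P2P state manager
# with worker merging. Key names and merge semantics are assumptions.
import json


class P2PState:
    def __init__(self, redis_client):
        self.redis = redis_client  # any client with get/set of strings

    def publish_metadata(self, worker_id: str, metadata: dict) -> dict:
        key = f"p2p:worker:{worker_id}"
        raw = self.redis.get(key)
        merged = json.loads(raw) if raw else {}
        # Merge rule (assumed): list fields accumulate, scalar fields overwrite.
        for k, v in metadata.items():
            if isinstance(v, list):
                merged.setdefault(k, []).extend(v)
            else:
                merged[k] = v
        self.redis.set(key, json.dumps(merged))
        return merged

    def get_metadata(self, worker_id: str):
        raw = self.redis.get(f"p2p:worker:{worker_id}")
        return json.loads(raw) if raw else None
```

This mirrors the gRPC surface described above: `PublishMetadata` maps to `publish_metadata` and `GetMetadata` to `get_metadata`, with Redis as the shared store between source and target deployments.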

## Kubernetes Examples
- `vllm-source.yaml`: Source vLLM deployment with NIXL ready signal
- `vllm-target.yaml`: Target vLLM deployment with coordination
- `modelexpress-server.yaml`: Server with Redis sidecar
- `Dockerfile.client`: Client image with vLLM loader patches

## Documentation
- `CLAUDE.md`: AI assistant context for the codebase
- `CONTEXT.md`: Detailed engineering context

Key features:
- GPU-to-GPU RDMA transfers via NIXL/UCX
- FP8 model support (DeepSeek-V3, etc.)
- Redis-based source-target coordination
- Baseline mode with individual tensor transfers
- Experimental contiguous region support (MX_CONTIGUOUS_REG)
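The last two features describe two transfer-planning modes gated by an environment variable. A hedged sketch of how that gating might work; `plan_transfers` and its return shape are invented for illustration, only the `MX_CONTIGUOUS_REG` flag name comes from the PR:

```python
# Hypothetical sketch of baseline vs. contiguous-region transfer planning.
# Only the MX_CONTIGUOUS_REG env var name is from the PR; the rest is assumed.
import os


def plan_transfers(tensors):
    """Return a list of (offset, size) RDMA transfer descriptors.

    `tensors` is a list of (name, size_bytes) pairs laid out back-to-back.
    """
    if os.environ.get("MX_CONTIGUOUS_REG") == "1":
        # Experimental: register one contiguous region covering all tensors
        # and issue a single RDMA transfer for it.
        total = sum(size for _, size in tensors)
        return [(0, total)]
    # Baseline: one transfer per individual tensor.
    offset, plan = 0, []
    for _, size in tensors:
        plan.append((offset, size))
        offset += size
    return plan
```

The trade-off sketched here is the usual one: per-tensor transfers are simple and robust, while a single contiguous registration amortizes registration and per-transfer overhead at the cost of requiring a packed layout.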
@github-actions github-actions bot added the feat label Jan 27, 2026
@KavinKrishnan KavinKrishnan marked this pull request as draft January 27, 2026 18:44