Description
This issue explores integrating Grove with the Kubernetes agent-sandbox project to enable first-class support for AI agent runtimes.
Motivation
The Rise of AI Agents
AI agents—systems that combine LLM inference with tool use, code execution, and environmental interaction—are becoming a dominant deployment pattern. Examples include:
- Coding assistants (Cursor, GitHub Copilot, Claude Code) that execute generated code
- Research agents that browse the web, run analysis scripts, and synthesize findings
- Autonomous agents (AutoGPT, OpenDevin) that iteratively solve tasks
- Enterprise agents that interact with APIs, databases, and internal systems
These agents share a common architecture:
User Request → LLM Inference → Tool/Code Execution → Result → LLM Inference → ...
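To make that loop concrete, here is a minimal Go sketch of the control flow. The LLM and Sandbox interfaces are hypothetical stand-ins for an inference backend and an isolated execution environment, not APIs from Grove or agent-sandbox:

```go
// Minimal sketch of the agent loop; all names here are illustrative.
package sketch

import (
	"context"
	"fmt"
)

// ToolResult is what one tool/code execution step returns.
type ToolResult struct {
	Output string
	Done   bool // model signaled the task is complete
}

// LLM stands in for an inference backend (e.g. vLLM behind an HTTP API).
type LLM interface {
	Infer(ctx context.Context, transcript []string) (action string, err error)
}

// Sandbox stands in for an isolated code-execution environment.
type Sandbox interface {
	Execute(ctx context.Context, action string) (ToolResult, error)
}

// runAgent drives request -> inference -> execution -> result until the
// model finishes or the step budget runs out.
func runAgent(ctx context.Context, llm LLM, sb Sandbox, request string, maxSteps int) (string, error) {
	transcript := []string{request}
	for i := 0; i < maxSteps; i++ {
		action, err := llm.Infer(ctx, transcript)
		if err != nil {
			return "", err
		}
		res, err := sb.Execute(ctx, action)
		if err != nil {
			return "", err
		}
		transcript = append(transcript, action, res.Output)
		if res.Done {
			return res.Output, nil
		}
	}
	return "", fmt.Errorf("step budget exhausted after %d steps", maxSteps)
}
```

Every iteration of this loop crosses the inference/execution boundary, which is why the placement and lifecycle coupling discussed below matters for latency.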
The Infrastructure Gap
Today, deploying agent runtimes on Kubernetes requires stitching together:
| Component | Current Solution | Problems |
|---|---|---|
| Inference | vLLM/SGLang + manual deployment | No gang scheduling, cold starts |
| Code execution | Firecracker/gVisor/Kata | Separate orchestration |
| Session state | Redis/external DB | Another system to manage |
| Scaling | Custom HPA + glue code | Inference and execution scale independently |
| Isolation | Network policies + namespaces | Coarse-grained, complex setup |
Agent developers want to focus on agent logic, not infrastructure plumbing.
Why Grove?
Grove already solves hard problems for distributed inference:
- Gang scheduling: Ensures all components of a workload start together
- Topology awareness: Co-locates tightly-coupled components for low latency
- Hierarchical scaling: PodCliqueScalingGroups scale related components as a unit
- Dependency ordering: `startsAfter` ensures correct startup sequences
- MinAvailable guarantees: workloads maintain minimum capacity or terminate gracefully
These primitives map directly to agent runtime requirements:
| Grove Primitive | Agent Runtime Use Case |
|---|---|
| PodClique | Inference backend (GPU pods) |
| PodClique + Sandbox | Code execution environment (isolated) |
| PodCliqueScalingGroup | Inference + execution scaled together |
| Gang scheduling | Low-latency tool calls |
| Topology constraints | Inference and sandbox on same rack/node |
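To ground the mapping above, here is a rough sketch, as Go API types, of how these primitives might compose for an agent runtime. Field names are simplified illustrations for discussion and do not match Grove's actual schema:

```go
// Illustrative Go API types only: field names are simplified stand-ins
// and are not copied from Grove's real PodCliqueSet schema.
package sketch

// PodCliqueSetSpec sketches how the primitives in the table compose.
type PodCliqueSetSpec struct {
	// Cliques that form one gang: all are scheduled together or not at all.
	Cliques []PodCliqueTemplate
	// Groups of cliques that scale as a single unit.
	ScalingGroups []ScalingGroup
}

type PodCliqueTemplate struct {
	Name     string
	Replicas int32
	// MinAvailable is the floor below which the gang is torn down.
	MinAvailable int32
	// StartsAfter lists cliques that must be ready before this one starts.
	StartsAfter []string
}

// ScalingGroup ties related cliques together, e.g. scaling "inference"
// and "execution-sandbox" in lockstep.
type ScalingGroup struct {
	Name        string
	CliqueNames []string
}
```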
Why agent-sandbox?
The agent-sandbox project (kubernetes-sigs) provides:
- Sandbox CRD: Isolated, stateful, singleton workloads
- SandboxWarmPool: Pre-warmed pool of sandboxes for instant claiming
- SandboxTemplate: Reusable sandbox configurations
- Lifecycle management: Create, delete, pause, resume, hibernate
- Persistent storage: State survives pod restarts
This is exactly what agent runtimes need for code execution environments.
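As a rough mental model of those lifecycle operations, the sketch below encodes the phases and legal transitions as Go types. This is an assumption for illustration, not agent-sandbox's actual state machine:

```go
// Hypothetical lifecycle model mirroring the operations listed above;
// the real agent-sandbox state machine may differ.
package sketch

type SandboxPhase string

const (
	SandboxRunning    SandboxPhase = "Running"
	SandboxPaused     SandboxPhase = "Paused"     // processes frozen, pod retained
	SandboxHibernated SandboxPhase = "Hibernated" // pod deleted, state persisted
	SandboxTerminated SandboxPhase = "Terminated"
)

// validTransitions captures which lifecycle operations would be legal
// from each phase (an assumption, not the project's spec).
var validTransitions = map[SandboxPhase][]SandboxPhase{
	SandboxRunning:    {SandboxPaused, SandboxHibernated, SandboxTerminated},
	SandboxPaused:     {SandboxRunning, SandboxTerminated},
	SandboxHibernated: {SandboxRunning, SandboxTerminated},
}
```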
Proposal
The core idea is to enable PodCliqueSets to include sandbox-based workloads alongside inference workloads, with coordinated lifecycle management:
┌─────────────────────────────────────────────────────────────────────┐
│ PodCliqueSet │
│ "agent-runtime" │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ PodCliqueScalingGroup │ │
│ │ ┌──────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ PodClique │ │ PodClique │ │ │
│ │ │ "inference" │ │ "execution-sandbox" │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌────────────┐ │ │ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ vLLM Pod │ │ ───► │ │ Sandbox A │ │ Sandbox B │ │ │ │
│ │ │ │ (GPU) │ │ tool │ │ (user 1) │ │ (user 2) │ │ │ │
│ │ │ └────────────┘ │ call │ └───────────┘ └───────────┘ │ │ │
│ │ │ ┌────────────┐ │ │ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ vLLM Pod │ │ │ │ Sandbox C │ │ Sandbox D │ │ │ │
│ │ │ │ (GPU) │ │ │ │ (user 3) │ │ (user 4) │ │ │ │
│ │ │ └────────────┘ │ │ └───────────┘ └───────────┘ │ │ │
│ │ └──────────────────┘ └─────────────────────────────────┘ │ │
│ │ ▲ ▲ │ │
│ │ │ Gang Scheduled │ │ │
│ │ └──────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ SandboxWarmPool │ │
│ │ "python-execution-pool" │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Warm │ │ Warm │ │ Warm │ │ Warm │ │ Warm │ │ │
│ │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Key capabilities to explore (a rough API sketch follows this list):
- Sandbox-aware PodCliques: PodCliques that manage Sandbox resources instead of raw Pods
- Warm pool integration: Claim pre-warmed sandboxes for instant scale-up
- Hibernate/resume: Pause sandboxes on scale-down instead of deleting, preserving state
- Session affinity: Route user sessions to dedicated sandboxes
- Gang scheduling: Co-locate inference and sandbox pods for low-latency tool calls
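One possible API shape, if the SandboxClique route from the open questions below is taken: a clique whose replicas are Sandbox resources claimed from a warm pool. Every field here is a proposal sketch for discussion, not an existing Grove or agent-sandbox field:

```go
// Proposal sketch only: none of these fields exist today in Grove or
// agent-sandbox; names are assumptions for discussion.
package sketch

// SandboxCliqueSpec is one possible shape for a sandbox-aware clique:
// replicas are Sandbox resources rather than raw Pods.
type SandboxCliqueSpec struct {
	// Reference to an agent-sandbox SandboxTemplate to stamp out.
	SandboxTemplateRef string
	// Optional SandboxWarmPool to claim from before creating cold sandboxes.
	WarmPoolRef string
	Replicas    int32
	// ScaleDownPolicy selects "Delete" or "Hibernate"; hibernation keeps
	// user state so a later scale-up can resume instead of starting cold.
	ScaleDownPolicy string
	// SessionAffinity routes repeat requests from a session to its sandbox.
	SessionAffinity bool
}
```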
Use Cases
1. Coding Assistants (Claude Code, Cursor, etc.)
Agents that execute user-provided or generated code need isolated environments:
- Run tests, install packages, execute scripts
- Persistent environment across conversation turns
- Isolated from other users and from inference infrastructure
2. Multi-Tenant Agent Platforms
Platforms hosting agents for multiple customers:
- Per-tenant or per-user sandbox isolation
- Shared inference infrastructure for cost efficiency
- Independent scaling per tenant
3. Research Agents with Stateful Analysis
Agents performing long-running analysis:
- Download datasets and install tools (persisted)
- Hibernate when idle to reduce costs
- Resume instantly when user returns
Open Questions to Explore
Architecture
- How should sandbox support be exposed in the Grove API? Options include extending PodClique or introducing a new SandboxClique CRD.
- How do Sandboxes participate in PodGang scheduling? The agent-sandbox controller would need to coordinate with Grove's scheduling gates (see the gate sketch after this list).
- Should Grove manage SandboxWarmPools directly, or reference externally managed pools?
- Where does session-to-sandbox routing live? (sidecar, separate service, or Grove operator)
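On the scheduling-gate question: scheduling gates are an upstream Kubernetes mechanism where a pod with a non-empty spec.schedulingGates list is held back from scheduling until every gate is removed. A minimal sketch of the hand-off, assuming a hypothetical Grove-owned gate name:

```go
// Sketch of the gang-scheduling hand-off via Kubernetes scheduling gates.
package sketch

import corev1 "k8s.io/api/core/v1"

const gangGate = "grove.io/gang-pending" // assumed gate name, not a real Grove constant

// gateForGang is the scheduling gate a sandbox pod would carry so the
// kube-scheduler holds it until the rest of its gang is placeable.
func gateForGang() corev1.PodSchedulingGate {
	return corev1.PodSchedulingGate{Name: gangGate}
}

// ungate strips the gang gate from the pod spec in place; Grove's
// operator would call this (then update the Pod) once the gang is ready.
func ungate(pod *corev1.Pod) {
	kept := pod.Spec.SchedulingGates[:0]
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name != gangGate {
			kept = append(kept, g)
		}
	}
	pod.Spec.SchedulingGates = kept
}
```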
Implementation
- How should Grove behave on clusters without the agent-sandbox CRDs installed? (see the discovery sketch after this list)
- How should Sandbox status be aggregated into PodClique/PodCliqueSet status?
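On the first question, a common pattern for optional CRDs is to probe the discovery API and enable the sandbox reconciler only when the group/version is served. The group/version string below is an assumption about agent-sandbox's API group:

```go
// Sketch: probe discovery and enable sandbox support only if the API is served.
package sketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/discovery"
)

// sandboxAPIAvailable reports whether the cluster serves the (assumed)
// agent-sandbox group/version; when false, Grove would skip sandbox
// features instead of crash-looping on missing CRDs.
func sandboxAPIAvailable(dc discovery.DiscoveryInterface) (bool, error) {
	const gv = "agents.x-k8s.io/v1alpha1" // assumed API group/version
	if _, err := dc.ServerResourcesForGroupVersion(gv); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}
```

On the second, a minimal aggregation sketch, assuming each Sandbox exposes a Ready condition Grove can read:

```go
// Sketch: roll per-Sandbox readiness up into a clique-level status.
package sketch

type sandboxView struct {
	Name  string
	Ready bool
}

type cliqueSandboxStatus struct {
	Desired   int32
	Ready     int32
	Available bool // Ready >= MinAvailable, feeding Grove's gang guarantees
}

func aggregate(sandboxes []sandboxView, desired, minAvailable int32) cliqueSandboxStatus {
	var ready int32
	for _, s := range sandboxes {
		if s.Ready {
			ready++
		}
	}
	return cliqueSandboxStatus{Desired: desired, Ready: ready, Available: ready >= minAvailable}
}
```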
Security
- What is the threat model for malicious code execution, and what isolation level is required (gVisor, Kata, etc.)?
- How do sandboxes access secrets without exposing them to user code? (a sidecar pattern is sketched after this list)
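One mitigation pattern for the secrets question: mount credentials only into a trusted broker sidecar and give the user-code container a loopback proxy endpoint, so generated code can invoke tools without ever reading the raw secret. Container, image, and secret names below are illustrative, not part of any existing spec:

```go
// Sketch: untrusted code and credentials separated within one sandbox pod.
package sketch

import corev1 "k8s.io/api/core/v1"

func sandboxPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		Containers: []corev1.Container{
			{
				// Runs untrusted, LLM-generated code; no secret mounts at all.
				Name:  "user-code",
				Image: "python-sandbox:latest",
				Env: []corev1.EnvVar{
					// Tool calls go through the sidecar, which injects credentials.
					{Name: "TOOL_PROXY_URL", Value: "http://127.0.0.1:8080"},
				},
			},
			{
				// Trusted sidecar; the only container that can see the secret.
				Name:  "credential-broker",
				Image: "tool-proxy:latest",
				EnvFrom: []corev1.EnvFromSource{{
					SecretRef: &corev1.SecretEnvSource{
						LocalObjectReference: corev1.LocalObjectReference{Name: "tenant-api-keys"},
					},
				}},
			},
		},
	}
}
```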
Next Steps (H1 2026)
This is an exploratory proposal. In H1 2026, we should:
- Engage with agent-sandbox maintainers to understand their roadmap and discuss integration points
- Gather requirements from agent developers building on Kubernetes today
- Prototype basic integration to validate the architecture
- Document security model for running untrusted code alongside inference
Would love feedback and collaborators interested in this direction.
Feedback Requested
I would appreciate feedback on:
- Are we capturing the right agent runtime patterns?
- Ideas for integrating Sandbox with PodGang scheduling?
- Additional security considerations?
- What else do agent developers need from infrastructure?