Description
This issue explores integrating Grove with the Kubernetes agent-sandbox project to enable first-class support for AI agent runtimes.
Motivation
The Rise of AI Agents
AI agents—systems that combine LLM inference with tool use, code execution, and environmental interaction—are becoming a dominant deployment pattern. Examples include:
- Coding assistants (Cursor, GitHub Copilot, Claude Code) that execute generated code
- Research agents that browse the web, run analysis scripts, and synthesize findings
- Autonomous agents (AutoGPT, OpenDevin) that iteratively solve tasks
- Enterprise agents that interact with APIs, databases, and internal systems
These agents share a common architecture:
User Request → LLM Inference → Tool/Code Execution → Result → LLM Inference → ...
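To make that loop concrete, here is a minimal Go sketch of the control flow. The LLM and Sandbox interfaces are hypothetical stand-ins for an inference backend and an isolated execution environment, not APIs from Grove or agent-sandbox:

```go
// Minimal sketch of the agent loop; all names here are illustrative.
package sketch

import (
	"context"
	"fmt"
)

// ToolResult is what one tool/code execution step returns.
type ToolResult struct {
	Output string
	Done   bool // model signaled the task is complete
}

// LLM stands in for an inference backend (e.g. vLLM behind an HTTP API).
type LLM interface {
	Infer(ctx context.Context, transcript []string) (action string, err error)
}

// Sandbox stands in for an isolated code-execution environment.
type Sandbox interface {
	Execute(ctx context.Context, action string) (ToolResult, error)
}

// runAgent drives request -> inference -> execution -> result until the
// model finishes or the step budget runs out.
func runAgent(ctx context.Context, llm LLM, sb Sandbox, request string, maxSteps int) (string, error) {
	transcript := []string{request}
	for i := 0; i < maxSteps; i++ {
		action, err := llm.Infer(ctx, transcript)
		if err != nil {
			return "", err
		}
		res, err := sb.Execute(ctx, action)
		if err != nil {
			return "", err
		}
		transcript = append(transcript, action, res.Output)
		if res.Done {
			return res.Output, nil
		}
	}
	return "", fmt.Errorf("step budget exhausted after %d steps", maxSteps)
}
```

Every iteration of this loop crosses the inference/execution boundary, which is why the placement and lifecycle coupling discussed below matters for latency.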
The Infrastructure Gap
Today, deploying agent runtimes on Kubernetes requires stitching together:
| Component | Current Solution | Problems |
|---|---|---|
| Inference | vLLM/SGLang + manual deployment | No gang scheduling, cold starts |
| Code execution | Firecracker/gVisor/Kata | Separate orchestration |
| Session state | Redis/external DB | Another system to manage |
| Scaling | Custom HPA + glue code | Inference and execution scale independently |
| Isolation | Network policies + namespaces | Coarse-grained, complex setup |
Agent developers want to focus on agent logic, not infrastructure plumbing.
Why Grove?
Grove already solves hard problems for distributed inference:
- Gang scheduling: Ensures all components of a workload start together
- Topology awareness: Co-locates tightly-coupled components for low latency
- Hierarchical scaling: PodCliqueScalingGroups scale related components as a unit
- Dependency ordering: `startsAfter` ensures correct startup sequences
- MinAvailable guarantees: workloads maintain minimum capacity or terminate gracefully
These primitives map directly to agent runtime requirements:
| Grove Primitive | Agent Runtime Use Case |
|---|---|
| PodClique | Inference backend (GPU pods) |
| PodClique + Sandbox | Code execution environment (isolated) |
| PodCliqueScalingGroup | Inference + execution scaled together |
| Gang scheduling | Low-latency tool calls |
| Topology constraints | Inference and sandbox on same rack/node |
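To ground the mapping above, here is a rough sketch, as Go API types, of how these primitives might compose for an agent runtime. Field names are simplified illustrations for discussion and do not match Grove's actual schema:

```go
// Illustrative Go API types only: field names are simplified stand-ins
// and are not copied from Grove's real PodCliqueSet schema.
package sketch

// PodCliqueSetSpec sketches how the primitives in the table compose.
type PodCliqueSetSpec struct {
	// Cliques that form one gang: all are scheduled together or not at all.
	Cliques []PodCliqueTemplate
	// Groups of cliques that scale as a single unit.
	ScalingGroups []ScalingGroup
}

type PodCliqueTemplate struct {
	Name     string
	Replicas int32
	// MinAvailable is the floor below which the gang is torn down.
	MinAvailable int32
	// StartsAfter lists cliques that must be ready before this one starts.
	StartsAfter []string
}

// ScalingGroup ties related cliques together, e.g. scaling "inference"
// and "execution-sandbox" in lockstep.
type ScalingGroup struct {
	Name        string
	CliqueNames []string
}
```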
Why agent-sandbox?
The agent-sandbox project (kubernetes-sigs) provides:
- Sandbox CRD: Isolated, stateful, singleton workloads
- SandboxWarmPool: Pre-warmed pool of sandboxes for instant claiming
- SandboxTemplate: Reusable sandbox configurations
- Lifecycle management: Create, delete, pause, resume, hibernate
- Persistent storage: State survives pod restarts
This is exactly what agent runtimes need for code execution environments.
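As a rough mental model of those lifecycle operations, the sketch below encodes the phases and legal transitions as Go types. This is an assumption for illustration, not agent-sandbox's actual state machine:

```go
// Hypothetical lifecycle model mirroring the operations listed above;
// the real agent-sandbox state machine may differ.
package sketch

type SandboxPhase string

const (
	SandboxRunning    SandboxPhase = "Running"
	SandboxPaused     SandboxPhase = "Paused"     // processes frozen, pod retained
	SandboxHibernated SandboxPhase = "Hibernated" // pod deleted, state persisted
	SandboxTerminated SandboxPhase = "Terminated"
)

// validTransitions captures which lifecycle operations would be legal
// from each phase (an assumption, not the project's spec).
var validTransitions = map[SandboxPhase][]SandboxPhase{
	SandboxRunning:    {SandboxPaused, SandboxHibernated, SandboxTerminated},
	SandboxPaused:     {SandboxRunning, SandboxTerminated},
	SandboxHibernated: {SandboxRunning, SandboxTerminated},
}
```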
Proposal
The core idea is to enable PodCliqueSets to include sandbox-based workloads alongside inference workloads, with coordinated lifecycle management:
┌─────────────────────────────────────────────────────────────────────┐
│ PodCliqueSet │
│ "agent-runtime" │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ PodCliqueScalingGroup │ │
│ │ ┌──────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ PodClique │ │ PodClique │ │ │
│ │ │ "inference" │ │ "execution-sandbox" │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌────────────┐ │ │ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ vLLM Pod │ │ ───► │ │ Sandbox A │ │ Sandbox B │ │ │ │
│ │ │ │ (GPU) │ │ tool │ │ (user 1) │ │ (user 2) │ │ │ │
│ │ │ └────────────┘ │ call │ └───────────┘ └───────────┘ │ │ │
│ │ │ ┌────────────┐ │ │ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ vLLM Pod │ │ │ │ Sandbox C │ │ Sandbox D │ │ │ │
│ │ │ │ (GPU) │ │ │ │ (user 3) │ │ (user 4) │ │ │ │
│ │ │ └────────────┘ │ │ └───────────┘ └───────────┘ │ │ │
│ │ └──────────────────┘ └─────────────────────────────────┘ │ │
│ │ ▲ ▲ │ │
│ │ │ Gang Scheduled │ │ │
│ │ └──────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ SandboxWarmPool │ │
│ │ "python-execution-pool" │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Warm │ │ Warm │ │ Warm │ │ Warm │ │ Warm │ │ │
│ │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Key capabilities to explore (a rough API sketch follows this list):
- Sandbox-aware PodCliques: PodCliques that manage Sandbox resources instead of raw Pods
- Warm pool integration: Claim pre-warmed sandboxes for instant scale-up
- Hibernate/resume: Pause sandboxes on scale-down instead of deleting, preserving state
- Session affinity: Route user sessions to dedicated sandboxes
- Gang scheduling: Co-locate inference and sandbox pods for low-latency tool calls
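One possible API shape, if the SandboxClique route from the open questions below is taken: a clique whose replicas are Sandbox resources claimed from a warm pool. Every field here is a proposal sketch for discussion, not an existing Grove or agent-sandbox field:

```go
// Proposal sketch only: none of these fields exist today in Grove or
// agent-sandbox; names are assumptions for discussion.
package sketch

// SandboxCliqueSpec is one possible shape for a sandbox-aware clique:
// replicas are Sandbox resources rather than raw Pods.
type SandboxCliqueSpec struct {
	// Reference to an agent-sandbox SandboxTemplate to stamp out.
	SandboxTemplateRef string
	// Optional SandboxWarmPool to claim from before creating cold sandboxes.
	WarmPoolRef string
	Replicas    int32
	// ScaleDownPolicy selects "Delete" or "Hibernate"; hibernation keeps
	// user state so a later scale-up can resume instead of starting cold.
	ScaleDownPolicy string
	// SessionAffinity routes repeat requests from a session to its sandbox.
	SessionAffinity bool
}
```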
Use Cases
1. Coding Assistants (Claude Code, Cursor, etc.)
Agents that execute user-provided or generated code need isolated environments:
- Run tests, install packages, execute scripts
- Persistent environment across conversation turns
- Isolated from other users and from inference infrastructure
2. Multi-Tenant Agent Platforms
Platforms hosting agents for multiple customers:
- Per-tenant or per-user sandbox isolation
- Shared inference infrastructure for cost efficiency
- Independent scaling per tenant
3. Research Agents with Stateful Analysis
Agents performing long-running analysis:
- Download datasets and install tools (persisted)
- Hibernate when idle to reduce costs
- Resume instantly when user returns
Open Questions to Explore
Architecture
- How should sandbox support be exposed in the Grove API? Options include extending PodClique or introducing a new SandboxClique CRD.
- How do Sandboxes participate in PodGang scheduling? The agent-sandbox controller would need to coordinate with Grove's scheduling gates (see the gate sketch after this list).
- Should Grove manage SandboxWarmPools directly, or reference externally managed pools?
- Where does session-to-sandbox routing live? (sidecar, separate service, or Grove operator)
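On the scheduling-gate question: scheduling gates are an upstream Kubernetes mechanism where a pod with a non-empty spec.schedulingGates list is held back from scheduling until every gate is removed. A minimal sketch of the hand-off, assuming a hypothetical Grove-owned gate name:

```go
// Sketch of the gang-scheduling hand-off via Kubernetes scheduling gates.
package sketch

import corev1 "k8s.io/api/core/v1"

const gangGate = "grove.io/gang-pending" // assumed gate name, not a real Grove constant

// gateForGang is the scheduling gate a sandbox pod would carry so the
// kube-scheduler holds it until the rest of its gang is placeable.
func gateForGang() corev1.PodSchedulingGate {
	return corev1.PodSchedulingGate{Name: gangGate}
}

// ungate strips the gang gate from the pod spec in place; Grove's
// operator would call this (then update the Pod) once the gang is ready.
func ungate(pod *corev1.Pod) {
	kept := pod.Spec.SchedulingGates[:0]
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name != gangGate {
			kept = append(kept, g)
		}
	}
	pod.Spec.SchedulingGates = kept
}
```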
Implementation
- How should Grove behave on clusters without the agent-sandbox CRDs installed? (see the discovery sketch after this list)
- How should Sandbox status be aggregated into PodClique/PodCliqueSet status?
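On the first question, a common pattern for optional CRDs is to probe the discovery API and enable the sandbox reconciler only when the group/version is served. The group/version string below is an assumption about agent-sandbox's API group:

```go
// Sketch: probe discovery and enable sandbox support only if the API is served.
package sketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/discovery"
)

// sandboxAPIAvailable reports whether the cluster serves the (assumed)
// agent-sandbox group/version; when false, Grove would skip sandbox
// features instead of crash-looping on missing CRDs.
func sandboxAPIAvailable(dc discovery.DiscoveryInterface) (bool, error) {
	const gv = "agents.x-k8s.io/v1alpha1" // assumed API group/version
	if _, err := dc.ServerResourcesForGroupVersion(gv); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}
```

On the second, a minimal aggregation sketch, assuming each Sandbox exposes a Ready condition Grove can read:

```go
// Sketch: roll per-Sandbox readiness up into a clique-level status.
package sketch

type sandboxView struct {
	Name  string
	Ready bool
}

type cliqueSandboxStatus struct {
	Desired   int32
	Ready     int32
	Available bool // Ready >= MinAvailable, feeding Grove's gang guarantees
}

func aggregate(sandboxes []sandboxView, desired, minAvailable int32) cliqueSandboxStatus {
	var ready int32
	for _, s := range sandboxes {
		if s.Ready {
			ready++
		}
	}
	return cliqueSandboxStatus{Desired: desired, Ready: ready, Available: ready >= minAvailable}
}
```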
Security
- What is the threat model for malicious code execution, and what isolation level is required (gVisor, Kata, etc.)?
- How do sandboxes access secrets without exposing them to user code? (a sidecar pattern is sketched after this list)
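One mitigation pattern for the secrets question: mount credentials only into a trusted broker sidecar and give the user-code container a loopback proxy endpoint, so generated code can invoke tools without ever reading the raw secret. Container, image, and secret names below are illustrative, not part of any existing spec:

```go
// Sketch: untrusted code and credentials separated within one sandbox pod.
package sketch

import corev1 "k8s.io/api/core/v1"

func sandboxPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		Containers: []corev1.Container{
			{
				// Runs untrusted, LLM-generated code; no secret mounts at all.
				Name:  "user-code",
				Image: "python-sandbox:latest",
				Env: []corev1.EnvVar{
					// Tool calls go through the sidecar, which injects credentials.
					{Name: "TOOL_PROXY_URL", Value: "http://127.0.0.1:8080"},
				},
			},
			{
				// Trusted sidecar; the only container that can see the secret.
				Name:  "credential-broker",
				Image: "tool-proxy:latest",
				EnvFrom: []corev1.EnvFromSource{{
					SecretRef: &corev1.SecretEnvSource{
						LocalObjectReference: corev1.LocalObjectReference{Name: "tenant-api-keys"},
					},
				}},
			},
		},
	}
}
```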
Next Steps (H1 2026)
This is an exploratory proposal. In H1 2026, we should:
- Engage with agent-sandbox maintainers to understand their roadmap and discuss integration points
- Gather requirements from agent developers building on Kubernetes today
- Prototype basic integration to validate the architecture
- Document security model for running untrusted code alongside inference
Would love feedback and collaborators interested in this direction.
Feedback Requested
I would appreciate feedback on:
- Are we capturing the right agent runtime patterns?
- Ideas for integrating Sandbox with PodGang scheduling?
- Additional security considerations?
- What else do agent developers need from infrastructure?