[Roadmap Exploration]: Integrating Agent Sandboxes in Grove #359

@athreesh

This issue explores integrating Grove with the Kubernetes agent-sandbox project to enable first-class support for AI agent runtimes.

Motivation

The Rise of AI Agents

AI agents—systems that combine LLM inference with tool use, code execution, and environmental interaction—are becoming a dominant deployment pattern. Examples include:

  • Coding assistants (Cursor, GitHub Copilot, Claude Code) that execute generated code
  • Research agents that browse the web, run analysis scripts, and synthesize findings
  • Autonomous agents (AutoGPT, OpenDevin) that iteratively solve tasks
  • Enterprise agents that interact with APIs, databases, and internal systems

These agents share a common architecture:

User Request → LLM Inference → Tool/Code Execution → Result → LLM Inference → ...
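
The loop above can be sketched in a few lines of Python. This is only an illustration of the control flow: `ModelReply`, `run_agent`, `fake_infer`, and `fake_run_code` are hypothetical names, the model is a stub, and the "tool" provides no real isolation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ModelReply:
    """One inference result: either a tool request or a final answer."""
    text: str
    tool_name: Optional[str] = None  # set when the model asks for a tool run
    tool_input: str = ""


def run_agent(
    request: str,
    infer: Callable[[List[str]], ModelReply],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> str:
    """Alternate inference and tool execution until the model answers."""
    transcript = [f"user: {request}"]
    for _ in range(max_steps):
        reply = infer(transcript)
        if reply.tool_name is None:
            return reply.text  # final answer, loop ends
        result = tools[reply.tool_name](reply.tool_input)
        transcript.append(f"tool[{reply.tool_name}]: {result}")
    raise RuntimeError("agent did not finish within max_steps")


def fake_infer(transcript: List[str]) -> ModelReply:
    """Stand-in for an LLM: requests one tool call, then answers."""
    if len(transcript) == 1:
        return ModelReply(text="", tool_name="run_code", tool_input="2 + 3")
    return ModelReply(text=f"done after {transcript[-1]}")


def fake_run_code(expr: str) -> str:
    """Stand-in for a sandboxed executor (adds two integers only)."""
    left, right = expr.split(" + ")
    return str(int(left) + int(right))


answer = run_agent("add 2 and 3", fake_infer, {"run_code": fake_run_code})
```

In a real deployment the `infer` call would hit the inference backend and each tool call would cross into an isolated execution environment, which is the hop this proposal wants to make cheap.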

The Infrastructure Gap

Today, deploying agent runtimes on Kubernetes requires stitching together:

| Component | Current Solution | Problems |
|---|---|---|
| Inference | vLLM/SGLang + manual deployment | No gang scheduling, cold starts |
| Code execution | Firecracker/gVisor/Kata | Separate orchestration |
| Session state | Redis/external DB | Another system to manage |
| Scaling | Custom HPA + glue code | Inference and execution scale independently |
| Isolation | Network policies + namespaces | Coarse-grained, complex setup |

Agent developers want to focus on agent logic, not infrastructure plumbing.

Why Grove?

Grove already solves hard problems for distributed inference:

  • Gang scheduling: Ensures all components of a workload start together
  • Topology awareness: Co-locates tightly-coupled components for low latency
  • Hierarchical scaling: PodCliqueScalingGroups scale related components as a unit
  • Dependency ordering: startsAfter ensures correct startup sequences
  • MinAvailable guarantees: Workloads maintain minimum capacity or terminate gracefully
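
To make the gang scheduling and MinAvailable semantics concrete, here is a deliberately simplified model in Python. It is not Grove's scheduler: `Clique`, `admit_gang`, and the abstract "slots" capacity unit are assumptions made for illustration, and the best-effort fill beyond the minimum is one possible policy among several.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Clique:
    name: str
    replicas: int        # desired number of pods
    min_available: int   # pods that must be placeable before anything starts
    slots_per_pod: int   # abstract capacity units each pod consumes


def admit_gang(cliques: List[Clique], free_slots: int) -> Dict[str, int]:
    """All-or-nothing admission.

    Returns the number of pods to start per clique, or an empty dict when
    the minimum for every clique cannot be satisfied at the same time.
    """
    needed = sum(c.min_available * c.slots_per_pod for c in cliques)
    if needed > free_slots:
        return {}  # gang does not fit: start nothing rather than a partial set

    plan = {c.name: c.min_available for c in cliques}
    remaining = free_slots - needed
    # Assumed policy: use leftover capacity for replicas above the minimum.
    for c in cliques:
        extra = min(c.replicas - c.min_available, remaining // c.slots_per_pod)
        plan[c.name] += extra
        remaining -= extra * c.slots_per_pod
    return plan


workload = [
    Clique("inference", replicas=2, min_available=1, slots_per_pod=4),
    Clique("execution-sandbox", replicas=4, min_available=2, slots_per_pod=1),
]
too_small = admit_gang(workload, free_slots=5)
just_enough = admit_gang(workload, free_slots=6)
plenty = admit_gang(workload, free_slots=12)
```

The point of the sketch is the first branch: with five free slots neither clique starts, even though the sandboxes alone would fit.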

These primitives map directly to agent runtime requirements:

| Grove Primitive | Agent Runtime Use Case |
|---|---|
| PodClique | Inference backend (GPU pods) |
| PodClique + Sandbox | Code execution environment (isolated) |
| PodCliqueScalingGroup | Inference + execution scaled together |
| Gang scheduling | Low-latency tool calls |
| Topology constraints | Inference and sandbox on same rack/node |
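
As a rough illustration of "scaled together", the sketch below treats a scaling group as a fixed bundle of pods per replica, so changing one number moves both cliques. `ScalingGroup` and its fields are hypothetical and do not mirror the actual PodCliqueScalingGroup schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScalingGroup:
    name: str
    pods_per_replica: Dict[str, int]  # clique name -> pods in one group replica

    def desired_pods(self, group_replicas: int) -> Dict[str, int]:
        """Scale every member clique by the same group replica count."""
        if group_replicas < 0:
            raise ValueError("group_replicas must be non-negative")
        return {
            clique: per_replica * group_replicas
            for clique, per_replica in self.pods_per_replica.items()
        }


# One group replica = 2 inference pods plus 4 sandboxes (illustrative ratio).
group = ScalingGroup("agent", {"inference": 2, "execution-sandbox": 4})
desired = group.desired_pods(3)
```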

Why agent-sandbox?

The agent-sandbox project (kubernetes-sigs) provides:

  • Sandbox CRD: Isolated, stateful, singleton workloads
  • SandboxWarmPool: Pre-warmed pool of sandboxes for instant claiming
  • SandboxTemplate: Reusable sandbox configurations
  • Lifecycle management: Create, delete, pause, resume, hibernate
  • Persistent storage: State survives pod restarts

This is exactly what agent runtimes need for code execution environments.
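
The warm pool idea can be modeled in memory as follows. This is a sketch of the claiming pattern only, not the agent-sandbox API: `WarmPool`, `claim`, and `replenish` are hypothetical names, and a real controller would create and bind Kubernetes resources rather than strings.

```python
from collections import deque
from itertools import count
from typing import Deque, Dict


class WarmPool:
    """Keeps `target_size` pre-created sandboxes ready to hand out."""

    def __init__(self, template: str, target_size: int) -> None:
        self.template = template
        self.target_size = target_size
        self.warm: Deque[str] = deque()
        self.claimed: Dict[str, str] = {}  # sandbox name -> owner
        self._ids = count(1)
        self.replenish()

    def _create(self) -> str:
        return f"{self.template}-{next(self._ids)}"

    def replenish(self) -> None:
        """Top the pool back up to its target size."""
        while len(self.warm) < self.target_size:
            self.warm.append(self._create())

    def claim(self, owner: str) -> str:
        """Hand out a warm sandbox, or fall back to a cold create."""
        sandbox = self.warm.popleft() if self.warm else self._create()
        self.claimed[sandbox] = owner
        self.replenish()
        return sandbox


pool = WarmPool("python-execution-pool", target_size=2)
first = pool.claim("user-1")
```

Claiming removes a sandbox from the pool and immediately triggers replenishment, so the claimant does not wait for a cold start as long as the pool is not drained faster than it refills.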


Proposal

The core idea is to enable PodCliqueSets to include sandbox-based workloads alongside inference workloads, with coordinated lifecycle management:

┌─────────────────────────────────────────────────────────────────────┐
│                         PodCliqueSet                                │
│                      "agent-runtime"                                │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                  PodCliqueScalingGroup                         │ │
│  │  ┌──────────────────┐      ┌─────────────────────────────────┐ │ │
│  │  │    PodClique     │      │         PodClique               │ │ │
│  │  │   "inference"    │      │    "execution-sandbox"          │ │ │
│  │  │                  │      │                                 │ │ │
│  │  │  ┌────────────┐  │      │  ┌───────────┐ ┌───────────┐   │ │ │
│  │  │  │ vLLM Pod   │  │ ───► │  │ Sandbox A │ │ Sandbox B │   │ │ │
│  │  │  │ (GPU)      │  │ tool │  │ (user 1)  │ │ (user 2)  │   │ │ │
│  │  │  └────────────┘  │ call │  └───────────┘ └───────────┘   │ │ │
│  │  │  ┌────────────┐  │      │  ┌───────────┐ ┌───────────┐   │ │ │
│  │  │  │ vLLM Pod   │  │      │  │ Sandbox C │ │ Sandbox D │   │ │ │
│  │  │  │ (GPU)      │  │      │  │ (user 3)  │ │ (user 4)  │   │ │ │
│  │  │  └────────────┘  │      │  └───────────┘ └───────────┘   │ │ │
│  │  └──────────────────┘      └─────────────────────────────────┘ │ │
│  │         ▲                              ▲                       │ │
│  │         │      Gang Scheduled          │                       │ │
│  │         └──────────────────────────────┘                       │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                    SandboxWarmPool                             │ │
│  │              "python-execution-pool"                           │ │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐  │ │
│  │  │ Warm    │ │ Warm    │ │ Warm    │ │ Warm    │ │ Warm    │  │ │
│  │  │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │ │ Sandbox │  │ │
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘  │ │
│  └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

Key capabilities to explore:

  • Sandbox-aware PodCliques: PodCliques that manage Sandbox resources instead of raw Pods
  • Warm pool integration: Claim pre-warmed sandboxes for instant scale-up
  • Hibernate/resume: Pause sandboxes on scale-down instead of deleting, preserving state
  • Session affinity: Route user sessions to dedicated sandboxes
  • Gang scheduling and topology awareness: Start inference and sandbox pods together and co-locate them for low-latency tool calls
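
The hibernate/resume capability could behave roughly like the reconcile step below. It is a sketch under assumptions: the `Phase` values, the `Sandbox` record, and `scale_to` are hypothetical, and the policy shown (hibernate the surplus, resume hibernated sandboxes before creating new ones) is one option to evaluate, not a settled design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Phase(Enum):
    RUNNING = "Running"
    HIBERNATED = "Hibernated"


@dataclass
class Sandbox:
    name: str
    phase: Phase = Phase.RUNNING
    state: Dict[str, str] = field(default_factory=dict)  # survives hibernation


def scale_to(sandboxes: List[Sandbox], desired_running: int) -> None:
    """Adjust the number of running sandboxes without ever deleting one."""
    if desired_running < 0:
        raise ValueError("desired_running must be non-negative")
    running = [s for s in sandboxes if s.phase is Phase.RUNNING]
    hibernated = [s for s in sandboxes if s.phase is Phase.HIBERNATED]

    # Scale down: pause the surplus so its state is kept.
    for s in running[desired_running:]:
        s.phase = Phase.HIBERNATED

    # Scale up: resume hibernated sandboxes first, then create new ones.
    missing = desired_running - len(running)
    if missing > 0:
        for s in hibernated[:missing]:
            s.phase = Phase.RUNNING
        for _ in range(missing - len(hibernated)):
            sandboxes.append(Sandbox(name=f"sandbox-{len(sandboxes)}"))


boxes = [Sandbox(f"sandbox-{i}") for i in range(4)]
boxes[3].state["installed"] = "numpy"
scale_to(boxes, 2)   # sandbox-2 and sandbox-3 are hibernated, not deleted
hibernated_after_down = [s.name for s in boxes if s.phase is Phase.HIBERNATED]
scale_to(boxes, 5)   # both are resumed and one new sandbox is created
```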

Use Cases

1. Coding Assistants (Claude Code, Cursor, etc.)

Agents that execute user-provided or generated code need isolated environments:

  • Run tests, install packages, execute scripts
  • Persistent environment across conversation turns
  • Isolated from other users and from inference infrastructure

2. Multi-Tenant Agent Platforms

Platforms hosting agents for multiple customers:

  • Per-tenant or per-user sandbox isolation
  • Shared inference infrastructure for cost efficiency
  • Independent scaling per tenant

3. Research Agents with Stateful Analysis

Agents performing long-running analysis:

  • Download datasets and install tools (persisted)
  • Hibernate when idle to reduce costs
  • Resume instantly when user returns

Open Questions to Explore

Architecture

  1. How should sandbox support be exposed in the Grove API? Options include extending PodClique or introducing a new SandboxClique CRD.

  2. How do Sandboxes participate in PodGang scheduling? The agent-sandbox controller would need to coordinate with Grove's scheduling gates.

  3. Should Grove manage SandboxWarmPools directly, or reference externally-managed pools?

  4. Where does session-to-sandbox routing live? (sidecar, separate service, or Grove operator)
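
Wherever the routing ends up living, its core is a sticky session-to-sandbox map. A minimal sketch, assuming a hypothetical `SessionRouter` and a `claim_sandbox` callback that stands in for whatever mechanism actually provisions or claims a sandbox:

```python
from itertools import count
from typing import Callable, Dict


class SessionRouter:
    """Pins each session to one sandbox for the lifetime of the session."""

    def __init__(self, claim_sandbox: Callable[[str], str]) -> None:
        self._claim = claim_sandbox
        self._by_session: Dict[str, str] = {}

    def route(self, session_id: str) -> str:
        """Return the session's sandbox, claiming one on first use."""
        sandbox = self._by_session.get(session_id)
        if sandbox is None:
            sandbox = self._claim(session_id)
            self._by_session[session_id] = sandbox
        return sandbox

    def end(self, session_id: str) -> None:
        """Forget the mapping; releasing the sandbox is out of scope here."""
        self._by_session.pop(session_id, None)


_ids = count(1)
router = SessionRouter(lambda session_id: f"sandbox-{next(_ids)}")
first = router.route("user-1")
second = router.route("user-2")
again = router.route("user-1")
```

The map is process-local in this sketch. A sidecar, a separate service, and the operator differ mainly in where this state is stored and how it survives restarts.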

Implementation

  1. How should Grove behave on clusters that do not have the agent-sandbox CRDs installed?

  2. How should Sandbox status be aggregated into PodClique/PodCliqueSet status?
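
One possible shape for status aggregation is a simple roll-up of per-sandbox phases into clique-level counts. The phase strings (`"Ready"`, `"Pending"`, `"Hibernated"`) and the `CliqueStatus` fields below are assumptions for illustration and are not taken from either project's status schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CliqueStatus:
    ready: int
    total: int
    phases: Dict[str, int]
    min_available_breached: bool


def aggregate(sandbox_phases: List[str], min_available: int) -> CliqueStatus:
    """Roll individual sandbox phases up into one clique-level summary."""
    phases = Counter(sandbox_phases)
    ready = phases.get("Ready", 0)
    return CliqueStatus(
        ready=ready,
        total=len(sandbox_phases),
        phases=dict(phases),
        min_available_breached=ready < min_available,
    )


observed = ["Ready", "Ready", "Hibernated", "Pending"]
healthy = aggregate(observed, min_available=2)
degraded = aggregate(observed, min_available=3)
```

An open design point the sketch leaves untouched is whether a hibernated sandbox should count toward availability.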

Security

  1. What is the threat model for malicious code execution, and what isolation level does it require (gVisor, Kata, etc.)?

  2. How do sandboxes access secrets without exposing them to user code?


Next Steps (H1 2026)

This is an exploratory proposal. In H1 2026, we should:

  1. Engage with agent-sandbox maintainers to understand their roadmap and discuss integration points
  2. Gather requirements from agent developers building on Kubernetes today
  3. Prototype basic integration to validate the architecture
  4. Document security model for running untrusted code alongside inference

Feedback is welcome, as are collaborators interested in this direction.


Feedback Requested

I would appreciate feedback on:

  1. Are we capturing the right agent runtime patterns?
  2. Ideas for integrating Sandbox with PodGang scheduling?
  3. Additional security considerations?
  4. What else do agent developers need from infrastructure?
