| title | sandbox inplace cpu resize |
|---|---|
| authors | |
| reviewers | |
| creation-date | 2026-01-13 |
| last-updated | 2026-01-13 |
| status | implementable |
| see-also | |
| replaces | |
| superseded-by | |

This enhancement proposes in-place CPU resizing for sandboxes
allocated from the warm pool, driven by request metadata.
When a sandbox is claimed via the E2B API, users can specify a CPU scale factor in the metadata
(e.g., `e2b.agents.kruise.io/cpu-scale-factor: "2"`).
The sandbox manager then resizes the claimed sandbox's CPU resources
in place using the Kubernetes pod `resize` subresource, so the warm pool can keep
minimal resource configurations while claimed sandboxes scale CPU on demand.
Key Benefits:
- Cost Optimization: Maintain warm pools with minimal CPU resources, scaling up only when sandboxes are actually claimed
- Zero Downtime: In-place CPU resizing without pod restart or recreation
Currently, the warm pool management strategy requires maintaining sandboxes with sufficient resources to handle peak workloads. This leads to:
- High Resource Costs: Warm pools must be provisioned with resources sufficient for the maximum expected workload, even though most sandboxes may not need peak resources immediately
- Inefficient Resource Utilization: Sandboxes sit idle in the warm pool consuming resources that may never be fully utilized
- Limited Flexibility: Once a sandbox is allocated, its resources cannot be adjusted without recreation, which causes downtime

Goals:
- Enable Metadata-Based CPU Scaling: Allow users to specify a CPU scale factor via E2B API metadata when creating sandboxes
- In-Place Resize: Leverage the Kubernetes pod `resize` subresource to resize CPU without a pod restart
- Early Return Support: Optionally return the sandbox as soon as resize feasibility is confirmed

Non-Goals:
- Automatic Scaling: This proposal does not implement automatic CPU scaling based on workload metrics
- Resize Policy Configuration: Users cannot configure resize policies (the `NotRequired` restart policy is always used)

The existing E2B CreateSandbox API already accepts a metadata field.
This enhancement adds support for a new metadata key:

    metadata:
      e2b.agents.kruise.io/cpu-scale-factor: "2"  # String representation of a positive number

Metadata Key: `e2b.agents.kruise.io/cpu-scale-factor`
- Type: String (must be parseable as a positive float64)
- Validation: Must be > 0; values in the range [1, 10] are typical in practice
- Default: If not specified, no resize is performed (backward compatible)
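
The validation rules for the metadata key can be sketched as follows. This is a minimal illustration in Python rather than the sandbox manager's actual code; the function name `parse_cpu_scale_factor` is hypothetical, and rejecting `nan` and `inf` is an added assumption that the rules above imply but do not state.

```python
import math
from typing import Mapping, Optional

CPU_SCALE_FACTOR_KEY = "e2b.agents.kruise.io/cpu-scale-factor"


def parse_cpu_scale_factor(metadata: Mapping[str, str]) -> Optional[float]:
    """Return the requested scale factor, or None when no resize was requested.

    Hypothetical helper: raises ValueError when the value is not a
    positive, finite number.
    """
    raw = metadata.get(CPU_SCALE_FACTOR_KEY)
    if raw is None:
        # Key absent: keep the backward-compatible behaviour (no resize).
        return None
    try:
        factor = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"{CPU_SCALE_FACTOR_KEY}: {raw!r} is not a number") from None
    # Assumption: non-finite values are rejected along with values <= 0.
    if not math.isfinite(factor) or factor <= 0:
        raise ValueError(f"{CPU_SCALE_FACTOR_KEY}: {raw!r} must be a positive number")
    return factor
```

With this sketch, a request without the key yields `None`, and the caller skips the resize path entirely.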
When a sandbox is claimed via the CreateSandbox API:
- Metadata Parsing: The sandbox manager checks for `e2b.agents.kruise.io/cpu-scale-factor` in the request metadata
- CPU Calculation: If present, calculate the target CPU as `originalCPU * scaleFactor`
- Validation: Validate that the target CPU is within acceptable bounds (respects pod limits, resource quotas, etc.)
- Resize Trigger: If validation passes, trigger the pod resize via the Kubernetes `/resize` subresource
Example Flow:
Original Sandbox CPU: 1 core
Metadata: e2b.agents.kruise.io/cpu-scale-factor: "2"
Target CPU: 1 * 2 = 2 cores
Action: Resize pod from 1 core to 2 cores
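
The arithmetic in the example flow can be sketched in Python by working in millicores, which avoids fractional cores. Both helper names are hypothetical. As a simplifying assumption, the parser handles only plain decimal quantities (`"1"`, `"0.5"`) and the `m` suffix (`"500m"`), and the result is rounded to the nearest millicore.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1.5" to millicores.

    Simplified on purpose: other Kubernetes quantity forms are not handled.
    """
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def scale_cpu(quantity: str, factor: float) -> str:
    """Return originalCPU * scaleFactor as a millicore quantity string."""
    target = round(cpu_to_millicores(quantity) * factor)
    return f"{target}m"


# The example flow above: 1 core scaled by a factor of 2.
print(scale_cpu("1", 2.0))  # 2000m, i.e. 2 cores
```

Bounds checks against pod limits and resource quotas are left out of this sketch; they would run on the millicore value before any resize is triggered.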
The resize logic is implemented in the sandbox manager's ClaimSandbox flow:
1. After Sandbox Claim: The logic runs once a sandbox is successfully claimed from the pool
2. Metadata Check: Check if the `cpu-scale-factor` metadata exists
3. Current CPU Detection: Read the current CPU from the pod spec or status
4. Target Calculation: Calculate target CPU = current * scaleFactor
5. Resize Execution: Call the Kubernetes pod `/resize` subresource
6. Status Monitoring: Monitor pod conditions for resize progress
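
The request body for the resize step can be sketched as below. The patch shape follows the Kubernetes pod spec (`spec.containers[].resources`). The helper name `build_cpu_resize_patch`, the container name `sandbox`, and the choice to set requests and limits to the same value are illustrative assumptions, not decisions made by this proposal.

```python
import json
from typing import Any, Dict


def build_cpu_resize_patch(container: str, cpu: str) -> Dict[str, Any]:
    """Build a patch body that changes only the CPU of one container.

    Assumption: requests and limits are set to the same target value.
    """
    return {
        "spec": {
            "containers": [
                {
                    "name": container,
                    "resources": {
                        "requests": {"cpu": cpu},
                        "limits": {"cpu": cpu},
                    },
                }
            ]
        }
    }


# "sandbox" is a placeholder container name.
print(json.dumps(build_cpu_resize_patch("sandbox", "2000m"), indent=2))
```

A body like this would be sent to the pod's `resize` subresource by whichever Kubernetes client the sandbox manager uses. For manual experiments, recent kubectl versions accept it via `kubectl patch pod <pod-name> --subresource resize --patch-file <file>`.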
Optional Feature: Once the system confirms that the resize is feasible (the `PodResizeInProgress` condition is set), the sandbox can be returned to the user immediately, even if the resize is still in progress.
Condition Check:
- Monitor the pod status for the `PodResizeInProgress` condition
- Once the condition is `True`, the kubelet has confirmed that the resize is feasible
- The condition indicates that:
  - Kubelet has accepted the resize request
  - Resource allocation has been updated
  - Resize is being actuated (may still be in progress)
- Return the sandbox to the user with a status indicating that the resize is in progress
- The user can start using the sandbox while the CPU resize completes asynchronously
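
The condition check can be sketched as a small classifier over `status.conditions`. The condition type strings follow the upstream Kubernetes names (`PodResizeInProgress`, `PodResizePending`) and the `Infeasible` reason; treat them as assumptions to verify against the cluster version in use. The function name and the returned labels are hypothetical.

```python
from typing import Any, Iterable, Mapping


def resize_state(conditions: Iterable[Mapping[str, Any]]) -> str:
    """Classify resize progress from a pod's status.conditions list.

    Returns "in_progress", "infeasible", "deferred", or "none".
    "none" is ambiguous on its own: the resize may be finished or may not
    have been observed by the kubelet yet, so callers should also compare
    the actual container resources with the target.
    """
    active = {
        c.get("type"): c for c in conditions if c.get("status") == "True"
    }
    if "PodResizeInProgress" in active:
        return "in_progress"
    pending = active.get("PodResizePending")
    if pending is not None:
        return "infeasible" if pending.get("reason") == "Infeasible" else "deferred"
    return "none"


example = [{"type": "PodResizeInProgress", "status": "True"}]
print(resize_state(example))  # in_progress
```

In this sketch, `in_progress` is the signal that allows an early return, while `infeasible` maps to returning an error to the caller.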

Flow Diagram:

    User Request (CreateSandbox)
          |
          v
    [Parse Metadata]
          |
          v
    [cpu-scale-factor present?]
          | No              | Yes
          v                 v
    [Return Sandbox]   [Calculate Target CPU]
                            |
                            v
                       [Validate Feasibility]
                            |
                       [Infeasible?]
                            | Yes           | No
                            v               v
                       [Return Error]  [Call Pod /resize]
                                            |
                                            v
                                       [Monitor Conditions]
                                            |
                                       [PodResizeInProgress?]
                                            | Yes               | No
                                            v                   v
                                       [Early Return?]     [Wait for Completion]
                                         | Yes    | No          |
                                         v        v             v
                            [Return Sandbox]    [Wait]     [Return Sandbox]
                                         |        |             |
                                         +--------+-------------+
                                                  |
                                                  v
                                       [Resize Completes Async]

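
The monitoring and early-return branches of the flow can be sketched as a polling loop. This is an illustration under stated assumptions: `observe` is a hypothetical callback that reports one of `"pending"`, `"in_progress"`, `"completed"`, or `"infeasible"`, and the timeout and interval defaults are placeholders rather than values from this proposal.

```python
import time
from typing import Callable


def wait_for_resize(
    observe: Callable[[], str],
    early_return: bool,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the sandbox may be handed back to the caller.

    Returns "returned_early" or "returned_after_completion".
    Raises RuntimeError if the resize is infeasible and TimeoutError
    if neither outcome is reached before the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        state = observe()
        if state == "infeasible":
            raise RuntimeError("CPU resize was rejected as infeasible")
        if state == "completed":
            return "returned_after_completion"
        if state == "in_progress" and early_return:
            # Feasibility is confirmed; the resize finishes asynchronously.
            return "returned_early"
        if clock() >= deadline:
            raise TimeoutError("CPU resize did not progress before the deadline")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A production implementation would more likely watch pod updates than poll.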
As a platform operator, I want to maintain warm pools with minimal CPU resources (0.5 cores) to reduce costs. When an agent claims a sandbox for a compute-intensive task, I want the sandbox to automatically scale to 4 cores in-place without downtime.
As an agent developer, I want to specify CPU requirements when claiming a sandbox based on my task's computational needs, so that I get appropriate resources without over-provisioning.
As an agent developer, I want to receive the sandbox immediately once the system confirms that CPU resize is feasible, even if the resize is still in progress, so that I can start using the sandbox without waiting for resize completion.
- 2026-01-13: Initial proposal draft created