Author: Tomas Pflanzer @gizmax Date: 2026-02-20 (updated for v0.10.0)
- Autonomous AI agent market: $8.5B by 2026, projected $35-45B by 2030 (Deloitte)
- 40% of enterprise apps will have task-specific AI agents by end of 2026, up from <5% in 2025 (Gartner)
- 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025 (Gartner)
- MCP (Model Context Protocol): 97 million monthly SDK downloads by late 2025
| Framework | Strengths | Weaknesses vs Sandcastle |
|---|---|---|
| LangGraph (LangChain) | Graph-based workflows, checkpointing, LangSmith observability, massive ecosystem | Heavy abstraction, vendor lock-in, no native sandboxing, requires separate LangSmith subscription |
| CrewAI | Intuitive role-based agents, beginner-friendly | Sequential-first, no sandboxing, 2.2x slower than LangGraph |
| AutoGen (Microsoft) | Conversational multi-agent, AG-UI support | Awkward for structured DAG workflows |
| OpenAI Agents SDK | Minimalist, first-party OpenAI | OpenAI-only, no orchestration/scheduling/persistence |
| Google ADK | Deep Gemini integration, A2A protocol | Google ecosystem lock-in |
| Mastra | TypeScript-native, $13M YC seed, 150k weekly downloads, memory systems, MCP support | TypeScript-only ecosystem |
| AWS Strands | Model-first, Bedrock integration | AWS-centric |

| Platform | Notes |
|---|---|
| Dify | 130k+ AI apps, visual RAG pipeline builder, self-hostable |
| n8n | 230k+ active users, 400+ integrations, general automation |
| Flowise / Langflow | Open-source visual LLM workbenches, less production-grade |

| Platform | Key Feature |
|---|---|
| E2B (current) | Firecracker microVMs, ~150ms startup, open-source, 24h session limit |
| Daytona | Docker containers, sub-90ms provisioning |
| Blaxel (YC S25) | Perpetual sandboxes, 25ms resume |
| Sprites.dev (Fly.io) | Stateful sandboxes, checkpoint/rollback |
- Zero-Config to Production - `pip install sandcastle-ai && sandcastle init && sandcastle serve`. No other framework offers this complete experience.
- Native Sandboxed Execution - Built-in E2B integration. No competitor has sandboxed code execution by default.
- Single-Binary Architecture - API + Dashboard + Worker on one port. No Docker, no Redis required for local mode.
- YAML-First Workflows - Declarative, version-controllable, PR-reviewable. Google ADK is moving toward YAML too, validating this approach.
- Built-In Cost Tracking - Native cost tracking per run/step. Competitors require separate paid observability tools.
- Visual workflow builder, scheduling, webhooks, multi-tenant auth, PDF/CSV export
- Direct `e2b` SDK integration via `AsyncSandbox` - `SandshoreRuntime` class
- Bundle `runner.mjs` (Claude Agent SDK) + `runner-openai.mjs` (OpenAI-compatible)
- Same `query()` / `query_stream()` interface, zero breaking changes
- Built-in MCP server: `sandcastle mcp` command
- 8 tools (run_workflow, run_workflow_yaml, get_run_status, list_runs, cancel_run, save_workflow, create_schedule, delete_schedule)
- 3 resources (workflows, schedules, health)
- Compatible with Claude Desktop, Cursor, Windsurf
- `approval_required: true` flag on workflow steps
- Step pause/resume API: `POST /api/runs/{id}/steps/{step_id}/approve`
- Dashboard UI for pending approvals (approve/reject/modify)
- Webhook notification when approval needed
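A minimal sketch of how a client might build the approval call for the pause/resume endpoint listed above. Only the path comes from this document; the base URL, the JSON body, and the helper name `build_approve_request` are assumptions for illustration, and the real API may expect a different payload or auth headers.

```python
import json
import urllib.request


def build_approve_request(base_url, run_id, step_id, payload=None):
    """Build (but do not send) a POST to the step-approval endpoint.

    The payload shape is hypothetical; an empty JSON object is sent by default.
    """
    url = f"{base_url.rstrip('/')}/api/runs/{run_id}/steps/{step_id}/approve"
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen()` would send it; keeping construction separate makes the URL easy to inspect in tests.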
- Per-step model selection: `model: sonnet`, `model: openai/codex-mini`, `model: minimax/m2.5`, `model: google/gemini-2.5-pro`
- Provider registry with pricing, runners, API keys
- Cost-based routing via CostLatencyOptimizer with EXTENDED_MODEL_POOL
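To make the idea of cost-based routing concrete, here is a small sketch that picks the cheapest model meeting a latency budget. It is not the `CostLatencyOptimizer` implementation; the `ModelOption` fields, the `pick_model` helper, and the numbers in the usage example are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_call: float   # illustrative units, not real pricing
    p50_latency_s: float   # assumed typical latency in seconds


def pick_model(pool, max_latency_s):
    """Return the cheapest option within the latency budget, or None."""
    eligible = [m for m in pool if m.p50_latency_s <= max_latency_s]
    if not eligible:
        return None
    # Break cost ties by preferring the faster model.
    return min(eligible, key=lambda m: (m.cost_per_call, m.p50_latency_s))
```

For example, with a pool of `ModelOption("small", 0.01, 2.0)`, `ModelOption("medium", 0.05, 4.0)`, and `ModelOption("large", 0.20, 9.0)`, a 5-second budget selects `small`, while a 1-second budget yields `None`.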
- `SandboxBackend` protocol in `backends.py`
- E2B (default), Docker, Local (subprocess), Cloudflare Workers
- Config via `SANDBOX_BACKEND=e2b|docker|local|cloudflare`
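The backend selection described above could look roughly like the following. This is a sketch under assumptions: the `run()` method, the `name` attribute, and `select_backend()` are hypothetical, and only two of the four documented backends are stubbed so the example stays free of third-party dependencies.

```python
import os
from typing import Protocol


class SandboxBackend(Protocol):
    """Hypothetical minimal shape of a backend; the real protocol may differ."""

    name: str

    def run(self, command: str) -> str: ...


class LocalBackend:
    name = "local"

    def run(self, command: str) -> str:
        # Placeholder: a real local backend would launch a subprocess here.
        return f"[local] {command}"


class DockerBackend:
    name = "docker"

    def run(self, command: str) -> str:
        # Placeholder: a real Docker backend would talk to the Docker daemon.
        return f"[docker] {command}"


# Only two of the documented values are stubbed in this sketch.
_BACKENDS = {"local": LocalBackend, "docker": DockerBackend}


def select_backend(env=None) -> SandboxBackend:
    """Pick a backend from SANDBOX_BACKEND, defaulting to e2b as documented."""
    env = os.environ if env is None else env
    choice = env.get("SANDBOX_BACKEND", "e2b").lower()
    try:
        return _BACKENDS[choice]()
    except KeyError:
        raise ValueError(f"backend not available in this sketch: {choice!r}") from None
```

Because `SandboxBackend` is a structural `Protocol`, the concrete classes do not need to inherit from it; any object with a matching `name` and `run()` satisfies the type.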
- Automatic failover on 429/5xx errors with per-key cooldown tracking
- Ordered failover chains (same-provider cheaper first, then cross-provider)
- `ProviderFailover` singleton with thread-safe cooldown management
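The failover behaviour above can be sketched as a lock-protected cooldown table plus an ordered chain walk. The class below borrows the `ProviderFailover` and `mark_cooldown` names from this document, but its constructor, `is_available()`, `first_available()`, and the 60-second default are assumptions; the singleton wiring is omitted.

```python
import threading
import time


class ProviderFailover:
    """Sketch of per-key cooldown tracking; not the shipped implementation."""

    def __init__(self, cooldown_seconds=60.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._until = {}  # key id -> time at which the key is usable again
        self._lock = threading.Lock()

    def mark_cooldown(self, key_id):
        """Record that a key just hit a 429/5xx and should rest for a while."""
        with self._lock:
            self._until[key_id] = self._clock() + self._cooldown

    def is_available(self, key_id):
        with self._lock:
            return self._clock() >= self._until.get(key_id, float("-inf"))

    def first_available(self, chain):
        """Walk an ordered failover chain and return the first usable key."""
        for key_id in chain:
            if self.is_available(key_id):
                return key_id
        return None


def should_fail_over(status):
    """True for the status codes this document names: 429 and any 5xx."""
    return status == 429 or 500 <= status <= 599
```

Injecting the clock keeps the cooldown logic testable without sleeping; production code would leave the default `time.monotonic`.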
- Local diagnostics: config, API keys, sandbox backends, dependencies, network
- Color-coded output: [PASS] green, [WARN] yellow, [FAIL] red
- No running server needed
- Sonner toast notifications across all pages
- Error states with retry buttons
- 404 catch-all route, shared SectionCard components
- Approvals badge in sidebar, search debounce
- Per-workflow persistent memory store (key-value + vector)
- Short-term (within run) + long-term (across runs) memory
- Cross-run context: "remember what you learned in previous runs"
- Optional Mem0 integration as pluggable backend
- Impact: Key differentiator over LangGraph, important for iterative workflows
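A rough illustration of the short-term versus long-term split: short-term values live only on the object for one run, while long-term values are written to disk and reloaded by the next run. The `WorkflowMemory` class, its JSON file storage, and the method names are assumptions; the vector store and the Mem0 backend are left out.

```python
import json
from pathlib import Path


class WorkflowMemory:
    """Key-value memory sketch: per-run dict plus a JSON file across runs."""

    def __init__(self, path):
        self._path = Path(path)
        self.short_term = {}  # discarded when the run ends
        # Long-term memory is loaded from disk if a previous run saved any.
        self.long_term = (
            json.loads(self._path.read_text()) if self._path.exists() else {}
        )

    def remember(self, key, value, persist=False):
        if persist:
            self.long_term[key] = value
            self._path.write_text(json.dumps(self.long_term))
        else:
            self.short_term[key] = value

    def recall(self, key, default=None):
        # Values from the current run shadow values from earlier runs.
        if key in self.short_term:
            return self.short_term[key]
        return self.long_term.get(key, default)
```

A second `WorkflowMemory` pointed at the same file sees only what the first one stored with `persist=True`, which is the cross-run behaviour described above.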
- `sandcastle test` CLI command with golden test cases
- Assertion steps in YAML: `assert: output.sentiment in ['positive', 'neutral']`
- OpenTelemetry export for Braintrust/Langfuse/Datadog integration
- Regression testing: compare current vs baseline output
- Impact: Production readiness signal for enterprise
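Regression testing against a baseline can be as simple as a keyed diff of the two outputs. The helper below is a sketch, assuming step outputs are flat dictionaries with string keys; `diff_outputs` is a hypothetical name and not part of the `sandcastle test` command.

```python
def diff_outputs(baseline, current):
    """Return {key: (baseline_value, current_value)} for every differing key.

    A key present on only one side is reported with None for the other side.
    """
    missing = object()  # sentinel so an explicit None still compares correctly
    keys = baseline.keys() | current.keys()
    return {
        k: (baseline.get(k), current.get(k))
        for k in sorted(keys)
        if baseline.get(k, missing) != current.get(k, missing)
    }
```

An empty result means the run matches its golden output. A membership assertion such as the YAML example above would be a separate check on `current["sentiment"]`.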
- Conversational workflow builder: describe what you need, get a YAML workflow
- Integrated into dashboard and CLI (`sandcastle generate`)
- Suggests templates, models, and configurations based on use case
- Impact: Lowers barrier to entry, differentiator over YAML-only competitors
- Expose workflows as A2A-compatible agent endpoints
- Multi-system agent collaboration
- SSE endpoint compliance with AG-UI event format
- Interoperable with CopilotKit and other AG-UI frontends
- `type: agent` (default) - Full E2B sandbox with tools
- `type: llm` - Direct Messages API (cheaper, faster for pure text generation)
- `type: http` - Call REST APIs
- `type: python` - Run arbitrary Python in E2B
- `type: mcp` - Call MCP tools
- `type: condition` - if/else branching
- `type: human` - HITL approval
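One way to read the step types is as a small validated enum with `agent` as the fallback. The sketch below assumes a step is a plain dictionary parsed from YAML; `step_type()` and `needs_sandbox()` are hypothetical helpers, and treating only `agent` and `python` as sandboxed is an inference from the descriptions, not a documented rule.

```python
STEP_TYPES = ("agent", "llm", "http", "python", "mcp", "condition", "human")


def step_type(step):
    """Return the step's type, defaulting to 'agent' when none is given."""
    kind = step.get("type", "agent")
    if kind not in STEP_TYPES:
        raise ValueError(f"unknown step type: {kind!r}")
    return kind


def needs_sandbox(step):
    # Assumption: only the two types described as running in E2B need one.
    return step_type(step) in {"agent", "python"}
```

Validating the type up front lets a workflow fail at load time rather than halfway through a run.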
- `sandcastle replay <run_id>` / `sandcastle fork <run_id>` - time-travel from CLI
- `sandcastle approve <run_id> <step_id>` / `sandcastle reject` - HITL from CLI
- `sandcastle templates list/install` - template management
- `--json` output mode for scripting and CI/CD integration
- `sandcastle run <workflow>` - headless execution for pipelines
- Public template catalog: curated collection of production-ready workflow templates
- `sandcastle templates list` - browse available templates with descriptions and tags
- `sandcastle templates install <name>` - install a template into the local workflows directory
- YAML-based template format with metadata (author, version, description, required inputs, tags)
- Community contributions via GitHub PRs to a central template repository
- Semantic versioning for templates (install specific version or latest)
- Categories: data-processing, code-generation, research, content-creation, devops
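Installing "a specific version or latest" implies comparing versions numerically rather than as strings. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release tags; `parse_version()` and `resolve()` are hypothetical names.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(available, requested="latest"):
    """Return the version to install from the list of available versions."""
    if requested == "latest":
        return max(available, key=parse_version)
    if requested in available:
        return requested
    raise LookupError(f"version {requested!r} is not published")
```

The tuple comparison matters: as strings, `"1.9.3"` sorts after `"1.10.0"`, which would install the wrong template.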
- Checkpoint/resume for long-running workflows (Temporal-style durable execution)
- Workflow versioning (v1, v2, v3 with diff view)
- Template marketplace (share workflow templates publicly)
- Rate limiting per tenant
- Audit log
- ARIA accessibility for dashboard
- Per-run violations/optimizer views in RunDetailPage
Sandcastle Executor
|
v
SandshoreRuntime
|
+-- _build_env() -> resolve model, build env vars
+-- _stream_backend() -> failover wrapper
|   +-- _stream_backend_once() -> execute via backend
|   +-- On 429/5xx: mark_cooldown() + try alternatives
|
+-- SandboxBackend (Protocol)
    +-- E2BBackend (default) - AsyncSandbox, background commands
    +-- DockerBackend - aiodocker, tar upload
    +-- LocalBackend - subprocess, no isolation
    +-- CloudflareBackend - HTTP to CF Worker
Key design:
- `SandshoreRuntime.query(request)` - Full result
- `SandshoreRuntime.query_stream(request)` - Async generator of SSE events
- `ProviderFailover` - per-key cooldown tracking, ordered fallback chains
- `runner.mjs` (Claude) + `runner-openai.mjs` (OpenAI/MiniMax/Gemini) bundled in package
- Backend selection via `SANDBOX_BACKEND` env var
- Health check with 60s TTL cache
- Proxy fallback to legacy Sandstorm server (backward compat)
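The relationship between the two entry points can be illustrated with a streaming async generator and a collector on top of it. This is a sketch only: the event dictionaries, the canned chunks, and the assumption that `query()` is built by draining `query_stream()` are not confirmed by this document.

```python
import asyncio


async def query_stream(prompt):
    """Yield events one at a time, the way an SSE stream would deliver them."""
    # Canned chunks stand in for real model output in this sketch.
    for chunk in ("Hello", ", ", "world"):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"type": "text", "data": chunk}
    yield {"type": "done", "data": ""}


async def query(prompt):
    """Collect the whole stream and return the full result as one string."""
    parts = []
    async for event in query_stream(prompt):
        if event["type"] == "text":
            parts.append(event["data"])
    return "".join(parts)
```

Callers that want incremental output iterate `query_stream()` directly; callers that want a single value await `query()`, for example with `asyncio.run(query("hi"))`.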
- Keep `sandcastle-ai` source-available (BSL 1.1, converts to Apache 2.0 after 4 years)
- Build community, GitHub stars, adoption
| Tier | Price | Limits |
|---|---|---|
| Free | $0 | 100 runs/month, 1 workflow, local mode only |
| Pro | $29/mo | 5,000 runs/month, unlimited workflows, cloud execution, 30-day retention |
| Team | $99/mo | 50,000 runs/month, multi-tenant, RBAC, 90-day retention |
| Enterprise | Custom | Unlimited, SSO/SAML, SLA, dedicated infra, audit logs |
SSO/SAML, RBAC, audit logging, compliance export, private cloud/VPC, SLA
Workflow template marketplace (free + paid), community plugins, revenue share
- `AsyncSandbox` for async operations
- Streaming via `on_stdout` callback with background commands
- File ops: `sandbox.files.write()`, `sandbox.files.read()`
- npm install via `background=True` + manual polling (avoids gRPC hang)
- Python-side deadline on event loop (timeout + 30s grace)
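The "manual polling plus Python-side deadline" pattern can be sketched with the standard library alone. Nothing here calls the E2B SDK: `check` stands in for whatever reports a background command's exit code, and `poll_until()` and `run_with_deadline()` are hypothetical helper names.

```python
import asyncio

GRACE_SECONDS = 30.0  # the grace period named in the notes above


async def poll_until(check, interval_s=0.5):
    """Call check() repeatedly until it returns something other than None."""
    while True:
        result = check()
        if result is not None:
            return result
        await asyncio.sleep(interval_s)


async def run_with_deadline(awaitable, timeout_s, grace_s=GRACE_SECONDS):
    """Enforce timeout + grace on the event loop, independent of the backend.

    Raises asyncio.TimeoutError if the awaitable does not finish in time.
    """
    return await asyncio.wait_for(awaitable, timeout=timeout_s + grace_s)
```

Wrapping the poll loop in `run_with_deadline()` means a hung remote call cannot stall the workflow past the configured timeout plus the grace period.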
- TypeScript: `@anthropic-ai/claude-agent-sdk` (npm) - used in runner.mjs
- Current approach (runner.mjs in E2B sandbox) remains optimal
- `runner-openai.mjs` - supports any OpenAI-compatible API
- Used by: MiniMax, OpenAI Codex, Google Gemini (via OpenRouter)
- Env vars: `MODEL_API_KEY`, `MODEL_ID`, `MODEL_BASE_URL`, `MODEL_INPUT_PRICE`, `MODEL_OUTPUT_PRICE`
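As an illustration of how these variables might feed cost tracking, the sketch below reads them into a config dictionary and estimates the cost of a call. The variable names come from this document; the helper names, the default of 0 for missing prices, and the assumption that prices are quoted per million tokens are not confirmed.

```python
import os


def load_model_config(env=None):
    """Read the runner's model settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "api_key": env["MODEL_API_KEY"],
        "model_id": env["MODEL_ID"],
        "base_url": env["MODEL_BASE_URL"],
        # Assumption: prices are optional and default to 0 when unset.
        "input_price": float(env.get("MODEL_INPUT_PRICE", 0)),
        "output_price": float(env.get("MODEL_OUTPUT_PRICE", 0)),
    }


def estimate_cost(cfg, input_tokens, output_tokens):
    """Estimate cost, assuming prices are quoted per one million tokens."""
    tokens_per_price_unit = 1_000_000
    total = input_tokens * cfg["input_price"] + output_tokens * cfg["output_price"]
    return total / tokens_per_price_unit
```

If the real prices use a different unit (for example per thousand tokens), only `tokens_per_price_unit` needs to change.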
Runtime module: Sandshore (SandshoreRuntime)
- Fits "sand" theme - interface between Sandcastle and cloud execution
- MCP + A2A Standardization - Both are now Linux Foundation projects: MCP was donated to the Agentic AI Foundation (OpenAI, Google, Microsoft, Anthropic all signed on), and A2A is hosted as a separate Linux Foundation project
- AG-UI Protocol - Born from CopilotKit + LangGraph + CrewAI. Standardizes agent-frontend communication.
- Multi-Agent Systems - Shift from single agents to orchestrated teams of specialized agents
- Human-on-the-Loop - Moving from "human approves every action" to "human sets guardrails and monitors"
- Agent Memory as Infrastructure - Mem0 achieving 26% improvement over baseline, AWS AgentCore Memory as managed service
- LLM Gateways - Portkey (1,600+ LLMs), LiteLLM (100+ LLMs), OpenRouter (500+ LLMs)
- Evaluation & Observability - Braintrust, Arize, Langfuse becoming standard infrastructure
This document synthesizes research from competitive landscape analysis, E2B SDK documentation, Claude Agent SDK investigation, and market trend reports from Deloitte, Gartner, and industry sources.