| Aspect | AIOS (AGI Research) | Agent Control Plane |
|---|---|---|
| Primary Focus | Efficiency (throughput, latency) | Safety (policy enforcement, audit) |
| Target Audience | Researchers, ML Engineers | Enterprise, Production Systems |
| Kernel Philosophy | Resource optimization | Security boundary |
| Failure Mode | Graceful degradation | Kernel panic on violation |
| Policy Enforcement | Optional/configurable | Mandatory, kernel-level |
| Paper Venue | COLM 2025 | ASPLOS 2026 (target) |
```
┌─────────────────────────────────────┐
│             AIOS Kernel             │
├─────────────────────────────────────┤
│  ┌─────────┐  ┌─────────────────┐   │
│  │Scheduler│  │ Context Manager │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────┐  ┌─────────────────┐   │
│  │Memory   │  │  Tool Manager   │   │
│  │Manager  │  │                 │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────────────────────────────┐│
│  │    Access Control (Optional)    ││
│  └─────────────────────────────────┘│
└─────────────────────────────────────┘
```
Focus: GPU utilization, FIFO/Round-Robin scheduling, context switching
```
┌─────────────────────────────────────┐
│        Kernel Space (Ring 0)        │
│  ┌─────────────────────────────────┐│
│  │    Policy Engine (Mandatory)    ││
│  └─────────────────────────────────┘│
│  ┌─────────┐  ┌─────────────────┐   │
│  │ Flight  │  │     Signal      │   │
│  │Recorder │  │   Dispatcher    │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────┐  ┌─────────────────┐   │
│  │   VFS   │  │   IPC Router    │   │
│  │ Manager │  │                 │   │
│  └─────────┘  └─────────────────┘   │
├─────────────────────────────────────┤
│         User Space (Ring 3)         │
│  ┌─────────────────────────────────┐│
│  │    LLM Generation (Isolated)    ││
│  │    Tool Execution               ││
│  │    Agent Logic                  ││
│  └─────────────────────────────────┘│
└─────────────────────────────────────┘
```
Focus: Isolation, policy enforcement, audit trail, crash containment
| Feature | AIOS | Agent Control Plane |
|---|---|---|
| Scheduling | FIFO, Round-Robin, Priority | Policy-based, Safety-first |
| Context Switching | Performance optimized | Checkpoint + Rollback |
| Memory Model | Short-term + Long-term | VFS with mount points |
| Signal Handling | None | POSIX-style (SIGSTOP, SIGKILL, etc.) |
| Policy Violation | Log and continue | Kernel panic (0% tolerance) |
| Crash Isolation | Same process | Kernel survives user crashes |
| IPC | Function calls | Typed pipes with policy check |
| Audit | Logging | Flight recorder (black box) |
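
The "Checkpoint + Rollback" row can be pictured with a small sketch. This is not the Agent Control Plane implementation: `CheckpointStore`, `save`, and `rollback` are hypothetical names, and agent state is assumed to be a plain dictionary held in memory.

```python
import copy


class CheckpointStore:
    """Hypothetical in-memory checkpoint store for one agent's state."""

    def __init__(self):
        self._snapshots = []  # oldest first

    def save(self, state):
        """Record a deep copy so later mutations cannot alter the snapshot."""
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1  # checkpoint id

    def rollback(self, checkpoint_id):
        """Return a fresh copy of the state as it was at checkpoint_id."""
        return copy.deepcopy(self._snapshots[checkpoint_id])


store = CheckpointStore()
state = {"step": 1, "balance": 100}
cp = store.save(state)

# The agent mutates its state, then a policy violation is detected.
state["step"] = 2
state["balance"] = -50

# Restore the last known-good state instead of continuing.
state = store.rollback(cp)
```

The deep copy on both `save` and `rollback` is the design point: a snapshot that shares mutable objects with live state would be silently corrupted by the agent it is meant to protect against.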
AIOS Approach:
"If an agent is slow, optimize it. If it fails, retry it."
Our Approach:
"If an agent violates policy, kill it immediately. No exceptions."
```python
# AIOS: Efficiency-first
async def transfer_money(agent, amount):
    # AIOS focuses on throughput
    result = await agent.execute(f"Transfer ${amount}")
    return result  # Hope nothing went wrong
```
```python
# Agent Control Plane: Safety-first
async def transfer_money(kernel, agent_ctx, amount):
    # Policy check BEFORE execution
    allowed = await agent_ctx.check_policy("transfer", f"amount={amount}")
    if not allowed:
        # Kernel panic - cannot proceed
        raise PolicyViolation("Transfer exceeds limit")

    # Execute with full audit trail
    result = await agent_ctx.syscall(
        SyscallType.SYS_EXEC,
        tool="transfer",
        args={"amount": amount},
    )

    # Flight recorder has everything
    return result
```

| Concern | AIOS Answer | Our Answer |
|---|---|---|
| "What if agent goes rogue?" | "Monitor and intervene" | "Kernel panic, immediate termination" |
| "Can we audit all actions?" | "Logging available" | "Flight recorder - every syscall recorded" |
| "What about data exfiltration?" | "Access control optional" | "VFS mount points, policy per-path" |
| "Regulatory compliance?" | "Not primary focus" | "Built-in governance layer" |
| "Multi-tenant isolation?" | "Process-level" | "Kernel/User space separation" |
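
To make the "every syscall recorded" answer concrete, here is a minimal sketch of an append-only recorder. The class name `FlightRecorder`, its methods, and the record fields are hypothetical. The SHA-256 hash chaining is an assumption added for illustration, to show how tampering could be detected; it is not something the comparison above claims about the real flight recorder.

```python
import hashlib
import json


class FlightRecorder:
    """Hypothetical append-only log; each entry is chained to the previous one."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, agent_id, syscall, args):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"agent": agent_id, "syscall": syscall, "args": args, "prev": prev_hash}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Recompute every hash; False means an entry was altered or removed."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("agent", "syscall", "args", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


recorder = FlightRecorder()
recorder.record("agent-1", "SYS_EXEC", {"tool": "transfer", "amount": 50})
recorder.record("agent-1", "SYS_EXEC", {"tool": "transfer", "amount": 75})
intact = recorder.verify()

# Tampering with a recorded argument breaks the chain.
recorder.entries[0]["args"]["amount"] = 5
still_intact = recorder.verify()
```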
| Aspect | AIOS | Agent Control Plane |
|---|---|---|
| Novel Contribution | LLM Scheduling algorithms | Safety-first kernel design |
| ASPLOS Fit | Systems efficiency | OS abstractions for AI |
| eBPF Potential | Not explored | Network monitoring extension |
| Reproducibility | Benchmark suite | Differential auditing |
AIOS has no signal mechanism. Agents are black boxes.
Agent Control Plane implements POSIX-style signals:
```python
from enum import IntEnum


class AgentSignal(IntEnum):
    SIGSTOP = 1    # Pause for inspection (shadow mode)
    SIGCONT = 2    # Resume execution
    SIGINT = 3     # Graceful interrupt
    SIGKILL = 4    # Immediate termination (non-maskable)
    SIGTERM = 5    # Request graceful shutdown
    SIGPOLICY = 8  # Policy violation (triggers SIGKILL)
    SIGTRUST = 9   # Trust boundary crossed (triggers SIGKILL)
```

Why this matters:
- SIGSTOP enables "shadow mode" - pause and inspect without termination
- SIGKILL is non-maskable - agents CANNOT ignore it
- SIGPOLICY is raised automatically on any violation and escalates to SIGKILL, which is what backs the zero-tolerance guarantee
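
The three properties above can be sketched as a tiny dispatcher. Everything here is illustrative: `AgentProcess`, `mask`, and `dispatch` are hypothetical names, only a subset of the signals is modelled, and the assumption that other signals may be masked is mine rather than the document's.

```python
from enum import IntEnum


class AgentSignal(IntEnum):
    SIGSTOP = 1
    SIGCONT = 2
    SIGKILL = 4
    SIGPOLICY = 8


class AgentProcess:
    """Hypothetical agent handle; state is 'running', 'stopped', or 'killed'."""

    def __init__(self, name):
        self.name = name
        self.state = "running"
        self.masked = set()  # signals the agent asked to ignore

    def mask(self, sig):
        # SIGKILL is non-maskable: a request to mask it is silently dropped.
        if sig is not AgentSignal.SIGKILL:
            self.masked.add(sig)


def dispatch(proc, sig):
    """Deliver sig to proc, escalating policy violations to SIGKILL."""
    if sig is AgentSignal.SIGPOLICY:
        sig = AgentSignal.SIGKILL
    if sig in proc.masked or proc.state == "killed":
        return proc.state
    if sig is AgentSignal.SIGKILL:
        proc.state = "killed"
    elif sig is AgentSignal.SIGSTOP:
        proc.state = "stopped"
    elif sig is AgentSignal.SIGCONT:
        proc.state = "running"
    return proc.state


agent = AgentProcess("researcher")
agent.mask(AgentSignal.SIGKILL)                 # has no effect by design
paused = dispatch(agent, AgentSignal.SIGSTOP)   # shadow mode: paused, still alive
resumed = dispatch(agent, AgentSignal.SIGCONT)
final = dispatch(agent, AgentSignal.SIGPOLICY)  # violation escalates to SIGKILL
```

Escalating `SIGPOLICY` before the mask check means an agent cannot dodge termination by masking the policy signal itself.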
AIOS memory model:
```
Agent
├── Short-term Memory (conversation buffer)
└── Long-term Memory (persistent storage)
```

Agent Control Plane VFS:
```
/
├── mem/
│   ├── working/      # Ephemeral scratchpad
│   ├── episodic/     # Experience logs
│   ├── semantic/     # Facts (vector store mount)
│   └── procedural/   # Learned skills
├── state/
│   └── checkpoints/  # Snapshots for rollback
├── tools/            # Tool interfaces
├── policy/           # Read-only policy files
└── ipc/              # Inter-process communication
```
Why VFS?
- Uniform interface: Same API for memory, state, tools
- Backend agnostic: Mount Pinecone, Redis, or file system
- Policy per-path: `/policy` is read-only from user space
- POSIX familiar: Engineers know this model
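
A minimal sketch of the mount-point idea follows, assuming each backend behaves like a dictionary. `AgentVFS`, `ReadOnlyError`, and the sample policy string are hypothetical; in practice a mount would wrap a store such as Redis or a vector database rather than a plain `dict`.

```python
class ReadOnlyError(PermissionError):
    """Raised when user-space code writes to a read-only mount."""


class AgentVFS:
    """Hypothetical VFS: each mount point maps to a dict-like backend."""

    def __init__(self):
        self._mounts = {}  # mount path -> (backend, read_only)

    def mount(self, path, backend, read_only=False):
        self._mounts[path.rstrip("/")] = (backend, read_only)

    def _resolve(self, path):
        # Longest matching mount prefix wins, as in a POSIX mount table.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                backend, read_only = self._mounts[prefix]
                return backend, read_only, path[len(prefix):].lstrip("/")
        raise FileNotFoundError(path)

    def read(self, path):
        backend, _, key = self._resolve(path)
        return backend[key]

    def write(self, path, value):
        backend, read_only, key = self._resolve(path)
        if read_only:
            raise ReadOnlyError(f"{path} is on a read-only mount")
        backend[key] = value


vfs = AgentVFS()
vfs.mount("/mem/working", {})  # ephemeral scratchpad
vfs.mount("/policy", {"limits": "transfer<=1000"}, read_only=True)

vfs.write("/mem/working/scratch", "draft plan")
scratch = vfs.read("/mem/working/scratch")

try:
    vfs.write("/policy/limits", "transfer<=999999")
    policy_write_blocked = False
except ReadOnlyError:
    policy_write_blocked = True
```

The same `read` and `write` calls serve memory and policy alike, which is the "uniform interface" point; the read-only flag lives on the mount, so the check cannot be skipped by an individual caller.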
```python
# AIOS - agents call each other directly
result = agent_b.process(agent_a.output)
```

```python
# Our approach - policy-enforced pipes
pipeline = (
    research_agent
    | PolicyCheckPipe(allowed_types=["ResearchResult"])
    | summary_agent
)
result = await pipeline.execute(query)
```

Why pipes?
- Type checking at pipe level (not runtime exceptions)
- Policy enforcement at every hop
- Backpressure prevents cascade failures
- Full audit trail through flight recorder
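
One way the `|` syntax above could be wired up is sketched below. Apart from `PolicyCheckPipe`, `allowed_types`, and `execute`, which appear in the example above, every name (`Stage`, `Pipeline`, `FnStage`, the `run` method) is a hypothetical stand-in, and plain functions stand in for agents. Backpressure and audit logging are left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ResearchResult:
    text: str


class Pipeline:
    """Ordered list of stages; `pipeline | stage` appends another hop."""

    def __init__(self, stages):
        self.stages = stages

    def __or__(self, other):
        return Pipeline(self.stages + [other])

    async def execute(self, value):
        for stage in self.stages:
            value = await stage.run(value)
        return value


class Stage:
    """Hypothetical base class; `a | b` builds a two-stage Pipeline."""

    def __or__(self, other):
        return Pipeline([self, other])


class FnStage(Stage):
    """Wraps a plain function so it can stand in for an agent."""

    def __init__(self, fn):
        self.fn = fn

    async def run(self, value):
        return self.fn(value)


class PolicyCheckPipe(Stage):
    def __init__(self, allowed_types):
        self.allowed_types = set(allowed_types)

    async def run(self, value):
        # Reject any payload whose type name is not on the allow-list.
        if type(value).__name__ not in self.allowed_types:
            raise TypeError(f"{type(value).__name__} not allowed through this pipe")
        return value


research_agent = FnStage(lambda q: ResearchResult(f"findings on {q}"))
summary_agent = FnStage(lambda r: r.text.upper())

pipeline = research_agent | PolicyCheckPipe(allowed_types=["ResearchResult"]) | summary_agent
result = asyncio.run(pipeline.execute("eBPF"))
```

Because the check sits between the two agents, a payload of the wrong type is rejected at the hop itself, before the downstream agent ever sees it.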

AIOS contributions:
- Novel scheduling algorithms for LLMs
- Context switching performance
- Throughput benchmarks

Agent Control Plane contributions:
- Novel OS abstractions for AI safety
- Kernel/User space separation for agent isolation
- POSIX-inspired primitives (signals, VFS, pipes)
- eBPF extension for network monitoring (future)

Key Differentiator: We are not competing on efficiency. We are defining the safety contract for enterprise AI agents.
```
┌─────────────────────────────────────────┐
│              Agent Process              │
├─────────────────────────────────────────┤
│  HTTP Request to api.openai.com         │
│         │                               │
│         ▼                               │
│  ┌─────────────────────────────────┐    │
│  │  eBPF Probe (Kernel Space)      │    │
│  │  - Monitor all network calls    │    │
│  │  - Block unauthorized endpoints │    │
│  │  - Log payload hashes           │    │
│  └─────────────────────────────────┘    │
│         │                               │
│         ▼                               │
│  Network Stack                          │
└─────────────────────────────────────────┘
```
Why eBPF?
- Monitoring happens OUTSIDE the Python runtime
- Cannot be bypassed by agent code
- Sub-millisecond overhead
- ASPLOS loves eBPF papers
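
The decision the probe would make (allow or block, then log a payload hash) can be modelled in a few lines. This is a user-space Python sketch of the logic only, not an eBPF program: a real probe would run in the kernel, which is the whole point of the design. `ALLOWED_HOSTS` and `inspect_request` are hypothetical names.

```python
import hashlib

ALLOWED_HOSTS = {"api.openai.com"}  # hypothetical allow-list


def inspect_request(host, payload):
    """Return (verdict, log_record) for one outbound request."""
    verdict = "allow" if host in ALLOWED_HOSTS else "block"
    record = {
        "host": host,
        "verdict": verdict,
        # Log a hash rather than the payload itself.
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }
    return verdict, record


ok_verdict, ok_record = inspect_request("api.openai.com", b'{"prompt": "hi"}')
bad_verdict, bad_record = inspect_request("exfil.example.net", b"secrets")
```

Logging only the hash keeps sensitive payloads out of the audit log while still letting an auditor confirm later whether a known payload was sent.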
| Use Case | Recommended |
|---|---|
| Research experiments | AIOS |
| Production enterprise | Agent Control Plane |
| Throughput benchmarks | AIOS |
| Compliance-heavy industries | Agent Control Plane |
| Multi-agent chaos | AIOS (let them fight) |
| Multi-agent governance | Agent Control Plane |
AIOS and Agent Control Plane are not competing - they solve different problems.
- AIOS: "How do we run 1000 agents efficiently?"
- Agent Control Plane: "How do we run 10 agents without any of them going rogue?"
For enterprise adoption, the second question matters more.