Each agent session has a mounted SQLite DB. The DB is the one and only IO mechanism between host and container. No IPC files, no stdin piping. Two tables: messages_in (host → agent-runner) and messages_out (agent-runner → host). Everything is a message.
Central DB (host process):
- Agent groups, conversations, routing tables
- Maps platform IDs → agent groups → sessions
- Channel adapters don't touch this directly — the host does the lookup
Per-session DB (mounted into container):
- messages_in (written by host, read by agent-runner)
- messages_out (written by agent-runner, read by host)
- Everything is a message: chat, tasks, webhooks, system actions, agent-to-agent — all use these two tables
- One DB per session, not per agent group
An agent group has its own filesystem — folder, CLAUDE.md, skills, container config. Multiple sessions can share the same agent group (same filesystem, same skills) but each session gets its own DB mounted at a known path. Each session = a separate container with the same agent group's filesystem but a different session DB.
Platform event
→ Channel adapter (trigger check, ID extraction)
→ Returns: { platformChannelId, platformThreadId, triggered }
→ Host maps platformChannelId + platformThreadId → agent group + session
→ Host writes message to session's DB
→ Host calls wakeUpAgent(session)
→ Container spins up (or is already running)
→ Agent-runner polls its session DB, finds new messages
→ Agent-runner processes with Claude
→ Agent-runner writes response to session DB
→ Host polls active session DBs for responses
→ Host reads response, looks up conversation, delivers through channel adapter
Channel adapters are responsible for:
- Receiving platform events (webhooks, polling, websockets — platform-specific)
- Filtering: deciding which messages to forward to the host for processing. This can be stateless (regex trigger match) or stateful (e.g., "was the bot mentioned in this thread at some point? If so, forward all subsequent messages"). The adapter receives a stream of unfiltered platform messages and decides which ones to pass on. How it decides is an implementation detail — NanoClaw doesn't know or care.
- Extracting and standardizing two IDs:
- Platform channel ID — identifies the conversation (WhatsApp group, Slack channel, email thread)
- Platform thread ID — optional sub-context (Slack thread, GitHub PR comment thread)
- Outbound delivery — sending responses back to the platform
The channel adapter does NOT know about agent group IDs or session IDs. It returns platform-level identifiers. The host maps those to the entity model.
The two-level ID scheme (channel ID + thread ID) gives flexibility:
- Want every Slack thread to be a separate session? Return unique thread IDs.
- Want all messages in a Slack channel to share a session? Return the same thread ID (or null).
- This is configured per-channel, not globally.
Adapters are stateless — they receive config from the host at setup time, not from the DB directly.
What lives in code (per channel type, doesn't change at runtime):
- Auto-registration behavior (enabled/disabled, how it works)
- Sender allowlist rules
- Whether allowlisted senders can auto-register groups
- Platform-specific connection and message handling
These are decisions made when setting up the channel adapter. Change them = change the code.
What lives in the DB (per group, varies group to group):
- Which agent group handles it
- Trigger / filter rules (regex, @mention-only, exclude certain senders, etc.)
- Response scope (respond to all messages vs only triggered/allowlisted)
- Session mode (shared vs per-thread)
The host reads per-group config from the DB and passes it to the adapter at setup. If config changes at runtime (admin agent registers a new group, changes a trigger), the host calls the adapter's update method.
When the adapter forwards a message from an unknown group, the host needs to decide whether to create the group and a session for it.
The adapter controls whether to forward unknown messages — based on its code-level auto-registration rules (sender allowlist, group-add detection, etc.). If the adapter forwards it, the host creates the group + session.
Session creation for known groups:
- Shared session mode: host finds the existing session or creates one if it's the first message
- Per-thread session mode: host looks up by threadId. If no session exists for this thread, auto-creates one with the same agent group
The code-level rules are channel-specific:
- WhatsApp: if an allowlisted number adds the bot to a group → auto-register. If an unknown number DMs → depends on the adapter's configuration.
- Email: if the sender is known → auto-register the thread. If unknown → drop.
- Slack: if someone @mentions the bot in a new channel → adapter decides whether to forward based on its rules.
No channel_configs table — channel-type-level behavior is baked into the adapter code.
Chat SDK adapters are wrapped per-channel:
- Each Chat SDK adapter gets its own Chat instance
- Concurrency mode is configured per-channel (concurrent for chat, queue for tasks, debounce for webhooks)
- A bridge wraps the Chat instance + adapter to conform to NanoClaw's standard channel interface
- Chat SDK handles: webhook parsing, dedup, message history, platform API calls, rich content delivery
- NanoClaw handles: routing, agent lifecycle, session management
Chat SDK's subscription model:
Chat SDK has its own thread-level subscription concept (distinct from NanoClaw's channel-level registration):
onNewMention/onNewMessage(regex)— fires on first contact (e.g., @mention in a Slack thread)thread.subscribe()— opts into all future messages in that threadonSubscribedMessage— fires for all messages in subscribed threads
This is sub-channel granularity. NanoClaw registers at the channel level ("listen to this Discord channel"). Chat SDK subscribes at the thread level ("track this specific Slack thread"). The bridge lets Chat SDK manage its own subscriptions internally — NanoClaw doesn't interfere with or replicate this.
Platform capability differences:
Capabilities vary significantly across adapters (see Chat SDK adapter docs):
- Slack: Full rich content (Block Kit cards, modals, streaming, reactions, ephemeral messages)
- Discord: Embeds, buttons, streaming via post+edit
- WhatsApp (Cloud API): DMs only, interactive reply buttons, no streaming, no reactions
- GitHub/Linear: Markdown comments, no interactive elements
- Telegram: Inline keyboard buttons, streaming via post+edit
The host/bridge handles graceful degradation — if an agent posts a card on a platform that doesn't support cards, it falls back to text.
Non-Chat-SDK channels (WhatsApp via Baileys, Gmail, custom integrations) implement the NanoClaw channel interface directly — no bridge, no Chat SDK types.
The host is an orchestrator:
- Spawn — when wakeUpAgent is called and no container exists for the session
- Idle kill — when a container has no unprocessed messages for some timeout period
- Limits — MAX_CONCURRENT_CONTAINERS caps active containers
When a container spins up, the agent-runner immediately starts polling its session DB. Messages are already there waiting.
Media is not downloaded by the host. Instead:
- Messages include download URLs (signed URLs where possible)
- Agent-runner downloads and processes media inside the container
- For channels where signed URLs don't work (e.g., WhatsApp with buffered streams), the channel adapter downloads the media and serves it via a local URL/server that the container can access
Native content blocks (provider-dependent):
The agent-runner detects file types and passes supported types as native content blocks where the provider supports it:
| Type | Claude | Codex | OpenCode |
|---|---|---|---|
| Images (JPEG, PNG, GIF, WebP) | Native image content block | Save to disk, reference in prompt | Save to disk, reference in prompt |
| PDFs | Native document content block | Save to disk | Save to disk |
| Audio | Native audio content block | Save to disk | Save to disk |
| Other files (code, data, video, archives) | Save to disk | Save to disk | Save to disk |
"Save to disk" means downloaded to /workspace/downloads/{messageId}/ and referenced in the prompt text as an available file path. The agent can use tools (Read, Bash) to access it.
The agent-runner builds the prompt differently per provider. For Claude, it constructs multi-part MessageParam content with image/document blocks. For Codex/OpenCode, everything is text with file path references.
Outbound file delivery is tool-based. The agent calls a tool (e.g., send_file) with a file path. The agent-runner moves the file to the outbox and writes the messages_out row.
/workspace/
outbox/
{message_id}/ ← one dir per messages_out row
chart.png
report.pdf
messages_out content references filenames only:
{ "text": "Here's the chart", "files": ["chart.png", "report.pdf"] }No paths in the DB — the convention is the contract. The host reads files from outbox/{message_id}/ in the mounted session folder and delivers them via the adapter (Chat SDK FileUpload with buffer data, or platform-specific upload for native channels). Host cleans up the outbox directory after successful delivery.
Outbound files use a dedicated send_file MCP tool (separate from send_message). See agent-runner-details.md for the tool interface.
Dedup is the channel adapter's responsibility. Chat SDK handles this internally. Native adapters track platform message IDs as needed. The host does not deduplicate — if the adapter forwards it, the host writes it.
Two tables. JSON blobs for content — schema-free, format varies by kind.
-- Host writes, agent-runner reads
CREATE TABLE messages_in (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL, -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
timestamp TEXT NOT NULL,
status TEXT DEFAULT 'pending', -- 'pending' | 'processing' | 'completed' | 'failed'
status_changed TEXT, -- ISO timestamp of last status change
process_after TEXT, -- ISO timestamp. NULL = process immediately.
recurrence TEXT, -- cron expression. NULL = one-shot.
tries INTEGER DEFAULT 0, -- number of processing attempts
-- routing (agent-runner copies to messages_out; agent never sees these)
platform_id TEXT,
channel_type TEXT,
thread_id TEXT,
-- payload (structure depends on kind)
content TEXT NOT NULL -- JSON blob
);
-- Agent-runner writes, host reads
CREATE TABLE messages_out (
id TEXT PRIMARY KEY,
in_reply_to TEXT, -- references messages_in.id (optional)
timestamp TEXT NOT NULL,
delivered INTEGER DEFAULT 0,
deliver_after TEXT, -- ISO timestamp. NULL = deliver immediately.
recurrence TEXT, -- cron expression. NULL = one-shot.
-- routing (default: copied from messages_in by agent-runner)
kind TEXT NOT NULL, -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
platform_id TEXT,
channel_type TEXT,
thread_id TEXT,
-- payload (format matches kind)
content TEXT NOT NULL -- JSON blob
);
One-shot and recurring tasks use the same tables — no separate scheduler.
One-shot: process_after (inbound) or deliver_after (outbound) with recurrence = NULL.
Recurring: Same, plus a recurrence cron expression. After the host marks a row as handled/delivered, if recurrence is set, it inserts a new row with process_after/deliver_after advanced to the next cron occurrence. Next time is computed from the scheduled time (not wall clock) to prevent drift.
Host sweep (every ~60s across all session DBs):
messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())→ wake agentmessages_in WHERE status = 'processing' AND status_changed < (now - stale_threshold)→ stale detection, increment tries, reset to pending with backoffmessages_out WHERE delivered = 0 AND (deliver_after IS NULL OR deliver_after <= now())→ deliver- After completing/delivering a row with
recurrence, insert next occurrence
Active container poll (~1s) checks the same conditions but only for sessions with running containers.
Agent-runner creates schedules by writing messages_in (to itself) or messages_out (reminders/notifications) with process_after and optionally recurrence.
chat — simple NanoClaw format. Any channel can produce this.
{
"sender": "John",
"senderId": "user123",
"text": "Check this PR",
"attachments": [{ "type": "image", "url": "https://signed-url..." }],
"isFromMe": false
}chat-sdk — full Chat SDK SerializedMessage, passed through from bridge adapter. Includes author, text, formatted (mdast AST), attachments, isMention, links, metadata.
task — scheduled task firing.
{ "prompt": "Review open PRs", "script": "scripts/review.sh" }webhook — raw webhook payload.
{ "source": "github", "event": "pull_request", "payload": { ... } }system — host action result (response to a system action the agent requested).
{ "action": "register_group", "status": "success", "result": { "agent_group_id": "ag-456" } }Output kind determines the format and delivery adapter. Default: agent-runner copies kind and routing fields from the messages_in row it's responding to.
chat — simple NanoClaw format. NanoClaw channel delivers via sendMessage(text).
{ "text": "LGTM, merging now" }chat-sdk — Chat SDK AdapterPostableMessage. Bridge adapter delivers via thread.post(). Can be markdown, card, or raw — adapter handles platform conversion.
{ "markdown": "## Review\n**LGTM**", "attachments": [...] }{ "card": { "type": "card", "title": "Review", "children": [...] }, "fallbackText": "..." }task — task result. Host logs and optionally notifies.
{ "result": "3 PRs reviewed", "status": "success" }webhook — webhook response. Host sends HTTP response or notifies.
{ "response": { "status": 200, "body": { ... } } }system — host action request (register group, reset session, etc.). Host reads, validates permissions, executes, writes result back as a system messages_in row.
{ "action": "reset_session", "payload": { "session_id": "sess-123" } }All interactive operations flow through messages_in/out — the DB is the only IO boundary for the container. The agent uses MCP tools; the agent-runner translates tool calls into structured messages_out rows; the host delivers through the appropriate adapter method.
Cards with user interaction (e.g., "Ask User Question"):
- Agent calls
ask_user_questiontool with question + options - Agent-runner writes messages_out with the question card
- Host delivers as interactive card through adapter (e.g., Slack Block Kit buttons)
- User clicks an option
- Platform sends event back to adapter → host writes messages_in with the response
- Agent-runner reads messages_in, matches to pending tool call, returns selection to agent as tool result
The agent-runner holds the tool call open while waiting for the user's response in messages_in. The round-trip goes: agent → messages_out → host → platform → user clicks → platform → host → messages_in → agent-runner → agent.
Approvals:
Two patterns, both handled at the host level:
- Implicit: Agent calls a tool that requires approval. Host intercepts, sends approval card to admin, waits for response, then executes or rejects. The agent doesn't know about the approval step.
- Explicit: Agent explicitly requests approval via a tool. Agent-runner writes the approval request to messages_out. Same flow as "ask user question" — response comes back through messages_in.
In both cases, the approval and action execution happen on the host side, not the agent side.
Approval routing: Privilege is a user-level concept. user_roles records owner (global only — first user to pair becomes owner) and admin (global or scoped to a specific agent_group_id). When an action requires approval, pickApprover(agentGroupId) returns candidates in order: scoped admins for that agent group → global admins → owners (deduplicated). pickApprovalDelivery then takes the first candidate reachable via ensureUserDm (with a same-channel-kind tie-break so a Discord approval request prefers a Discord-using approver). The approval card lands in the approver's DM messaging group, not the origin chat. Delivery is resolved through the Chat SDK's openDM for resolution-required channels (Discord/Slack/…) or the user's handle directly for direct-addressable channels (Telegram/WhatsApp/…), and the mapping is cached in user_dms for subsequent requests. See src/access.ts, src/user-dm.ts.
Editing a sent message:
Agent calls an edit_message tool with the message ID and new content. Agent-runner writes messages_out with an edit operation. Host calls adapter.editMessage(). Messages in the agent's context include integer IDs so the agent can reference them.
Reactions:
Agent calls add_reaction tool with message ID and emoji. Agent-runner writes messages_out with a reaction operation. Host calls adapter.addReaction().
Operations in messages_out content:
// Normal message (default)
{ "text": "LGTM" }
// Interactive card
{ "operation": "ask_question", "title": "Deploy", "question": "Approve deployment?", "options": ["Yes", "No", "Defer"] }
// Edit existing message
{ "operation": "edit", "messageId": "3", "text": "Updated: LGTM with minor comments" }
// Reaction
{ "operation": "reaction", "messageId": "5", "emoji": "thumbs_up" }The host reads the operation field (if present) and calls the right adapter method. No operation field = normal message delivery. Platform capabilities vary — the host/bridge handles graceful degradation (e.g., reaction on a platform that doesn't support it → skip or send as text).
Sending a message to another agent uses the same routing fields as channel delivery. The agent-runner sets channel_type: 'agent' and platform_id to the target agent group ID. Optionally, thread_id can target a specific session (null = find or create the default session).
From the sending agent's perspective, it's the same mechanism as sending to Slack or WhatsApp — just a messages_out row with different routing. The host reads it, checks that this agent group has permission to message the target, resolves the target session, and writes a messages_in row to that session's DB.
// messages_out routing fields
{ "kind": "chat", "channel_type": "agent", "platform_id": "pr-worker", "thread_id": null }
// messages_out content
{ "text": "Reset your session and re-review", "sender": "Supervisor", "senderId": "agent:pr-admin" }The receiving agent gets a normal chat message. It doesn't need to know the source is another agent unless that's relevant context.
Default behavior: Agent-runner copies routing fields (kind, platform_id, channel_type, thread_id) from the messages_in row to messages_out. Response goes back where it came from.
Host validation: Before delivering, the host checks that this agent group is permitted to send to the destination. The agent-runner copies routing; the host validates.
Multi-destination pattern (customization): An agent may need to send to a different channel than the origin (e.g., a webhook triggers a Slack notification). This is supported via custom code, not built into the core:
- Add a
destinationstable to the session DB mapping logical names to routing fields - Populate it from the host when setting up the session
- Modify the agent's prompt to list available destinations
- Agent chooses a destination by name; agent-runner resolves to routing fields
- Host validates as usual
This is documented as a pattern, not a built-in feature.
- Container isolation via filesystem mounts
- Credential proxy (OneCLI)
- Per-agent-group workspace (folder, CLAUDE.md, skills)
- Polling-based (not event-driven)
- Per-agent-group agent-runner recompilation on container startup (agent can modify its own source, request rebuild/restart, changes persist across teardowns)
- Host ↔ container IO through mounted session DBs (
messages_in/messages_out) — no stdin piping, no IPC files - Agent commands are
messages_outrows withkind: 'system' - Agent-to-agent supported via target-agent routing on
messages_out - Scheduling uses
process_after/deliver_after+recurrenceon the same message tables - Media via signed URLs, downloaded in the container
- Channel adapters use the Chat SDK bridge + a standard interface (trunk ships only the bridge/registry; platform adapters install via
/add-<channel>skills) - Routing: channel adapter extracts IDs, host maps to entities
- Concurrency: Chat SDK per-channel + container limits
- Session scoping: per-session DB, multiple sessions per agent group
Session DB location: Not in the agent group folder. Separate directory (e.g., sessions/{session_id}/). Each session gets its own folder containing session.db and the Claude SDK's .claude/ directory. The session identity IS the folder — no need to track Claude SDK session IDs.
Container mount structure:
/workspace/ ← mount: session folder (read-write)
.claude/ ← Claude SDK session data (auto-created)
session.db ← session SQLite DB
outbox/ ← agent-runner writes outbound files here
agent/ ← mount: agent group folder (nested, read-write)
CLAUDE.md ← agent instructions
skills/ ← agent skills
... working files
Two directory mounts: session folder at /workspace, agent group folder at /workspace/agent/. The agent-runner CDs into /workspace/agent/ to run the agent. Claude SDK writes .claude/ at /workspace/.claude/ (root of the workspace). The session DB is at /workspace/session.db.
This works on both Docker (nested bind mounts) and Apple Container (directory mounts only — no file-level mounts, but nested directory mounts are supported).
Session DB concurrent access: The host writes messages_in, the agent-runner writes messages_out. Both access the same SQLite file simultaneously. WAL mode handles this — SQLite allows concurrent readers, and the two sides write to different tables so writer contention is minimal. The host enables WAL mode when creating the session DB.
Session management: Host-managed. The host creates session folders and mounts them. The container only sees its own session folder.
Session creation (no race condition):
- Message arrives, host checks central DB for a session matching this group + thread
- No session exists → host atomically creates session row in central DB, creates the session folder, creates the session DB, writes the message
- More messages arrive before container starts → host finds the existing session, writes to the same session DB
- Container starts, mounts the folder, agent-runner finds messages waiting
The central DB session row creation is the serialization point. No Claude SDK session ID to coordinate — the SDK discovers its own session data in .claude/ when the agent runs.
System actions: The agent uses MCP tools (register group, reset session, schedule task, etc.). The agent-runner handles these tool calls and writes a structured, deterministic messages_out row with kind: 'system'. This is not natural language — it's a programmatic, structured payload that the host processes deterministically. Host validates permissions, executes, and writes the result back as a system messages_in row.
Container lifecycle: No warm pool. Containers are spawned on demand (wakeUpAgent) and torn down from the outside by the host when idle. Existing idle detection + teardown mechanism carries over.
NanoClaw does not stream tokens to users. The Claude Agent SDK's query() yields complete results. The agent-runner writes one complete message to messages_out per result. The host delivers complete messages to channels.
Message editing is supported as an explicit operation (agent calls an edit_message tool), not as a streaming mechanism.
Typing indicators: host sets typing when a container is active for a session, clears when the container exits or a response appears in messages_out.
When multiple messages arrive while the container is down, they accumulate as handled = 0 rows in messages_in. When the container wakes up, the agent-runner queries all unhandled messages and processes them as a batch — multiple messages are formatted into a single <messages> XML block.
pending → processing → completed
→ failed (after max retries)
- pending: Written by host. Ready to be picked up (if
process_afteris null or past). - processing: Agent-runner sets this when it picks up the message.
status_changedis set to now. Prevents other polls from re-picking the same message. - completed: Agent-runner sets this after successful processing.
- failed: Set after max retries exhausted.
Stale detection: If a message is processing but status_changed is too old (e.g., >10 minutes), the host assumes the container crashed. It resets the message to pending, increments tries, and sets process_after with exponential backoff.
Retries use process_after with exponential backoff. Each retry increments tries and pushes process_after further out:
- Try 1: immediate
- Try 2: +5s
- Try 3: +10s
- Try 4: +20s
- Try 5: +40s
- After max retries: status set to
failed
The host computes this — not the agent-runner. When the host detects a stale processing message or the container exits with an error, it increments tries, computes the next process_after, and resets status to pending.
Output-sent protection: If messages_out already has delivered rows for a batch, don't retry (prevents duplicate messages to user).
Two tiers:
- Active containers (~1s): Poll session DBs for new messages_out rows to deliver
- All sessions (~60s): Sweep all session DBs for due
process_after/deliver_aftertimestamps, handle recurrence
The architecture is flexible for code changes, not configurable for everything. Advanced setups (like the PR Factory below) use custom routing logic and host-side hooks — not database config columns.
NanoClaw is customized via skills — branches that get merged into the user's installation. Different skills add different capabilities (channels, integrations, behaviors). The code must be structured so that:
-
Different customizations don't conflict. Adding Slack and adding Telegram should not produce merge conflicts. Adding a new MCP tool should not conflict with adding a channel. Each type of customization should touch its own file(s).
-
Core blocks of functionality are in separate files. Channel registration, message formatting, MCP tools, routing logic, container management — each in its own file. A skill that changes how messages are formatted doesn't touch the file that handles container spawning.
-
The index file is thin. It wires things together (init DB, start adapters, start poll loops) but contains no business logic. All logic lives in purpose-specific modules that skills can modify independently.
-
Don't over-split. A simple change (e.g., adding a new message kind) shouldn't require edits across 5 files. Group related logic together. The goal is that each skill touches 1-2 files for its core change.
-
Registration patterns over switch statements. Channels, MCP tools, and providers should use registration/plugin patterns. A skill adds a channel by adding a file and a registration call — not by editing a central switch statement alongside every other channel.
Practical example: Adding a new channel via skill should require:
- One new file (the channel adapter or Chat SDK config)
- One line in the barrel file (
channels/index.ts) to import the self-registering module - Zero changes to routing, formatting, delivery, or container code
Analysis of 33 skill branches shows these files cause the most merge conflicts:
| Hotspot | Why it conflicts | Solution |
|---|---|---|
src/index.ts (2000 LOC) |
Every skill patches the main loop, imports, init logic | Thin index that wires modules. Logic lives in purpose-specific files (router, delivery, session-manager, host-sweep). |
src/config.ts |
Every skill adds env vars to a central file | Config declared where it's used. Each module reads its own env vars. No central config registry that every skill edits. |
src/container-runner.ts |
Channel skills add mounts, env vars, credential setup | Declarative mount registration. Channels declare their mounts in their own file. Container runner reads from a registry, not a hardcoded list. |
src/db.ts (750 LOC) |
Schema, migrations, and all CRUD in one file | Split by entity. Numbered migrations. Skills add a migration file + edit one entity file. |
container/agent-runner/src/index.ts |
Agent protocol, IPC handling, formatting all in one file | Split into poll-loop, formatter, providers/, mcp-tools/. Session DB replaces IPC. |
src/ipc.ts |
Every MCP tool addition patches one file | mcp-tools/ directory with barrel. Skills add a tool file + barrel line. |
src/channels/index.ts |
Every channel adds an import line at the same location | Barrel file with comment slots per channel (current pattern works, keep it). |
Mount registration pattern: Instead of every channel skill editing buildVolumeMounts(), channels declare mounts that the container runner collects:
// channels/gmail.ts
registerChannel('gmail', {
factory: createGmailAdapter,
mounts: [
{ hostPath: '~/.gmail-mcp', containerPath: '/home/node/.gmail-mcp', readonly: false }
],
env: ['GMAIL_OAUTH_TOKEN'],
});The container runner reads registered mounts from the channel registry — no need to edit container-runner.ts.
Config pattern: Skills don't patch config.ts or .env.example. Skill-specific env vars are documented in the skill's SKILL.md — the setup process reads those instructions. Each module reads its own env vars directly:
// channels/discord.ts
const DISCORD_TOKEN = process.env.DISCORD_BOT_TOKEN;
// channels/gmail.ts
const GMAIL_CREDS = process.env.GMAIL_CREDENTIALS_PATH;Shared config (DATA_DIR, TIMEZONE, MAX_CONCURRENT_CONTAINERS) stays in config.ts. Channel/skill-specific config stays in the module that uses it.
Line width: 120 characters. Most statements fit on one line without sacrificing readability.
Concise logging. A thin wrapper keeps every log call on one line:
log.info('IPC message sent', { chatJid, sourceGroup });
log.warn('Unauthorized IPC attempt', { chatJid });
log.error('Error processing', { file, err });The DB layer is split by entity rather than kept in one monolithic file:
src/db/
connection.ts ← singleton, init, WAL mode
schema.ts ← CREATE TABLE statements (current state, for reference)
migrations/
index.ts ← runner: checks version, applies pending
001-initial.ts ← initial schema
002-pending-questions.ts ← example: adds pending_questions table
... ← skills append new numbered files
agent-groups.ts ← CRUD for agent_groups
messaging-groups.ts ← CRUD for messaging_groups + messaging_group_agents
sessions.ts ← CRUD for sessions + pending_questions
index.ts ← barrel: re-exports everything
Principles:
- Split by entity, not by layer. Each entity file has its own CRUD functions (~50-100 lines). A skill that adds a column to messaging_groups edits
messaging-groups.ts— doesn't touch sessions or agent groups. - Schema as current state + migrations as history.
schema.tsdocuments what the DB looks like now (read this to understand the schema). Migrations are append-only numbered files that describe how we got here. - No inline ALTER TABLE. A migration runner with a
schema_versiontable replacestry { ALTER TABLE } catch { /* exists */ }blocks. On startup, it checks the current version and applies pending migrations in order. Each migration is a function:(db: Database) => void. - Skills add migrations. A skill that needs a new column adds a new numbered migration file. No conflicts with other skills' migrations as long as numbers don't collide (use timestamps or high-enough numbers for skill branches).
Agent-runner session DB uses the same pattern but lighter — no migrations needed since session DBs are created fresh by the host:
container/agent-runner/src/db/
connection.ts ← open session.db at fixed path, WAL mode
messages-in.ts ← read pending, update status
messages-out.ts ← write results, outbox queries
index.ts ← barrel
These are the building blocks. None require special abstractions — they fall out of per-session DBs, host-managed routing, and messages_out with kind: 'system':
-
Multiple agent groups on the same channel with content-based routing. Different messages in the same thread can route to different agent groups based on content (e.g., @mention routes to supervisor, normal messages route to worker). The channel adapter's routing logic — custom code — decides.
-
Per-thread sessions from a shared agent group. Multiple sessions share the same agent group (filesystem, skills, CLAUDE.md) but each gets its own session DB. Standard for worker pools.
-
Session reset and replay. Create a new session for the same thread. Mark old messages as unhandled so the poll picks them up again. Old output stays visible in the platform (e.g., Discord thread) for comparison. This is an action an agent can request — not automatic.
-
Cross-session read access. Some agents can query other sessions' data. Different access levels: manager sees messages_in/messages_out (review content). Supervisor sees full internals (agent logs, tool calls, debug traces). This is just filesystem/DB access — mount or query the right paths.
-
Context duplication into new sessions. When a supervisor is invoked in a worker's thread, a new session is created with relevant messages copied in. Custom host-side code handles this.
-
Agent-initiated host actions. The agent uses MCP tools (reset session, update skills, etc.). The agent-runner handles the tool call and writes a structured
systemmessages_out row. The host reads and executes with permission checks. The agent can request, but the host decides.
Three agent groups, one Discord channel (PR Factory), plus an admin channel:
| Role | Agent Group | Where | Session model |
|---|---|---|---|
| Worker | pr-worker | PR Factory threads | One session per thread (per PR) |
| Manager | pr-manager | PR Factory channel | Single session, queries across worker sessions |
| Supervisor | pr-admin | Admin channel + PR Factory (when @tagged) | Main session in admin channel; per-thread session when invoked in worker threads |
Worker flow: GitHub PR → Discord thread → worker agent reviews (triage, review, test plan). Each thread gets a session from the shared pr-worker group.
Feedback flow: User @tags supervisor in worker threads → custom routing sends to supervisor with a new session containing the thread's messages (duplicated). Supervisor collects feedback to filesystem. Worker doesn't see supervisor messages.
Iteration flow: User discusses feedback with supervisor in admin channel → supervisor suggests skill changes (shown as rich card with diff) → user approves → supervisor applies changes via host action → supervisor requests session reset + replay → workers re-review same PRs with updated skills in same threads but fresh sessions → user compares reviews side by side.
Manager flow: User talks to manager in PR Factory main channel (not in threads). Manager can search across all worker session DBs (messages_in/messages_out) to answer questions like "how many PRs today?" or "what topics are trending?" Can request actions (close PR, re-open).
What's custom code vs. base architecture:
| Capability | Base architecture | Custom code (PR Factory) |
|---|---|---|
| Per-thread sessions | ✓ platformThreadId → session | |
| Shared agent group across sessions | ✓ Multiple sessions, one group | |
| Writing messages to session DB | ✓ Standard flow | |
| @mention routing to different agent | ✓ Channel adapter routing logic | |
| Context duplication into supervisor session | ✓ Host-side hook on supervisor invocation | |
| Session reset + replay | ✓ Primitives (new session, mark unhandled) | ✓ Supervisor action triggers it |
| Skill updates | ✓ Filesystem writes | ✓ Supervisor action applies changes |
| Cross-session queries | ✓ DB/filesystem access | ✓ Manager's tools know where to look |
| Rich card output | ✓ Structured output in messages_out |
The central DB handles routing and entity management. All content and execution state lives in per-session DBs.
-- Agent workspaces: folder, skills, CLAUDE.md, container config
CREATE TABLE agent_groups (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
folder TEXT NOT NULL UNIQUE,
agent_provider TEXT, -- default for sessions (null = system default)
container_config TEXT, -- JSON: { additionalMounts, timeout }
created_at TEXT NOT NULL
);
-- Platform groups/channels (WhatsApp group, Slack channel, Discord channel, email thread, etc.)
CREATE TABLE messaging_groups (
id TEXT PRIMARY KEY,
channel_type TEXT NOT NULL, -- 'whatsapp', 'slack', 'discord', 'telegram', 'email'
platform_id TEXT NOT NULL, -- platform-specific ID (JID, channel ID, etc.)
name TEXT,
is_group INTEGER DEFAULT 0,
unknown_sender_policy TEXT NOT NULL DEFAULT 'strict', -- 'strict' | 'request_approval' | 'public'
created_at TEXT NOT NULL,
UNIQUE(channel_type, platform_id)
);
-- Users (messaging platform identities, namespaced "<channel_type>:<handle>")
CREATE TABLE users (
id TEXT PRIMARY KEY, -- e.g. 'telegram:123456', 'discord:1470...'
kind TEXT NOT NULL, -- mirrors the channel_type prefix
display_name TEXT,
created_at TEXT NOT NULL
);
-- Roles (owner is global only; admin can be global or scoped to an agent_group)
CREATE TABLE user_roles (
user_id TEXT NOT NULL REFERENCES users(id),
role TEXT NOT NULL, -- 'owner' | 'admin'
agent_group_id TEXT REFERENCES agent_groups(id), -- NULL for global
granted_by TEXT,
granted_at TEXT NOT NULL,
PRIMARY KEY (user_id, role, agent_group_id)
);
-- owner rows must have agent_group_id = NULL (enforced in db/user-roles.ts)
-- Membership (explicit non-privileged access; admin/owner imply membership)
CREATE TABLE agent_group_members (
user_id TEXT NOT NULL REFERENCES users(id),
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
added_by TEXT,
added_at TEXT NOT NULL,
PRIMARY KEY (user_id, agent_group_id)
);
-- DM resolution cache (so cold DMs aren't re-resolved every time)
CREATE TABLE user_dms (
user_id TEXT NOT NULL REFERENCES users(id),
channel_type TEXT NOT NULL,
messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
resolved_at TEXT NOT NULL,
PRIMARY KEY (user_id, channel_type)
);
-- Which agent groups handle which messaging groups, with what rules
CREATE TABLE messaging_group_agents (
id TEXT PRIMARY KEY,
messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
trigger_rules TEXT, -- JSON: { pattern, mentionOnly, excludeSenders, includeSenders }
response_scope TEXT DEFAULT 'all', -- 'all' | 'triggered' | 'allowlisted'
session_mode TEXT DEFAULT 'shared', -- 'shared' | 'per-thread'
priority INTEGER DEFAULT 0, -- higher = checked first when multiple agents match
created_at TEXT NOT NULL,
UNIQUE(messaging_group_id, agent_group_id)
);
-- Sessions: one folder = one session = one container when running
-- Folder path is derived: sessions/{agent_group_id}/{session_id}/
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
messaging_group_id TEXT REFERENCES messaging_groups(id), -- null for internal/spawned sessions
thread_id TEXT, -- platform thread ID (null for shared session mode)
agent_provider TEXT, -- override per session (null = inherit from agent_group)
status TEXT DEFAULT 'active', -- 'active' | 'closed'
container_status TEXT DEFAULT 'stopped', -- 'running' | 'idle' | 'stopped'
last_active TEXT, -- last message activity timestamp
created_at TEXT NOT NULL
);
CREATE INDEX idx_sessions_agent_group ON sessions(agent_group_id);
CREATE INDEX idx_sessions_lookup ON sessions(messaging_group_id, thread_id);
-- Pending interactive questions (cards waiting for user response)
-- Host writes when delivering a question card, deletes when response received
CREATE TABLE pending_questions (
question_id TEXT PRIMARY KEY,
session_id TEXT NOT NULL REFERENCES sessions(id),
message_out_id TEXT NOT NULL, -- the messages_out row that sent the card
platform_id TEXT, -- where the card was delivered
channel_type TEXT,
thread_id TEXT,
created_at TEXT NOT NULL
);When the host delivers a messages_out row with operation: 'ask_question':
- Host delivers the card via the channel adapter
- Host writes a
pending_questionsrow mappingquestion_id→session_id
When a Chat SDK ActionEvent (button click) arrives:
- Bridge extracts
actionIdfrom the event - Host looks up
pending_questionsbyquestion_id(derived from actionId — the bridge maintains the mapping) - Host finds the target session, writes a messages_in row with
questionId+selectedOption - Host deletes the
pending_questionsrow - Agent-runner picks up the messages_in row, matches to the pending tool call, returns the selection
This avoids scanning session DBs. The central DB is the routing lookup — same pattern as message routing.
Also used for host-generated approval cards: when the host sends an approval request to the admin's DM, it writes a pending_questions row. The admin's response is routed back to the originating session.
stopped → running → idle → stopped
↗
idle → running (new message while warm)
- stopped: No container. Swept at 60s for due scheduled messages.
- running: Actively processing. Polled at 1s for messages_out.
- idle: Done processing, container still warm (up to 30 min timeout). Polled at 1s so new messages are picked up quickly.
- After idle timeout → host kills container → stopped.
The agent-runner is the process inside the container. It mediates between the session DB and the Claude SDK — polling for work, formatting messages for the agent, translating tool calls into DB rows, and managing the agent lifecycle.
All IO goes through the session DB. No stdin, no stdout markers, no IPC files.
- Initial input and follow-ups: poll
messages_in - Output: write
messages_outrows - MCP tools: write DB rows (no IPC files)
- Shutdown: host kills the container on idle timeout, or the agent-runner exits when there's no pending work
- Query
messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now()) - If rows found: set
status = 'processing',status_changed = now()on each - Batch messages into a single prompt (strip routing fields, format by kind)
- Push into Claude SDK's MessageStream
- Process agent output → write
messages_outrows - Set processed messages to
status = 'completed' - Back to step 1. If no messages found, sleep briefly and re-poll (container stays warm for idle timeout)
Agent-runner strips routing fields (platform_id, channel_type, thread_id) before formatting. The agent never sees routing info — it only sees content.
chat— format into<messages>XML blockchat-sdk— extract text, author, attachments from serialized message; format into<messages>XMLtask— format as[SCHEDULED TASK]prefix + prompt. Run pre-script if present.webhook— format as[WEBHOOK: source/event]+ JSON payloadsystem— host action results (e.g., "register_group succeeded"). Format as system context, not chat.
Mixed batches (e.g., a chat message + a system result both pending) are combined into one prompt with clear delimiters.
MCP tools write directly to the session DB.
Core tools:
| Tool | What it does |
|---|---|
send_message |
Write messages_out row, kind: 'chat' |
send_file |
Move file to outbox/{msg_id}/, write messages_out with filenames |
schedule_task |
Write messages_in row (to self) with process_after + recurrence. Or messages_out with deliver_after for outbound reminders. |
list_tasks |
Query messages_in WHERE recurrence IS NOT NULL |
pause_task / resume_task / cancel_task |
Modify messages_in rows (update status, clear/set recurrence) |
register_agent_group |
Write messages_out, kind: 'system', action: 'register_agent_group' |
New tools:
| Tool | What it does |
|---|---|
ask_user_question |
Write messages_out with question card. Hold tool call open, poll messages_in for response matching questionId. Return selection as tool result. |
edit_message |
Write messages_out with operation: 'edit' |
add_reaction |
Write messages_out with operation: 'reaction' |
send_to_agent |
Write messages_out with channel_type: 'agent', platform_id: '{target}' |
send_card |
Write messages_out with card structure |
See agent-runner-details.md for full MCP tool parameter definitions.
Agent-initiated (outbound): Tool-based. Agent calls ask_user_question (interactive card with options) or send_card (structured card). Agent-runner writes the card structure to messages_out. Host/adapter handles platform-specific rendering (Slack Block Kit, Discord embeds, Telegram inline keyboard, text fallback).
Host-initiated (approval cards): When an action requires approval, the host generates a standardized approval card and sends it to the admin's DM. These are not agent-initiated — the agent doesn't know about the approval step. The card format is fixed (action description + approve/deny buttons).
Inbound (card responses): Not a card — it's a messages_in row with questionId + selectedOption in the content. Agent-runner matches to the pending ask_user_question tool call and returns the selection as the tool result.
Messages starting with / are checked against three lists:
Whitelisted commands (pass-through to agent):
- Standard slash commands that the agent provider handles natively (e.g., Claude's built-in commands)
- Passed raw, no
<messages>XML wrapping
Admin-only commands (require admin sender):
/remote-control— remote control session/clear— clear session context/compact— force context compaction- If sent by a non-admin user, the command is rejected with an error message. Not forwarded to the agent.
Filtered commands (dropped entirely):
- Commands that don't make sense in the NanoClaw context or could cause issues
- Silently dropped — no error, no forwarding
The command lists are hardcoded in the agent-runner. Admin verification happens host-side before the message ever reaches the container: src/command-gate.ts queries user_roles (owner / global admin / scoped-admin-of-this-agent-group) and either passes the message through, drops it, or routes it elsewhere. The container has no notion of admin identity — no env var, no DB query, no per-message check.
The agent-runner processes recurring task messages like any other messages_in row. After the agent-runner marks a recurring message as completed, the host handles inserting the next occurrence (new messages_in row with process_after advanced to next cron time). The agent-runner doesn't manage recurrence — it just processes what it finds.
Pre-scripts: if a task message has a script field, run it first. If wakeAgent = false, mark completed without invoking Claude.
Outbound: Agent calls send_to_agent tool → agent-runner writes messages_out with channel_type: 'agent', platform_id = target agent group ID. Host validates permissions and writes to target session's messages_in.
Inbound: Messages from other agents arrive as normal chat messages_in rows. The content includes sender and senderId (e.g., "senderId": "agent:pr-admin"). No special formatting — the agent sees it as a chat message.
- AgentProvider interface wraps SDK-specific query logic (trunk ships the
claudeprovider; additional providers like OpenCode install via/add-<provider>skills) - Session resume via provider-specific mechanisms
- System prompt loading from CLAUDE.md files
- PreCompact hook for transcript archiving (Claude provider)
- Script execution for task-kind messages
- Approval routing — how does the host find the admin's DM conversation? What if no DM channel exists? Is the approval list configurable per agent group or global?
- MCP server lifecycle — does the MCP server process persist across multiple queries in the same container, or restart each time?
- Container startup config — what config (if any) is passed to the container at launch beyond env vars? The session DB is at a fixed mount path. System prompt comes from CLAUDE.md. Provider name comes from env. What else?
- Idle detection with pending questions — when
ask_user_questionis waiting for a response, the container should not be considered idle. Also need to detect when the agent is still working (active tool calls, subagents) and avoid killing the container even if no messages_out have been written recently.
- api-details.md — Channel adapter interface (NanoClaw + Chat SDK bridge), message content examples, host delivery logic
- agent-runner-details.md — AgentProvider interface, MCP tools, message formatting, media handling, provider implementations