This repository was archived by the owner on May 26, 2026. It is now read-only.

Feature: Runtime provisioning of agents (add/update/remove without restart) #694

@ezra-letta

Problem

Multi-agent LettaBot deployments load the agents: array from config once at startup. There is no runtime mechanism to add, update, or remove an agent while the server is running: any change to the agent roster requires editing the YAML config and restarting the process.

For operators provisioning agents programmatically (per-tenant, per-user, dynamic workloads), this forces a full process restart per provisioning event. Restarts are not destructive (state persists via lettabot-agent.json, credentials, and the Letta server), but they interrupt in-flight work, break open WebSocket/Socket-Mode connections to channel providers, and make programmatic provisioning awkward.

Current behavior (source references)

  • src/main.ts iterates agents: once and calls gateway.addAgent(name, session) per entry. After startup the gateway is effectively frozen.
  • LettaGateway.addAgent() (src/core/gateway.ts) is only invoked during boot. There is no corresponding removeAgent() called from the runtime, no updateAgent(), and no public entry point that would rehydrate from config.
  • src/api/server.ts exposes /api/v1/messages, /api/v1/chat, /api/v1/chat/async, /api/v1/conversation, /api/v1/conversations, /api/v1/pairing/:channel, /api/v1/status, /v1/models, and /v1/chat/completions. Nothing for agent lifecycle.
  • No SIGHUP handler, no yaml file watcher for the agents array, no lettabot reload CLI command.
  • The only reload-like mechanism in the codebase is per-adapter: reloadConfigAt in src/channels/bluesky/adapter.ts refreshes a Bluesky agent's feed config via a runtime-state JSON file. It is scoped to a specific adapter's feed settings, not agent lifecycle, and does not generalize to add/remove.

Proposed approach

Primary: a REST API surface for agent lifecycle, consistent with the existing /api/v1/* routes and protected by the same auth.

POST   /api/v1/agents              # add a new agent from inline config
GET    /api/v1/agents              # list currently loaded agents
GET    /api/v1/agents/:name        # describe one agent
PATCH  /api/v1/agents/:name        # update channels/features/conversations
DELETE /api/v1/agents/:name        # deregister and tear down channel adapters
POST   /api/v1/agents/reload       # re-read config file and diff against live state

Request body shape should mirror the existing agents: array element so config can be shuttled between yaml and API without transformation.
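As a rough illustration of the route table above, the sketch below maps a method and path to a lifecycle action. Everything in it (the `AgentRoute` type, `resolveAgentRoute`, the action names) is hypothetical and not taken from src/api/server.ts. It also assumes `reload` is treated as a reserved word rather than a valid agent name, which the proposal does not state.

```typescript
// Hypothetical sketch: none of these names exist in LettaBot today.
type AgentRoute =
  | { action: "create" }
  | { action: "list" }
  | { action: "reload" }
  | { action: "describe" | "update" | "remove"; name: string };

const AGENTS_PREFIX = "/api/v1/agents";

function resolveAgentRoute(method: string, path: string): AgentRoute | null {
  if (path !== AGENTS_PREFIX && !path.startsWith(AGENTS_PREFIX + "/")) return null;
  // Strip the prefix plus one leading and one trailing slash, if present.
  const rest = path.slice(AGENTS_PREFIX.length).replace(/^\/|\/$/g, "");

  if (rest === "") {
    if (method === "POST") return { action: "create" };
    if (method === "GET") return { action: "list" };
    return null;
  }
  if (rest.includes("/")) return null; // no nested routes in the proposal

  // Assumption: "reload" is reserved, so it is matched before the :name routes.
  if (rest === "reload") return method === "POST" ? { action: "reload" } : null;

  let name: string;
  try {
    name = decodeURIComponent(rest);
  } catch (_err) {
    return null; // malformed percent-encoding
  }
  switch (method) {
    case "GET":
      return { action: "describe", name };
    case "PATCH":
      return { action: "update", name };
    case "DELETE":
      return { action: "remove", name };
    default:
      return null;
  }
}
```

One consequence of this layout is that an agent literally named `reload` could not be described or deleted through `/api/v1/agents/:name`. If that matters, the reload endpoint could live at a different path, or the name could be rejected at creation time.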

Lifecycle semantics that need to be correct:

  • Channel adapters (Telegram, Slack, Discord, Signal, WhatsApp, Bluesky) must be started/stopped cleanly, including graceful disconnect for stateful adapters.
  • lettabot-agent.json must be updated in place so added agents persist across restarts.
  • Pairing/credential files under ~/.lettabot/credentials/ for the new agent's channels must be created on first pairing as today.
  • Cron jobs scoped per agent (CRON_STORE_PATH) must be reinitialized when adding, and drained/cancelled when removing.
  • Heartbeat loops must start/stop with the agent.
  • The LETTABOT_CONFIG_YAML path (Docker/cloud deploys) should be reconcilable with runtime state. Two models are possible: "YAML is the frozen boot-time declaration, and the runtime API layers on top" or "YAML is the source of truth, and the reload endpoint diffs YAML against live state." Either is defensible, but the chosen behavior needs to be documented.
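The start/stop requirements above can be sketched as a small registry that owns each agent's running components. This is a minimal sketch under stated assumptions: `Stoppable`, `LiveAgent`, and `AgentRegistry` are hypothetical names, and the real adapter, cron, and heartbeat objects in LettaBot may expose different interfaces. The stop order (heartbeat, then cron, then channel adapters) is also an assumption, chosen so that sources of new work are quiesced before connections are closed.

```typescript
// Hypothetical interfaces; LettaBot's real adapter, cron, and heartbeat types may differ.
interface Stoppable {
  start(): Promise<void>;
  stop(): Promise<void>;
}

interface LiveAgent {
  name: string;
  adapters: Stoppable[]; // e.g. Telegram, Slack, Discord connections
  cron: Stoppable;
  heartbeat: Stoppable;
}

class AgentRegistry {
  private agents = new Map<string, LiveAgent>();

  async add(agent: LiveAgent): Promise<void> {
    if (this.agents.has(agent.name)) {
      throw new Error(`agent "${agent.name}" is already loaded`);
    }
    const started: Stoppable[] = [];
    try {
      for (const part of [...agent.adapters, agent.cron, agent.heartbeat]) {
        await part.start();
        started.push(part);
      }
    } catch (err) {
      // Roll back whatever did start, newest first, so a failed add leaves nothing running.
      for (const part of started.reverse()) {
        try {
          await part.stop();
        } catch (_err) {
          // best effort during rollback
        }
      }
      throw err;
    }
    this.agents.set(agent.name, agent);
  }

  async remove(name: string): Promise<boolean> {
    const agent = this.agents.get(name);
    if (!agent) return false;
    this.agents.delete(name); // stop listing the agent before tearing it down
    const errors: unknown[] = [];
    for (const part of [agent.heartbeat, agent.cron, ...agent.adapters]) {
      try {
        await part.stop();
      } catch (err) {
        errors.push(err); // keep going so one bad adapter cannot leak the rest
      }
    }
    if (errors.length > 0) {
      throw new Error(`${errors.length} component(s) failed to stop for "${name}"`);
    }
    return true;
  }

  names(): string[] {
    return [...this.agents.keys()];
  }
}
```

Persisting the roster to lettabot-agent.json and handling credential files are deliberately left out here; they would hook in after a successful `add` or `remove`.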

Secondary (lower priority, optional): a CLI equivalent (lettabot agents add|remove|list|reload) that wraps the API for ops workflows.

Use cases

  • Programmatic provisioning of agents per tenant or per user from a control plane.
  • Onboarding flows where a user request creates a new agent, attaches channels, and makes it live without operator intervention.
  • Rolling out or rolling back a single agent without affecting the rest of a multi-agent fleet.
  • CI/CD pipelines that declare the agent roster in yaml and can trigger a reload after deploy instead of a full process restart.
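The reload-after-deploy use case depends on comparing the declared roster with what is currently loaded. A minimal sketch of that comparison follows, assuming agents are keyed by a `name` field and that any other difference in an entry counts as an update. `AgentConfig`, `RosterDiff`, `canonical`, and `diffRoster` are hypothetical names, and the real agents: element schema is not reproduced here.

```typescript
// Hypothetical shape: only `name` is assumed; other fields are opaque to the diff.
interface AgentConfig {
  name: string;
  [key: string]: unknown;
}

interface RosterDiff {
  add: AgentConfig[];
  update: AgentConfig[];
  remove: string[];
}

// Canonical JSON with sorted object keys, so key order in YAML vs. API input does not matter.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

function diffRoster(desired: AgentConfig[], live: AgentConfig[]): RosterDiff {
  const liveByName = new Map<string, AgentConfig>();
  for (const agent of live) liveByName.set(agent.name, agent);

  const desiredNames = new Set<string>();
  const diff: RosterDiff = { add: [], update: [], remove: [] };

  for (const want of desired) {
    desiredNames.add(want.name);
    const have = liveByName.get(want.name);
    if (!have) diff.add.push(want);
    else if (canonical(have) !== canonical(want)) diff.update.push(want);
  }
  for (const have of live) {
    if (!desiredNames.has(have.name)) diff.remove.push(have.name);
  }
  return diff;
}
```

Under the "yaml is the source of truth" model, all three lists would be applied. Under the "runtime API layers on top" model, the `remove` list would presumably be ignored for agents that were created through the API rather than declared in the file.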

Related

Reported by

Surfaced in Discord by gerwitz (self-hosted operator provisioning agents at runtime). Current workaround is "edit yaml, restart LettaBot" under a process manager with fast restart (systemd, pm2, Docker restart policy). Restart is non-destructive but disruptive for programmatic provisioning.
