chore(cluster): teardown P2P federation (Phase 4 — DO NOT MERGE)#307

Closed
floodsung wants to merge 36 commits into main from chore/p2p-teardown

Conversation

@floodsung

Contributor

⚠️ DO NOT MERGE — REVIEW ONLY

This PR is intentionally kept in draft so Flood can review the full diff in the morning before any merge. It removes a substantial amount of code (~4,000 LOC) and the user wants to eyeball it before it lands.

Do not mark this PR ready-for-review or merge it without explicit go-ahead.


Summary

Phase 4 of the central-architecture pivot. The central memory + skill hub server (Phase 1) has already shipped as PR #304. This PR removes the entire P2P federation surface so the central server becomes the single cross-instance sync path going forward. Local single-instance MetaBot behavior is unchanged.

What goes away

Deleted modules

  • src/cluster/mdns.ts — LAN auto-discovery (_metabot._tcp.local)
  • src/cluster/peer-token.ts — stable reader-token loader (~/.metabot/peer-token)
  • src/api/peer-manager.ts — PeerManager class + 30s polling + handshake + skill/memory caches
  • src/api/routes/peer-memory-routes.ts — /api/peer-memory/*
  • src/api/routes/search-routes.ts — federated fan-out (/api/search/federated)

Deleted tests

  • tests/peer-manager.test.ts, tests/peer-token.test.ts, tests/mdns.test.ts
  • tests/federated-search.test.ts, tests/skill-hub-visibility.test.ts, tests/config-cluster.test.ts

Surgical edits (peer code stripped, files retained)

  • src/index.ts — drop PeerManager / mdns / peer-token bootstrap + shutdown
  • src/api/http-server.ts — drop peer-memory + search route mounts, drop /api/peer-handshake auth exemption, drop peerManager from ApiServerOptions / RouteContext / /api/health payload
  • src/api/routes/{bot,task,team,manifest,skill-hub}-routes.ts — strip peer routing, aggregation, install branches
  • src/api/routes/types.ts / index.ts — drop peerManager field + peer-route exports
  • src/web/ws-server.ts — drop peerManager param + peer-bot fallback in handleChat
  • src/memory/memory-server.ts — drop peerTokenLookup reader path (Pragmatic v1 peer-token gate)
  • src/config.ts — drop peers config (env vars + bots.json field)
  • bots.example.json — drop peers array
  • bin/mm — mm search becomes local-only; peer-search / peer-get cases removed; help text trimmed
  • tests/memory-{proxy-auth,default-private,server}.test.ts — peer-token cases removed, local/admin cases preserved

Config + docs

  • .env.example — dropped MDNS_*, METABOT_PEERS, METABOT_PEER_*, METABOT_DISCOVERY_MODE, METABOT_CLUSTER_ID, METABOT_CLUSTER_SECRET, METABOT_DYNAMIC_PEER_DEMOTE_MS, METABOT_PEER_TOKEN_PATH. Left METABOT_CLUSTER_URL reserved for central mode.
  • README.md / README_EN.md / docs/internal/architecture.md / CLAUDE.md — replaced federation/mDNS sections with single-sentence pointers at docs/internal/central-architecture.md

What stayed

  • src/cluster/identity.ts — still used by config.ts for instance identity. clusterId / clusterUrl / discoveryMode fields remain in the InstanceIdentity struct but are no longer consumed for peer logic. A follow-up can remove them; not load-bearing right now.
  • src/api/routes/memory-proxy.ts — Phase 2 (metabot central client mode) will refactor this.
  • skill-hub-routes.ts visibility filter (visibilityFilterForRequest) and X-MetaBot-Origin handling — kept as defense-in-depth though no caller currently sends that header.

Verification

./node_modules/.bin/tsc --noEmit       # → 0 errors
./node_modules/.bin/eslint src/ tests/ # → 0 errors, 2 pre-existing warnings
./node_modules/.bin/vitest run         # → 329/330 passing

The one failing test is a pre-existing port-collision flake in central/tests/helpers.ts (introduced by PR #304). Different test files race on port 18200 because the let nextPort = 18200 counter is module-scoped — when vitest runs the central/tests/* files in parallel workers, each worker gets its own copy of the counter and they collide. The audit test passes in isolation. Not caused by this PR. Worth fixing in a separate small PR.
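
One possible shape for that follow-up fix, sketched under assumptions: rather than a module-scoped counter that every worker initializes to 18200, each test could ask the OS for a free port by binding to port 0. `getFreePort` is a hypothetical helper name and does not exist in central/tests/helpers.ts today.

```typescript
import { createServer } from "node:net";
import type { AddressInfo } from "node:net";

// Hypothetical helper: bind a throwaway server to port 0 so the OS picks a
// free port, read the port back, then release it for the test to use.
export function getFreePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const probe = createServer();
    probe.once("error", reject);
    probe.listen(0, "127.0.0.1", () => {
      const { port } = probe.address() as AddressInfo;
      // Close the probe before handing the port to the caller.
      probe.close((err) => (err ? reject(err) : resolve(port)));
    });
  });
}
```

There is still a small window between closing the probe and the test server binding the port. Passing port 0 directly to the server under test and reading back its address would avoid that window, if the helpers allow it.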

Test plan

  • Confirm the deletion list matches the central-architecture spec
  • Spot-check src/index.ts for any leftover peer references in the boot/shutdown path
  • Spot-check src/api/http-server.ts for any leftover peerManager plumbing
  • mm search still works against local memory after deploy
  • mb skills list still lists local skills
  • Discuss whether src/cluster/identity.ts peer-related fields can come out in a follow-up

🤖 Generated with Claude Code

floodsung and others added 30 commits May 15, 2026 01:46
* feat(installer): allow custom install directory via --dir / -Dir flag

install.sh and install.ps1 previously hardcoded the install path to
$HOME/metabot. Add a CLI flag (and matching PowerShell parameter) plus
an interactive prompt so users can install MetaBot anywhere.

- Priority: --dir / -Dir > METABOT_HOME env var > prompt > default.
- Tilde expansion + absolute-path validation; refuses to clobber
  $HOME / system roots.
- Persists METABOT_HOME to ~/.bashrc / ~/.zshrc (Linux/macOS) or
  user-level env (Windows) when non-default, so the mm/mb/metabot
  CLIs can locate the install in new shells.
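
The precedence and validation rules above can be sketched as follows. This is an illustration in TypeScript, not the installer's code (the real logic lives in install.sh and install.ps1); `resolveInstallDir`, `InstallDirInputs`, and the `SYSTEM_ROOTS` list are all hypothetical, and only Unix-style paths are handled.

```typescript
// Hypothetical sketch of the documented precedence; not the installer's code.
interface InstallDirInputs {
  flagDir?: string;      // value of --dir / -Dir
  envHome?: string;      // METABOT_HOME
  promptAnswer?: string; // interactive answer, if the user typed one
  home: string;          // the user's $HOME, without a trailing slash
}

// Illustrative deny-list, not the installer's actual one.
const SYSTEM_ROOTS = ["/", "/usr", "/etc", "/bin", "/var"];

export function resolveInstallDir(inputs: InstallDirInputs): string {
  const { flagDir, envHome, promptAnswer, home } = inputs;
  // First non-empty source wins: flag > env var > prompt > default.
  const chosen =
    [flagDir, envHome, promptAnswer].find((v) => v !== undefined && v.trim() !== "") ??
    `${home}/metabot`;
  // Tilde expansion for "~" and "~/..." only.
  const expanded =
    chosen === "~" ? home : chosen.startsWith("~/") ? home + chosen.slice(1) : chosen;
  if (!expanded.startsWith("/")) {
    throw new Error(`install directory must be an absolute path: ${chosen}`);
  }
  // Drop trailing slashes so "/usr/" and "/usr" compare equal.
  const normalized = expanded.replace(/\/+$/, "") || "/";
  if (normalized === home || SYSTEM_ROOTS.includes(normalized)) {
    throw new Error(`refusing to install into ${normalized}`);
  }
  return normalized;
}
```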

* fix(codex): show model metadata in cards

* fix(codex): mirror skills and avoid bwrap sandbox

* fix(codex): tolerate agents deployment failures

* fix(codex): install bundled skills when user cache is empty

* docs: explain Codex skill migration

* feat(cluster): add federated identity foundation

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* feat(memory): add namespace scoped instance token

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* feat(skills): track owner metadata and hashes

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* feat(cluster): bootstrap peers from cluster url

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* docs: document federated memory and skill hub

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* fix(cli): use instance memory token

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* Cache peer Skill Hub artifacts

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* Update lark CLI via metabot update

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* Sharpen metabot skill: lead with mb talk for cross-bot messaging

The previous frontmatter description buried `mb talk` behind generic phrasing
("agent collaboration") that mixed it with bot management, skills publishing,
and voice calls. Result: the main agent rarely reached for it when the user
said things like "跟其他 bot 说话" or "delegate to bot X".

Rewrites the description to lead with the talk use case (with CN+EN trigger
phrases) and adds a Quickstart at the top of the body that shows the one-line
local + cross-peer commands, points out the async semantics, and explicitly
distinguishes from Agent Teams' `SendMessage`. Other features (bots/peers/
voice/skill-hub/scheduling/API) move below the Quickstart.

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
…#295)

The Claude Agent SDK's AskUserQuestionOutput schema (sdk-tools.d.ts)
documents the `answers` dict as "question text -> answer string". The
SDK's tool_result template looks up `answers[question.question]`; keying
by `header` produces a structurally-valid dict that the hook happily
forwards to the SDK, but the rendered tool_result text comes out as

  "User has answered your questions: . You can now continue..."

— answers list empty, the model has no idea what the user picked, the
turn is wasted. Observed end-to-end: bridge log showed the resolver
unblocking with 3 correctly-collected answers, but the SDK transcript
recorded `tool_result.content` with no interpolated values (while the
structured `toolUseResult.answers` carried them through).

Fix four call sites to key by `question.question`:

  - handleAnswer (in-turn answer collection)
  - tryHandleBetweenTurnQuestionReply (firstQ + remaining-fill)
  - autoAnswerRemainingQuestions (timeout fallback)
  - tests/between-turn-question.test.ts (the prior assertion encoded
    the same bug, so the regression slipped past unit tests)

Verified: 315/315 vitest pass, `tsc` clean, lint clean for changed
files, plus end-to-end retest in Feishu confirmed the model now sees
the user's selections instead of an empty list.
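
A minimal sketch of the keying difference, assuming simplified shapes. `Question`, `buildAnswers`, and `renderToolResult` are illustrative names; `renderToolResult` only stands in for the SDK-side lookup described above and is not the SDK's template code.

```typescript
// Simplified question shape; only the two fields named in the commit text.
interface Question {
  header: string;
  question: string;
}

// Collect answers keyed by the full question text, as the fix does.
export function buildAnswers(questions: Question[], picked: string[]): Record<string, string> {
  const answers: Record<string, string> = {};
  questions.forEach((q, idx) => {
    const choice = picked[idx];
    if (choice !== undefined) answers[q.question] = choice; // not q.header
  });
  return answers;
}

// Stand-in for the lookup by answers[question.question].
export function renderToolResult(questions: Question[], answers: Record<string, string>): string {
  const parts = questions
    .filter((q) => answers[q.question] !== undefined)
    .map((q) => `"${q.question}"="${answers[q.question]}"`);
  return `User has answered your questions: ${parts.join(", ")}.`;
}
```

With a dict keyed by `header`, every lookup misses and the rendered text has nothing between the colon and the period, which matches the empty result quoted above.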

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* feat(cluster): mDNS LAN auto-discovery for MetaBot peers

Two MetaBot instances on the same LAN now discover each other within ~30s
of startup with zero `.env` configuration. Each instance advertises
`_metabot._tcp.local` with a TXT record carrying instanceId, instanceName,
optional clusterId, and a SHA-256 fingerprint of the Ed25519 public key —
no secrets on the wire.

- `src/cluster/mdns.ts`: thin wrapper over `bonjour-service` with an
  injection seam for tests. Honors discoveryMode: `auto` (default;
  advertise+browse), `static` (browse only), `standalone`/`off` (no-op).
  Self-filter by instanceId; optional clusterFilter drops cross-cluster
  peers.
- `PeerManager`: new `source` field on peers (`static|cluster|mdns|manual`)
  plus `addDynamicPeer` / `removeDynamicPeer`. Static peers win on URL
  collision so pre-configured secrets keep working; the poll timer is
  lazily armed when the first dynamic peer arrives.
- `src/index.ts`: wires startMdns into startup (always builds PeerManager
  now so mDNS peers have somewhere to land) and graceful shutdown.
- Docs (`docs/features/peers.md` + `.zh.md`, README.md) describe the new
  flow, the manual two-machine verification, `METABOT_DISCOVERY_MODE`
  and `METABOT_MDNS_ENABLED` knobs.

Stage 1 of 3 from `docs/internal/lan-shared-status-audit.md` §5.
ACL changes, token auto-issuance, and federated `mm search` stay in
Stages 2 and 3.
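
For illustration, the discovery modes, TXT payload, and peer filtering described above might look roughly like this. The TXT key names, the hex encoding of the fingerprint, and all function names are assumptions; the sketch deliberately does not call `bonjour-service`.

```typescript
import { createHash } from "node:crypto";

type DiscoveryMode = "auto" | "static" | "standalone" | "off";

// auto: advertise + browse; static: browse only; standalone/off: no-op.
export function discoveryPlan(mode: DiscoveryMode): { advertise: boolean; browse: boolean } {
  switch (mode) {
    case "auto":
      return { advertise: true, browse: true };
    case "static":
      return { advertise: false, browse: true };
    default:
      return { advertise: false, browse: false };
  }
}

interface Identity {
  instanceId: string;
  instanceName: string;
  clusterId?: string;
  publicKey: Uint8Array; // raw Ed25519 public key bytes
}

// Only a SHA-256 fingerprint of the public key goes on the wire, no secrets.
export function buildTxtRecord(id: Identity): Record<string, string> {
  const txt: Record<string, string> = {
    instanceId: id.instanceId,
    instanceName: id.instanceName,
    fingerprint: createHash("sha256").update(id.publicKey).digest("hex"), // hex is an assumption
  };
  if (id.clusterId !== undefined) txt.clusterId = id.clusterId;
  return txt;
}

// Self-filter by instanceId; an optional clusterFilter drops cross-cluster peers.
export function acceptPeer(selfId: string, txt: Record<string, string>, clusterFilter?: string): boolean {
  if (txt.instanceId === selfId) return false;
  if (clusterFilter !== undefined && txt.clusterId !== clusterFilter) return false;
  return true;
}
```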

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* feat(peers): auto reader-token handshake (Pragmatic v1)

Two fresh-install MetaBots on the same LAN can now reach each other's
shared memory and skills without operators exchanging secrets. Each
instance generates a stable reader token at ~/.metabot/peer-token on
first start; once peers discover each other (mDNS or manual add), they
exchange tokens through POST /api/peer-handshake and cache the reply
as the secret for outbound calls.

Pragmatic v1 scope — the inbound peer token is resolved by memory-server
to a reader principal scoped to the peer's instanceId. Read ACL is still
gated by folder visibility (default 'shared'); a principal+grants read
gate is deferred to Phase 7 of the federated memory plan. Three deferred
triggers documented: namespace-level revoke, cross-VLAN trust, audit
needing per-grant attribution.

Also in this PR:
- Dynamic peer demote: mDNS/cluster/manual peers continuously unhealthy
  for 5 min (METABOT_DYNAMIC_PEER_DEMOTE_MS to tune) are dropped from
  the in-memory registry. Static peers are untouched.
- install.sh writes MEMORY_INSTANCE_TOKEN on fresh install and
  idempotently backfills it on existing .env files.
- Cluster ID onboarding tip and dynamic-demote knob documented in
  docs/features/peers.{md,zh.md}.
- canReadFolder / hasNamespaceGrant logic unchanged; a verbatim
  Pragmatic v1 marker comment was added to the read branch.
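
The dynamic-demote rule above can be sketched as a pure function. `PeerState`, `unhealthySince`, and `demoteStalePeers` are hypothetical names; only the source labels and the 5-minute default come from the commit text.

```typescript
type PeerSource = "static" | "cluster" | "mdns" | "manual";

interface PeerState {
  name: string;
  source: PeerSource;
  unhealthySince?: number; // epoch ms of the first failed poll in the current streak
}

const DEFAULT_DEMOTE_MS = 5 * 60 * 1000; // 5 minutes

// Returns the peers that stay in the in-memory registry.
export function demoteStalePeers(
  peers: PeerState[],
  now: number,
  demoteMs: number = DEFAULT_DEMOTE_MS,
): PeerState[] {
  return peers.filter((p) => {
    if (p.source === "static") return true;          // static peers are untouched
    if (p.unhealthySince === undefined) return true; // currently healthy
    return now - p.unhealthySince < demoteMs;        // drop once continuously unhealthy too long
  });
}
```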

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…298)

* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* fix(memory-proxy): preserve inbound Authorization header for peer-token reads

The bridge's /memory/* reverse proxy was unconditionally rewriting the
caller's Authorization header to the local admin token before forwarding
to memory-server. This silently bypassed the Pragmatic v1 folder-
visibility gate for cross-instance traffic: when peer-A hit peer-B's
/memory/api/* with B's reader token (from the Stage 2 handshake), B's
bridge replaced it with B's own admin token, and memory-server returned
the unfiltered admin view — peers effectively had admin reads on the
other instance.

Fix preserves the inbound Authorization verbatim when present and only
falls back to memoryAuthToken when no header is supplied (web UI use
case). Proxy logic moved into src/api/routes/memory-proxy.ts so the
behaviour can be regression-tested directly.

New tests/memory-proxy-auth.test.ts spins up a memory-server and a
proxy server, then proves end-to-end that:
  1. A peer reader token round-trips through the proxy and folder
     visibility filters out a private folder + its docs from both the
     folder tree and search results.
  2. An unknown token returns 401 even with admin fallback configured.
  3. No-Authorization requests still receive the admin fallback (web
     UI compatibility).

Affects PR #297 (Stage 2 auto reader-token handshake) — Pragmatic v1
was demonstrated only by in-process tests there; this PR makes the
cross-instance proxy path uphold the same contract.
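
The header rule this fix introduces could be expressed as a small pure function. This is a sketch under assumptions: `upstreamAuthorization` is a hypothetical name, and the `Bearer` scheme for the fallback is assumed rather than taken from the source.

```typescript
// Decide which Authorization value the proxy forwards to memory-server.
export function upstreamAuthorization(
  inbound: string | undefined,
  memoryAuthToken: string | undefined,
): string | undefined {
  // Preserve the caller's header verbatim so peer reader tokens stay scoped.
  if (inbound !== undefined && inbound.trim() !== "") return inbound;
  // Only header-less requests (the web UI case) get the local admin fallback.
  return memoryAuthToken ? `Bearer ${memoryAuthToken}` : undefined; // scheme is an assumption
}
```

Because an unknown inbound token is forwarded unchanged, memory-server still gets to reject it with 401 instead of seeing the admin token.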

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* feat(search): federated mm search across local + live peers + cache-stale

Stage 3 of the LAN-shared-access initiative. mm search now fans out
server-side via a new GET /api/search/federated endpoint on the bridge
and returns a single merged payload, with each hit tagged
source=local|peer|cache-stale.

- src/api/routes/search-routes.ts: orchestrates local memory-server +
  live peers (only those we hold a reader token for, via Stage 2's
  handshake or operator-configured secret) + cache-stale entries.
- src/api/peer-manager.ts: adds getLivePeersWithSecret() so the fan-out
  knows which peers to hit and with what token.
- Dedup by peerName: if a peer responds live (even with zero hits), its
  stale cache entries are suppressed so the operator doesn't see dupes.
  Filter is by peerName, not URL — peers can rotate URLs.
- bin/mm search: now hits the bridge's federated endpoint; when
  METABOT_URL is unreachable, falls back to META_MEMORY_URL/api/search
  (local-only) and prints a one-line stderr warning.
- bin/mm peer-search: kept as a cache-only inspector for debugging.
- Pragmatic v1 ACL untouched — folder-visibility default + token gate.
  Read side reuses /memory proxy whose Authorization passthrough was
  fixed in PR #298.
- Audit doc: flip Top-3 #3 status to DONE.

Tests: 4 new tests in tests/federated-search.test.ts cover the merge,
the dedup-by-peerName logic, the standalone-deployment fallthrough, and
the per-peer error reporting. Suite stays green at 352/352.
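
A rough sketch of the merge and dedup-by-peerName step, assuming simplified result shapes. All type and function names here are illustrative; only the three `source` tags come from the commit text.

```typescript
type HitSource = "local" | "peer" | "cache-stale";

interface Hit {
  docId: string;
  source: HitSource;
  peerName?: string;
}

interface LivePeerResult {
  peerName: string;
  docIds: string[]; // may be empty: the peer answered but had no hits
}

interface CachedHit {
  peerName: string;
  docId: string;
}

export function mergeFederated(
  localDocIds: string[],
  live: LivePeerResult[],
  cached: CachedHit[],
): Hit[] {
  const merged: Hit[] = [];
  for (const docId of localDocIds) merged.push({ docId, source: "local" });

  // Any peer that responded live suppresses its stale cache entries,
  // even when the live response carried zero hits.
  const respondedLive = new Set<string>();
  for (const result of live) {
    respondedLive.add(result.peerName);
    for (const docId of result.docIds) {
      merged.push({ docId, source: "peer", peerName: result.peerName });
    }
  }

  for (const entry of cached) {
    if (respondedLive.has(entry.peerName)) continue; // filter by name, not URL
    merged.push({ docId: entry.docId, source: "cache-stale", peerName: entry.peerName });
  }
  return merged;
}
```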

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
…nce peers (#300)

* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* fix(skill-hub): respect visibility on list/search/get for cross-instance peers

Skills already carry a `visibility: private | published | shared` field
on publish, but the read paths (`store.list`, `store.search`, and the
`GET /api/skills/:name` route) never filtered on it. A cross-instance
peer authenticated with the shared API_SECRET could enumerate and fetch
skills marked `private` by their owner — a real data leak across the
new federated + central-archive deploy shape.

Changes:
- `SkillHubStore.list/search` now accept an optional `visibility[]`
  allow-list and add a `WHERE visibility IN (...)` clause when set.
- `skill-hub-routes` derives the allow-list from `X-MetaBot-Origin: peer`:
  local admin (no header) sees everything; peers see only `published`
  and `shared`. The single-skill GET applies the same gate so guessing
  the name does not bypass the list filter.
- `POST /api/skills/:name/install` is unchanged — peers install onto
  *their* bots via fetchPeerSkill, which hits the now-gated GET above,
  so published/shared installs continue to work while private skills
  are 404.
- New regression tests at `tests/skill-hub-visibility.test.ts` cover
  store-level filtering, HTTP list/search, and the install-path 404.
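
As a sketch of the gate described above: the allow-list is derived from the `X-MetaBot-Origin` header and applied to list, search, and single-skill reads alike. `allowListForHeaders`, `filterSkills`, and `getSkill` are hypothetical names with assumed signatures, and the in-memory filter stands in for the store's `WHERE visibility IN (...)` clause.

```typescript
type Visibility = "private" | "published" | "shared";

interface Skill {
  name: string;
  visibility: Visibility;
}

// Header names are assumed to be lowercased, as Node's http module delivers them.
export function allowListForHeaders(
  headers: Record<string, string | undefined>,
): Visibility[] | undefined {
  // Local admin (no header) sees everything; peers see published + shared.
  return headers["x-metabot-origin"] === "peer" ? ["published", "shared"] : undefined;
}

export function filterSkills(skills: Skill[], allow?: Visibility[]): Skill[] {
  return allow ? skills.filter((s) => allow.includes(s.visibility)) : skills;
}

// Single-skill GET: same gate, so guessing a name does not bypass the list filter.
export function getSkill(skills: Skill[], name: string, allow?: Visibility[]): Skill | undefined {
  return filterSkills(skills, allow).find((s) => s.name === name); // undefined maps to a 404
}
```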

* fix(skill-hub): drop stray audit-log imports

A concurrent worktree's audit-log scaffolding leaked into the previous
commit on this branch (`AuditOp` import + `deriveSkillOp` / audit hook).
That work belongs on `feat/audit-log-jsonl`, not this PR. CI flagged it
as a missing module on tsc. Removing here so the visibility filter
ships standalone.

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
…302)

* feat(memory): default folder visibility = private + mm share opt-in

Phase 0 of the central-architecture pivot — tighten P2P defaults so a
new folder is not immediately visible to every peer in the LAN federation.

Behavior change:
- `POST /api/folders` now defaults `visibility` to `'private'` instead of
  `'shared'`. Cross-instance peer-token readers cannot see private
  folders (or their documents) until an admin flips them via
  `PUT /api/folders/:id { visibility: 'shared' }` or `mm share <path|id>`.
- `mm share` / `mm unshare` CLI commands added (accept UUID or path).
- Local admin and instance-token (own-namespace) access unaffected.
- Existing folders keep their current `visibility` — migration is
  metadata-only; nothing in the DB is rewritten.
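
A minimal sketch of the new default and the opt-in flip, using hypothetical in-memory helpers (`createFolder`, `share`, `peerCanRead`) rather than the real routes or CLI.

```typescript
type FolderVisibility = "private" | "shared";

interface Folder {
  id: string;
  path: string;
  visibility: FolderVisibility;
}

// New folders default to private unless the caller opts in explicitly.
export function createFolder(input: { id: string; path: string; visibility?: FolderVisibility }): Folder {
  return { id: input.id, path: input.path, visibility: input.visibility ?? "private" };
}

// Roughly what `mm share` achieves: flip an existing folder to shared.
export function share(folder: Folder): Folder {
  return { ...folder, visibility: "shared" };
}

// Peer-token readers only ever see shared folders.
export function peerCanRead(folder: Folder): boolean {
  return folder.visibility === "shared";
}
```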

Why now: insider-exfil threat model. Central architecture (Phase 1+)
removes peer-token visibility entirely, but until that ships every new
folder defaulting to `shared` is too loud — one slip and a private
project is readable across the whole LAN federation.

Tests:
- tests/memory-default-private.test.ts — 7 new cases covering default,
  explicit shared, mm share/unshare, peer-token visibility, admin override.
- tests/memory-server.test.ts — updated to mark shared-test folders
  explicitly.
- tests/federated-search.test.ts — opts into `shared` for the
  cross-instance fan-out scenario it exercises.

Docs:
- README.md: Phase 0 default-private callout + mm share usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: surface Phase 0 default-private folders in CLAUDE.md and README_EN.md

Mirror the loud breaking-change callout already in README.md so English
docs + working-mode guidance for Claude both flag the new default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Flood Sung and others added 6 commits May 17, 2026 15:45
…(Phase 0) (#303)

Append-only daily audit log capturing every memory + skill-hub request:
who (principal), what (op), where (path), from (sourceIp), result (status),
how long (latencyMs). JSON-lines file under `data/audit/YYYY-MM-DD.jsonl`,
size-capped with rotation (`<date>.1.jsonl`, `.2.jsonl`, ...).

Why now: insider-exfil threat model. Central architecture (Phase 1+) will
reuse this same `AuditLog` class server-side; landing the foundation in
Phase 0 means we already have a record-of-truth for the interim P2P
window AND the migration period.

What's logged:
- Memory: list, read, create, update, delete, search
- Skill hub: list, get, search, publish, install, delete
- Principal: 'admin' for local admin token; instanceId for federated peers;
  `peer:<name>` for skill-hub cross-instance reads
- Source IP: X-Forwarded-For first hop, falling back to socket remote

Operator UX:
- `mm audit <YYYY-MM-DD> [--filter principal=X] [--filter op=X]`
  — pretty-prints accessible entries from the JSONL file
- METABOT_AUDIT_ENABLED=false to disable (default on)
- METABOT_AUDIT_DIR overrides location

Tests:
- 5 cases in tests/audit-log.test.ts covering append, multiline read,
  date rotation, size rotation, disabled mode
- Existing memory-server tests still green (9/9)
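
To make the record format concrete, here is a sketch of one audit entry, the file naming, and the source-IP rule. The field names follow the who/what/where list above; the `ts` field, the use of UTC dates, and the helper names are assumptions.

```typescript
interface AuditEntry {
  ts: string;        // ISO timestamp (assumed field)
  principal: string; // e.g. "admin", an instanceId, or "peer:<name>"
  op: string;        // e.g. "read", "search", "publish"
  path: string;
  sourceIp: string;
  status: number;
  latencyMs: number;
}

// data/audit/YYYY-MM-DD.jsonl, then <date>.1.jsonl, <date>.2.jsonl, ... after size rotation.
export function auditFileName(date: Date, rotation: number = 0): string {
  const day = date.toISOString().slice(0, 10); // UTC day; an assumption
  return rotation === 0 ? `${day}.jsonl` : `${day}.${rotation}.jsonl`;
}

// X-Forwarded-For first hop, falling back to the socket's remote address.
export function sourceIpOf(xForwardedFor: string | undefined, socketRemote: string): string {
  const firstHop = xForwardedFor?.split(",")[0]?.trim();
  return firstHop ? firstHop : socketRemote;
}

// One JSON object per line, newline-terminated, so the file stays append-only.
export function toJsonLine(entry: AuditEntry): string {
  return JSON.stringify(entry) + "\n";
}
```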

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Mirror peer MetaMemory documents

* Add stable bot memory namespaces

* Phase 1: central server module

New top-level `central/` module — Node 20 + better-sqlite3 server that
serves as the centralized memory + skill hub MetaBot is pivoting to
(see docs/internal/central-architecture.md).

Adapted from src/memory/memory-{storage,server,routes}.ts and
src/api/skill-hub-store.ts but with a path-based ACL keyed to a new
Credential model, separate npm package, and self-contained deploy/.

What's in this module:

- src/auth/             Credential model + SQLite store + middleware
                        - sha256 token hashing, 60s lookup cache
                        - deferred lastUsedAt batch writes
                        - admin bootstrap (one-time token → data/admin-bootstrap-token.txt 0600)
- src/memory/           Folders + documents + FTS5 search
                        - path-based ACL (canRead/canWrite per Credential)
                        - create-by-path auto-creates intermediate folders
- src/skills/           Publish/list/search/delete + publish-acl
                        - member requires publishSkill: true
                        - delete is admin-only
- src/admin/            issue/revoke/list creds + audit query routes + CLI
- src/observability/    JSONL daily audit log (rotates at 100 MB)
- src/server.ts         HTTP routing + auth on every /api or /admin request
- bin/central-admin     CLI shim → dist/admin/admin-cli.js
- Dockerfile            multi-stage Node 20 slim, drops to `node` user
- docker-compose.yml    central + optional Caddy profile
- deploy/install.sh     idempotent Ubuntu 22.04 installer (Node + Caddy + systemd)
- deploy/Caddyfile      TLS terminator template
- deploy/central.service systemd unit, hardened

Tests (21 pass):
- tests/auth.test.ts    issue/revoke/lookup/cache/bootstrap
- tests/memory.test.ts  CRUD + namespace ACL (admin vs member isolation)
- tests/skills.test.ts  publish-acl + visibility filter + remove
- tests/audit.test.ts   every authed request logged JSONL
- tests/e2e.test.ts     end-to-end over HTTP: bootstrap → issue member →
                        member writes own ns / 403 on /shared and other
                        users / revoke → 401 credential_revoked

Scope notes:
- Does NOT modify anything under src/ of the main metabot (per spec)
- No deployment performed — gated on getting an ECS from trunks
- Phase 2 (metabot client mode pointing at central) is a follow-up PR
- Phase 4 (P2P teardown) is a separate follow-up PR
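
The credential lookup described under src/auth/ (sha256 token hashing plus a 60s lookup cache) might be sketched like this. `hashToken` and `LookupCache` are hypothetical names, and the hex digest encoding is an assumption; the central module's actual code may differ.

```typescript
import { createHash } from "node:crypto";

// Store and compare only the hash, never the raw token.
export function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Tiny TTL cache so repeated requests skip the SQLite lookup for up to 60s.
export class LookupCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60_000) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(key); // expired: force a fresh store lookup
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

One consequence of caching by token hash is that a revoked credential can keep resolving until its entry expires, unless revocation also evicts the entry.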

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Flood Sung <floodsung@xvirobotics.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…eview)

Phase 4 of the central-architecture pivot. Removes the entire P2P
federation surface so the central server (Phase 1, already merged as
PR #304) becomes the single cross-instance sync path. Local single-instance
MetaBot behavior is unchanged.

DELETED
  src/cluster/mdns.ts                 — LAN auto-discovery
  src/cluster/peer-token.ts           — stable reader-token loader
  src/api/peer-manager.ts             — PeerManager + handshake + caches
  src/api/routes/peer-memory-routes.ts
  src/api/routes/search-routes.ts     — federated fan-out
  tests/peer-manager.test.ts
  tests/peer-token.test.ts
  tests/mdns.test.ts
  tests/federated-search.test.ts
  tests/skill-hub-visibility.test.ts
  tests/config-cluster.test.ts

EDITED
  src/index.ts                        — removed PeerManager / mdns / peer-token bootstrap
  src/api/http-server.ts              — removed peer-memory/search route mounts, peer-handshake
                                        auth exemption, peerManager wiring, /api/peers fields
  src/api/routes/{bot,task,team,manifest,skill-hub}-routes.ts
                                        — stripped peer routing / aggregation / install branches
  src/api/routes/types.ts             — dropped peerManager from RouteContext
  src/api/routes/index.ts             — dropped peer-memory + search exports
  src/web/ws-server.ts                — removed peerManager param + peer-bot fallback
  src/memory/memory-server.ts         — removed peerTokenLookup reader path
  src/memory/memory-storage.ts        — stale-comment cleanup
  src/config.ts                       — removed peers config (env vars + bots.json field)
  bots.example.json                   — removed `peers` array
  bin/mm                              — `mm search` now local-only; peer-search/peer-get gone
  tests/memory-{proxy-auth,default-private,server}.test.ts
                                        — peer-token cases removed, local/admin cases kept
  .env.example                        — dropped MDNS_/METABOT_PEER_/METABOT_DISCOVERY_/
                                        METABOT_CLUSTER_{ID,SECRET}/METABOT_PEER_TOKEN_PATH/
                                        METABOT_DYNAMIC_PEER_DEMOTE_MS rows. Left
                                        METABOT_CLUSTER_URL reserved for central mode.
  README.md / README_EN.md / docs/internal/architecture.md / CLAUDE.md
                                        — pointed federation/mDNS sections at
                                        docs/internal/central-architecture.md instead.

NOT TOUCHED
  src/cluster/identity.ts             — still used by config.ts for instance identity
                                        (clusterId/discoveryMode fields remain in the
                                        struct, unused for peer logic; future cleanup)
  src/api/routes/memory-proxy.ts      — Phase 2 (central client mode) will refactor this

VERIFICATION
  ./node_modules/.bin/tsc --noEmit    → clean (0 errors)
  ./node_modules/.bin/eslint src/ tests/ → 0 errors, 2 unrelated warnings (pre-existing)
  ./node_modules/.bin/vitest run      → 329/330 passing; single failure is a
                                        pre-existing port-collision flake in
                                        central/tests/helpers.ts (port 18200 race: each
                                        vitest worker process gets its own copy of the
                                        module-scoped counter; introduced by PR #304,
                                        not by this PR; passes in isolation).
@floodsung

Contributor Author

Closed: superseded by the history rewrite of main. On 2026-05-18 we reset main back to b988be1 (pre-P2P) and cherry-picked the 3 non-P2P keepers (#293/#294/#295). P2P federation code no longer exists on main, so a teardown PR is moot. The pre-revert state is archived at tag archive/p2p-era-end-20260518-main.


2 participants