Skip to content

test(botfather): guard daemon onboarding never emits OCTO_MATTER_URL (#362)#405

Open
lml2468 wants to merge 37 commits into
mainfrom
fix/362-onboarding-no-matter-url
Open

test(botfather): guard daemon onboarding never emits OCTO_MATTER_URL (#362)#405
lml2468 wants to merge 37 commits into
mainfrom
fix/362-onboarding-no-matter-url

Conversation

@lml2468

@lml2468 lml2468 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #362.

The BotFather /daemon onboarding flow (now GET /v1/runtime-onboarding) historically instructed users to export OCTO_MATTER_URL, but the octo-daemon binary never reads that env — it reaches matter task ack/pull through the fleet/server endpoints. Setting it was a no-op that caused bot-provisioning friction.

The current code on this branch already omits OCTO_MATTER_URL from the runnable command block and env_vars, but nothing guarded the contract against reintroduction.

Changes

  • Extract the inline command/env construction into two pure functions: buildDaemonStartBlock(serverURL, fleetURL, apiKey) and daemonEnvVars(serverURL, fleetURL), plus a daemonInstallCommand const. The "why no MATTER_URL" rationale now lives on the builder doc.
  • Add hermetic regression tests (api_runtime_onboarding_matter_url_test.go) asserting the daemon-facing instructions expose exactly OCTO_SERVER_URL + OCTO_FLEET_URL and never OCTO_MATTER_URL.

Contract / compatibility

  • No behavior change. Response shape (runtimeOnboardingResp) is unchanged — matter_url is still returned as an informational field; it is simply not part of the runnable commands.start block or env_vars the daemon consumes.
  • No REST/WS contract break.

Test plan

  • go build ./modules/botfather/
  • go vet ./modules/botfather/
  • go test ./modules/botfather/ -run OmitsMatterURL -v (both new tests PASS)

Part of OCT-10 (golden-path server-side hardening).

caster-Q and others added 30 commits May 18, 2026 19:30
…time

迁移 dmworkim 老仓 feat/agent-runtime 的 runtime 模块到 octo-server,
按简化版策略只做编译必需的最小适配:

- modules/runtime/ 12 文件(api/db/model/enrich/upgrade/version_sync/1module + 4 SQL)
- import 路径:dmwork-org/{dmworkim,dmwork-lib} → Mininglamp-OSS/{octo-server,octo-lib}
- /daemon 命令安装 URL → Mininglamp-OSS/octo-daemon-cli
- internal/modules.go 注册 runtime 模块
- modules/botfather/{const,command,api}.go surgical apply CmdDaemon + handleDaemon

业务字面(channel id "dmwork"、route prefix "dmwork/<uid>"、log/注释里的 dmwork)
有意保留,等本地端到端测试阶段统一修(与 openclaw-channel-octo channel id 改名对齐)。

Build: go build ./... 通过
Tests: enrich_test 等含 dmwork 业务字面的测试会挂,留给测试阶段
Match the daemon-cli and openclaw-channel-octo changes:

- enrich.go: channel id 'dmwork' → 'octo' (route prefix matching for
  bot info JOIN in user × robot × space_member); the rename function
  collectDmworkUIDs internal name kept as-is for now
- enrich_test.go: test data updated
- api.go: plugin_has_update match on 'octo' (was hardcoded
  'openclaw-channel-dmwork')
- upgrade.go: const componentPlugin = 'octo'
- model.go + sql migration comment: update doc strings

Verified end-to-end locally:
- go test ./modules/runtime/... pass
- Runtimes UI shows Octo Plugin 1.0.7 → 1.0.8 upgrade hint
- Bot route enrich: avatar/name/online dot rendered correctly
- Plugin upgrade flow (npx -y openclaw-channel-octo@dev install) end-to-end
  completed in 86 s with task=completed, daemon-reported version 1.0.8
- adapters/claude-code-dmwork/SKILL.md: install command uses ClawHub plugin id `octo` (was `openclaw-channel-dmwork`)
- docs/2026-05-mention-all-chokepoint-audit.md: source-tree path references updated from `openclaw-channel-dmwork/src/inbound.ts` to `octo/src/inbound.ts`
- docs/changelog.md: package-name reference in V2 conflict-fix note updated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two heartbeat-pull dispatch pipelines under one schema:

managed_runtime_agent — Web "Create agent + Add bot" UX (PoC1).
Creates an openclaw agent workspace on the daemon and mints a bot
in the same flow; status: provisioning → bot_minted → dispatched →
active. Routes:
  POST /v1/runtimes/managed-agents
  POST /v1/runtimes/:runtime_id/agents/:agent_id/bots
  POST /v1/daemon/managed-agents/:id/ack
botfather.MintBotOBO extracted from /newbot handler so non-IM
callers can mint bots on behalf of a user.

bot_task — matter @mention/assignee → agent dispatch (PoC2+3).
Resolves bot_uid → openclaw agent_id via managed_runtime_agent,
queues a row, daemon claims via heartbeat. Routes:
  POST /v1/internal/bot-tasks      (X-Internal-Token, called by matter)
  POST /v1/daemon/bot-tasks/:id/ack
ack handler writes back to matter:
  - timeline (bot reply content)
  - activity (agent_task_completed/failed with elapsed_ms + bytes)
createBotTaskReq.prompt is an optional override so mention dispatch
can supply a full continuation prompt with conversation history.

heartbeat handler claims both pending_command (managed_agent) and
pending_task (bot_task) per cycle.

tsdd.yaml: enable MinIO for local dev (admin/12345678 against
testenv-minio-1) so avatar/file uploads work — production should
set proper IAM creds via env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for cross-service trust chain laid out in spec
docs/superpowers/specs/2026-06-01-octo-fleet-extraction-design.md.

octo-server is now the canonical JWT issuer; fleet/matter will fetch
the public key from /.well-known/jwks.json and verify tokens locally
without service-to-service calls.

Implementation:
- modules/auth_jwt/jwt.go: RSA-2048 keypair loaded from or generated
  into ~/.octo-server/jwt-priv.pem (override via JWT_PRIVATE_KEY_PATH).
  kid derived from last-8-bytes of modulus so it stays stable across
  restarts — JWKS consumers can cache long-term.
- modules/auth_jwt/resolve.go: token exchange has two paths — web
  session via existing cache lookup, daemon api_key via user_api_key
  table + space_member membership re-check.
- POST /v1/auth/token returns {token, expires_in, scope}; web=30min,
  daemon=30days.
- GET /.well-known/jwks.json (Cache-Control: public, max-age=300).

Verified locally: api-key exchange produces a valid RS256 token with
the expected claims (sub=uid, scope=daemon, space_id, daemon_id, exp).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-A.2 cross-service contract:

  POST /v1/bot/mint        — web session auth; wraps botfather.MintBotOBO
                             and returns ONLY {bot_uid}. bot_token stays
                             in server's robot.bot_token column.
  GET  /v1/bot/:uid/token  — daemon JWT auth (scope=daemon); returns
                             {bot_token, bot_uid} after asserting
                             robot.creator_uid == JWT.sub.

Together these let octo-fleet orchestrate bot lifecycle without ever
seeing bot_token: web mints (browser knows uid, not token), web tells
fleet about bot_uid, daemon later fetches bot_token directly from
server using its daemon-scope JWT. No fleet↔server HTTP.

E2E verified: valid daemon JWT returns the right bot_token; missing/
non-owned bot returns 404 (no info leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…R-A.2)

Runtime/bot endpoints have moved to octo-fleet :8092. The module
still loads and runs schema migrations (no data loss), but Route()
returns early so nginx-style 404s come back from server.

Set LEGACY_RUNTIME_ROUTES=true to re-enable as a rollback path if
fleet has problems during cutover. Code path is unchanged — just
gated.

Verified:
- POST /v1/daemon/register → 404 (no route)
- GET  /v1/runtimes        → 404 (no route)
- POST /v1/auth/token      → still works (auth_jwt module)
- POST /v1/bot/mint        → still works (auth_jwt module)
- GET  /v1/bot/:uid/token  → still works (auth_jwt module)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PR-C)

Spec PR-C: server's runtime/bot orchestration code is dead weight now
that octo-fleet has been stable through PR-A and PR-B. Server keeps
only the cross-service contract surface (modules/auth_jwt):
  - /.well-known/jwks.json   (JWKS for fleet/matter to verify tokens)
  - POST /v1/auth/token      (session/api_key → JWT exchange)
  - POST /v1/bot/mint        (web-callable bot OBO)
  - GET  /v1/bot/:uid/token  (daemon-callable bot_token lookup)

Cleanup steps already taken locally (one-time, not in commit):
  DELETE FROM gorp_migrations WHERE id LIKE 'runtime-%';
  -- the runtime_* tables themselves are harmless orphans and can be
  -- dropped manually whenever convenient; they don't affect anything.

Boot smoke:
- AuthJWT loaded key from disk (existing /Users/caster/.octo-server/jwt-priv.pem)
- /v1/runtimes → 404 (gone)
- /v1/bot/mint → 200 (new bot minted: 27w5OWCvvD2f63623a6_bot)
- /v1/auth/token (api_key) → 200 JWT

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
100 commits 主要内容:
- pkg/i18n 大规模迁移(modules/{user,message,group,workplace,category} → ResponseErrorL)
- oidc RP-Initiated Logout (#217)
- voice → octo-speech adapter 替换 (#113)
- modules/bot_api / modules/user 重构
- 各种 CI workflow

冲突点预期:internal/modules.go (我们删了 runtime import,origin 可能新增 module)

# Conflicts:
#	internal/modules.go
requireDaemonJWT only did signature verification via parsed.Claims;
expired tokens still mint bot_tokens via GET /v1/bot/:uid/token. Plan
§2 AU1 explicitly requires "JWT 过期 / kid 未知 / 签名失败 → 401" but
this code path enforced only the kid + signature halves.

Risk: a leaked daemon JWT past its 30-day TTL keeps working for any
bot the original owner had access to, since exp was never checked.
30-day window limits blast radius but is still a contract drift on
the most security-sensitive endpoint we expose.

Fix: chain cl.Validate(jwt.Expected{Issuer: jwtIssuer, Time: now})
after the signature check. Issuer pinning also blocks tokens minted
by a future second issuer (e.g. test fixtures, staging keys).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
917ff7c accidentally git-add'd a deletion of the /daemon BotFather
command (45 lines across botfather/{api,command,const}.go). The
deletion came from a stale working-tree state during merge resolution,
not from any deliberate change — and the command is the ONLY user-
facing onboarding path for the daemon:

  user sends /daemon in OCTO IM →
  BotFather generates (or reuses) user_api_key row in test DB →
  BotFather replies with a copy-paste install snippet →
  user pastes into their terminal → daemon starts + registers.

Without this, new users have no way to get an api_key and no way to
know the install URL pattern.

Restored verbatim from origin/feat/agent-runtime, plus a follow-up
update to the reply payload reflecting the PR-A/B 3-URL split:

  export OCTO_SERVER_URL=...
  export OCTO_FLEET_URL=...
  export OCTO_MATTER_URL=...
  export NOTIFY_INTERNAL_TOKEN=<ops-shared-secret>
  octo-daemon start --api-key uk_xxx --api-url <server>

New helper deriveServiceURL() best-effort swaps the port suffix on
server URL to compute fleet/matter URLs for the same host. Operators
running fleet/matter on different hosts must override via env after
copy-paste (call this out in the reply text).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the i18n merge, user/login writes sessions to Redis as
"v2:{"uid":"...","name":"..."}" but auth_jwt.resolveSession only
parsed the legacy "uid@name@..." triple-at format. Result: every
POST /v1/auth/token from the web returned 401 "malformed session
value", which the frontend treats as session-expired and bounces
the user back to the login page — so as soon as web tried to call
any fleet/matter endpoint requiring a JWT, the user got kicked.

resolveSession now detects the "v2:" prefix and JSON-unmarshals
the payload to extract uid; legacy format remains a working
fallback so nothing else breaks.

Independent of the DualAuth work — this bug was latent since the
i18n merge and surfaced when we ran the web through the new
Runtimes/智能体 tab which exercises the session→JWT exchange.
/daemon dispatch case + handler were registered correctly and the
command works when typed, but handleHelp's hardcoded command list
omitted it, leading users to think it had been removed. Adds the
line right after /quickstart since both are runtime-setup commands.

No behavioral change beyond what /help renders.
Two reviewer follow-ups bundled (both single-spot, low-risk):

resolve.go: replace the hand-rolled "v2:" + struct-unmarshal mini
decoder with pkg/auth.Decode. The pkg helper has been the canonical
v2 envelope reader since the i18n rewrite (commit 76bc59a); having a
parallel mini-decoder here meant any new v2 field would have to be
mirrored in two places. resolveSession now just calls Decode and
returns info.UID. Behavior unchanged — Decode already covers v2 JSON,
legacy "uid@name[@ROLE]" fallback, and empty-payload error mapping.

bot_api.go: cl.Validate(jwt.Expected{Issuer, Time: time.Now()}) was
silently relying on go-jose's DefaultLeeway (1 min) for daemon ↔ server
clock skew tolerance on exp/iat/nbf. Added a comment so the next
person who reads this knows not to wrap a tighter leeway around it
"for security" — that would cause spurious failures at mint time
when daemon clock drifts a few seconds ahead.
Closes 2 cross-space privilege escalations found while documenting the
auth flow:

#1 (POST /v1/auth/token): resolveSession used to trust caller-supplied
   X-Space-Id with a comment saying "downstream re-checks". In practice
   plan AU3 has downstream (fleet/matter) trust JWT.space_id directly.
   Neither side actually checked → any logged-in user could mint a JWT
   for an arbitrary space and gain cross-space read/write on fleet/
   matter endpoints. Now resolveSession calls assertSpaceMember on the
   spaceHint before returning.

#2 (POST /v1/bot/mint): mintBot took space_id from body, never
   validated caller is in that space. MintBotOBO would then drop a bot
   into space_member of an arbitrary space. Attack: pull the new bot
   into a group → observe messages of a space you don't belong to.
   Now mintBot calls assertSpaceMember(uid, req.SpaceID) → 403 if not
   a member.

Refactor: extracted the existing space_member SQL from resolveAPIKey
into a shared assertSpaceMember helper so the daemon path (which always
had this check) and the two new check sites share one implementation.

No DB migration. No API contract change. X-Internal-Token and daemon
api_key paths unaffected — they already validated this. Verified via
curl: legit space → 200/200; fake space → 401/403.
## Summary
- \`daemonTokenTTL\` 30 * 24h → 24h
- 同步 bot_api.go 里过时的 "30-day TTL" 注释 (codex review MINOR)

## Why
通信协议决策 P1-5. daemon 端 EnsureJWT 早就在过期前自动 refresh, 缩 TTL 没有运行时影响, 但把
api_key 撤销窗口从 30 天降到 1 天 (生产合规).

## Note
- 如果 auth session 决定走 "去 daemon JWT 改 api_key 直连" 路线 (review 结论 4),
这个改动就作废. 但当前 JWT 路线下这是安全的紧化, 两种方向都不冲突.

## Test plan
- [x] \`go build ./...\` 通过

## Reviewers
@lvsijia @lijianhui

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## ⚠️ Reviewer 请先读架构 doc, 再 review 代码

避免直接 nitpick 实现细节而错过架构层决策的 context (e.g. 为啥 AU5 删了但 issue #75 没修).

## 1. 背景 + Phase 拆解

合并 plan: **决策一** (web 验证改造 — 浏览器改回带 session token, 不再换 JWT) + **决策二**
(daemon api_key 直连 — daemon 不再换 JWT, 直接持 api_key 调 fleet/matter, 由它们调
server verify-api-key 验证).

5 个 Phase:

| Phase | 内容 | 涉及 repo |
|---|---|---|
| 1 | server `POST /v1/auth/verify-api-key` endpoint | server |
| 2 | matter middleware 扩三协议 (session+bot_token+api_key) + fleet 抄 |
matter + fleet |
| 3A | web 客户端切回 session token | web |
| 3B | daemon 改 api_key 直连 + 业务 SQL 加 owner_uid 过滤 | daemon-cli + fleet
+ matter |
| 4 | 清理 (JWT/AU5/DualAuth/JWKS) + 配置改名 + 单测补 | 全 5 repo |

会议依据: 鉴权架构讨论会议结论 2026-06-05 决策一+决策二.

## 2. 本 repo 改动 (3 commit, 按时间顺序)

```
6b0dac8 refactor(auth): rename auth_jwt → bot_provision + add status edge case
36fbc3e refactor(auth): tear down JWT module, botToken now uses api_key
c2cf1d0 feat(auth): add POST /v1/auth/verify-api-key endpoint
```

## 3. 配套 PR (cross-link)

- octo-server (本 PR)
- octo-matter: Mininglamp-OSS/octo-matter#78
- octo-fleet: Mininglamp-OSS/octo-fleet#24
- octo-daemon-cli: Mininglamp-OSS/octo-daemon-cli#26
- octo-web: Mininglamp-OSS/octo-web#294

合并顺序建议: **本 PR (server) 先 merge** — 它是 verify-api-key endpoint 的 base,
matter/fleet middleware 依赖它.

## 4. 本 repo review 重点

### Phase 1: `POST /v1/auth/verify-api-key` (modules/user/api.go)

- 跟现有 `/v1/auth/verify` + `/v1/auth/verify-bot` 同 group, 复用 verifyLimit
- SQL 两步走: `user_api_key WHERE api_key=? AND space_id!=''` (显式拒 legacy)
+ `space_member WHERE status=1` 校验 owner 还在 space
- 失败统一返 401 `{"msg":"invalid api_key"}` (不细分 revoked/not-found)
- 7 单测覆盖: valid / unknown / owner-left-space / nonactive-status /
legacy-empty-space / missing-field / multi-space

### Phase 4: `modules/auth_jwt` 模块清理 → 改名 `bot_provision`

- 删 JWT 签发 + JWKS endpoint + token exchange + Claims 全套
- `bot_api.go` 的 `botToken` handler 改成验 api_key (复用 `resolveAPIKey`),
不再验 daemon JWT
- 模块改名 `auth_jwt` → `bot_provision` (内容只剩 mintBot + botToken, 名字误导)

## 5. 重要决策 — issue #75 推迟到决策四

(本 PR 不涉及, 详情见 matter PR)

## 6. e2e 验证

- 本地 5 服务跑通: daemon 心跳 + web runtime 列表 + create bot + bot 私聊响应
- 单测 (核弹 test DB 后): `go test -run TestAuthVerifyAPIKey ./modules/user/`
→ 7/7 PASS

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

- 4 件 v3.3.6 §7 follow-up nit (empty Bearer guard / dead code 删 / 测试
dedup / 注释精确化)
- 顺手删 JWT-era 410 Gone stub (`/.well-known/jwks.json` +
`/v1/auth/token`) + helper + 2 个 test — Phase 4 删 JWT 后 PoC 阶段无真 caller,
compat for nonexistent client 是 dead-code-by-construction
- **0 SQL 改动** — 不动 ban 语义, 保持 runtime 分支跟 main 一致 ("ban user ≠ ban
user's bots")
- **不影响 main** — 5 处全在 runtime-only 文件 (bot_provision 模块 /
verify-api-key test / zz file 都是 runtime 分支引入的)

净 diff: 5 files, +18 / -117 (净 **-99 行**)

## Scope explicitly excluded (留独立 PR / 决策)

- **nit-8 account-ban gate**: R0-R3 review 期间扩到 6 处 SQL D11 闭环, 但跟 main
"ban 不牵连 bot" 语义冲突, caster 拍 revert
- **nit-4 JWT-shaped Bearer hint**: 同 410 stub 同形, 假设不存在 client, revert
- nit-5 / nit-7 / nit-9 / verifyCache sha256: 独立 PR

## Review trail

6 轮 codex + cc 并行 review (R1-R6), 最后双 pass 0 finding:
- R1-R3: D11 SQL 扩 1→6 处 (双确认)
- caster review: 拍板回 main 语义, revert SQL
- R4: revert 后双 pass
- caster 进一步原则: no compat for nonexistent JWT clients
- R5-R6: 删 nit-4 + 410 stub, 双 pass

## Test plan

- [x] `go build ./...` PASS
- [x] `go vet ./modules/user/... ./modules/bot_provision/...` PASS
- [x] `go test -count=1 -p 1 -run 'TestAuthVerify|TestBotToken'
./modules/user/... ./modules/bot_provision/...` PASS
- [x] `go test -race -count=1 -p 1` (same scope) PASS
- [ ] reviewer 跑 `go build ./... && go vet ./...` 确认绿
- [ ] reviewer grep verify: `gone410Handler` / `TestGone410` /
`"/.well-known/jwks.json"` / `"/v1/auth/token"` 在 bot_provision/ 应 0 hit

## Notes

- 4 个 `TestDestroyApply_*` 失败 (full-package run) 是 **pre-existing
flaky** (clean origin/feat/agent-runtime 也挂; redis login-locked state 跨
test 残留), 跟本 PR 无关
- v3.3.6 §P1 (commit 07ca7db, 已 merged 的 daemon api_key user.status=1
gate) 是 runtime 分支跟 main 在 ban 语义上唯一遗留分叉, 若想 100% 跟 main 对齐需要单独 revert
PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mand

## Summary

Migrate runtime daemon onboarding from the BotFather IM `/daemon` command
to a web HTTP endpoint. The `/runtimes` page's CreateRuntimeModal will
call this endpoint to render the install + start commands directly in
the browser, rather than asking users to switch to IM and message
BotFather.

## Changes

### Add: GET /v1/runtime-onboarding (session auth)
- New file `modules/botfather/api_runtime_onboarding.go` (~150 lines)
- Route registered in `Route()` under a session-auth group
- Returns JSON containing:
  - `api_key` (lazy-create on first call, mirrors prior /daemon
    behavior — same `(uid, space_id)` row, same uk_* prefix)
  - `space_id`, `server_url`, `fleet_url`, `matter_url`
  - `commands.install` / `commands.start` — pre-rendered command
    strings for direct display
  - `env_vars` map — for users who want to set OCTO_*_URL in their
    shell rc instead of inlining per-invocation
- `space_id` resolution: X-Space-Id header > ?space_id= query > 400.
  No implicit "first space" fallback — would let caller end up with
  api_key bound to a space they didn't intend.
- `getOrCreateUserAPIKey` helper extracted from the now-deleted
  handleDaemon. INSERT race tolerant (unique constraint on
  (uid, space_id) — loser re-reads).
- `deriveOnboardingURLs` helper extracted from same.

### Remove: BotFather /daemon command
Following the same principle as PR #334 (no compat for nonexistent
clients post-Phase 4), removing rather than deprecating:
- `modules/botfather/const.go:46` — `CmdDaemon` constant
- `modules/botfather/api.go:308` — command list entry
- `modules/botfather/command.go:174-175` — `case CmdDaemon` dispatch
- `modules/botfather/command.go:489` — help text line
- `modules/botfather/command.go:1190-1258` — `handleDaemon` function
  (70 lines, behavior reproduced in the web handler)

`deriveServiceURL` (still in command.go) is preserved — both the new
web handler's `deriveOnboardingURLs` and any future caller share it.

## Verification
- `go build ./...` PASS
- `go vet ./modules/botfather/...` PASS
- Pre-existing test flake unrelated to this PR: `botfather` package
  test setup panics with `unknown migration: 20201222000001_report_
  legacy01.sql` from a polluted local test DB. Verified on clean
  `origin/feat/agent-runtime` — same panic, not introduced here.
  Fixable via `bootstrap.sh --fresh` (drops + re-creates test DB).

## Out of scope (separate PR)

octo-web `/runtimes` page restructure:
- Sidebar `Runtimes` → `运行时`
- Top-level Runtime/Bot tabs removed; tree view: device → runtime → bot
- Top-right `+` popover: 创建 Runtime / 创建 Bot
- New `CreateRuntimeModal.tsx` calls this endpoint

Frontend PR follows once this server PR is merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cc round 1 verdict=fail 7 finding 2 P0 / codex round 1 verdict=fail 3 finding
2 P0. 双方共识 P0 + 单方 codex P0 全修.

- C1 (双方 P0): 缺 SpaceMember + user.status=1 校验. 接 spacepkg.CheckMembership
  (已 join space.status=1 + space_member.status=1) + 二重 user.status=1 SQL
  gate (D11 撤销链路审计: liftBanUser 走 redis token 已堵 session 路径,
  这里 belt-and-braces 防 banned user lazy-create dead api_key row).
  query 优先 / header 备选 (X-Space-Id / X-Space-ID 都吃) 跟 SpaceMiddleware
  顺序对齐.
- C2 (codex P0): commands.start 是 standalone multi-line block (含
  export OCTO_SERVER_URL + OCTO_FLEET_URL + OCTO_MATTER_URL + octo-daemon
  start 行), caller 复制即跑. env_vars 字段保留供想 set 到 shell rc 而
  非 inline 的场景.
- C3 (codex P0): QUICKSTART.md 删 BotFather /daemon 引导, 改成 Web
  Runtimes 页 + popover 创建 Runtime.
- C4 (cc): deriveOnboardingURLs 拼出 'http://:8090' (External.BaseURL +
  External.IP 都空) 时 handler 报 500 + log, 不再给前端展示 broken
  command.

cc 提的 minor 推 PR-3 (errcode 风格 / 单测). cc 提的 X-Space-Id 大小写 /
header-query 顺序由本 PR 自带兼容消化 (双 case 都吃, 跟 SpaceMiddleware
对齐).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cc round 2 verdict=pass / codex round 2 verdict=pass, 0 pre-push blocker.
顺手清 cc 提的 3 个 minor 提高 PR review 通过率.

- F1 (cc): 删第二个 c.GetHeader('X-Space-ID') dead branch — Go HTTP
  header 大小写不敏感 (textproto.CanonicalMIMEHeaderKey), 一次查询
  X-Space-Id / X-Space-ID 都吃, 第二次永远不会比第一次多查到. 改注释
  显式说明.
- F2 (cc): 删 bf.ctx.DB().NewSession(nil) 冗余调用, 改成 bf.ctx.DB()
  跟 SpaceMiddleware (pkg/space/middleware.go:102) 调用风格对齐.
- F4 (cc 提的 question): octo-daemon start --api-url 改回 serverURL
  跟旧 handleDaemon 一致. daemon-cli 内部 cfg.APIURL 只是 env (OCTO_FLEET_URL/
  OCTO_SERVER_URL) 缺失时的 fallback, 我们 commands.start 已经全设
  了 export, 实际 cfg.APIURL 不会被读到, 改 fleetURL/serverURL 行为
  等价 — 但语义上跟旧版对齐避免 future drift. 注释改写.

cc 的 F3 (CheckMembership + user.status 拆两次 vs verify-api-key 一条
SQL JOIN sm+s+u 简洁版) / F6 (broken URL 检测覆盖度盲区) 推 PR-3 (cosmetic).
codex 的 C4 (broken URL 漏 'http://' / 空白 IP case) 同 F6 推 PR-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndpoint

feat(runtime-onboarding): web endpoint 替代 BotFather /daemon 命令
daemon 0.0.3 二进制不读 OCTO_MATTER_URL (它不直接调 matter), 且 matter
当前未上线; 启动命令只下发 OCTO_SERVER_URL + OCTO_FLEET_URL 两个 export.
顶层 matter_url 响应字段保留, 不影响契约.
…362)

The /daemon onboarding flow (now GET /v1/runtime-onboarding) used to
instruct users to `export OCTO_MATTER_URL`, but the octo-daemon binary
never reads that env — it reaches matter through the fleet/server
endpoints. The instruction was a no-op that caused bot-provisioning
friction.

The runnable command block and env_vars already omit OCTO_MATTER_URL,
but nothing guarded it against reintroduction. Extract the command/env
builders into pure functions (buildDaemonStartBlock, daemonEnvVars) and
add hermetic regression tests asserting the daemon-facing instructions
expose exactly OCTO_SERVER_URL + OCTO_FLEET_URL and never
OCTO_MATTER_URL.

No behavior change; response shape unchanged.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@lml2468 lml2468 requested a review from a team as a code owner June 17, 2026 15:49
@github-actions github-actions Bot added the size/M PR size: M label Jun 17, 2026
mochashanyao
mochashanyao previously approved these changes Jun 17, 2026

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q · automated review]

Verdict: Approve — no blocking findings; notes below (data-flow traced).


octo-server PR#405 Review Report

Reviewer: Octo-Q (automated review)
Head SHA: c0c207016db729d61db36fc7b24186b12d93098c
PR: test(botfather): guard daemon onboarding never emits OCTO_MATTER_URL (#362)

1. Verification Summary

Item Status Evidence
Refactor preserves behavior buildDaemonStartBlock uses identical fmt.Sprintf template + same arg order as removed inline block (api_runtime_onboarding.go L168-172 vs base L148-152)
daemonEnvVars output parity Returns same 2-key map {OCTO_SERVER_URL, OCTO_FLEET_URL} as removed inline map (L179-182 vs base L159-162)
daemonInstallCommand const parity Same string literal "npm install -g @mininglamp-oss/octo-daemon" (L155 vs base L155)
Response struct unchanged runtimeOnboardingResp shape untouched; MatterURL still returned as informational field
Regression tests correct Both tests call pure functions directly, assert absence of OCTO_MATTER_URL and presence of the two consumed URLs
No new callers / side effects buildDaemonStartBlock and daemonEnvVars are only called from runtimeOnboarding() — single call-site each

2. Findings

No P0/P1/P2 findings.

This is a pure extract-and-test refactor. Every data path is character-for-character identical to the base branch:

  • fmt.Sprintf template: "export OCTO_SERVER_URL=%s\nexport OCTO_FLEET_URL=%s\nocto-daemon start --api-key %s --api-url %s" — unchanged
  • Arguments: serverURL, fleetURL, apiKey, serverURL — unchanged order
  • Env var map: same 2 keys, same values
  • Install string: same literal

3. Data-Flow Trace

Consumed data Upstream source Flows correctly?
serverURL in start block deriveOnboardingURLs()cfg.External.BaseURL or http://{IP}:8090, trimmed of /api and / suffixes
fleetURL in start block + env map deriveServiceURL(server, ":8092") — swaps port suffix on server URL
apiKey in start block getOrCreateUserAPIKey(uid, spaceID)apiKeyService.GetOrCreate()
matterURL in response JSON deriveServiceURL(server, ":8080") — still computed, still in MatterURL field, just not in commands/env_vars ✅ (intentional: informational only)

4. Test Quality

Both tests are hermetic (no HTTP setup, no DB) and directly exercise the extracted pure functions:

  • TestBuildDaemonStartBlock_OmitsMatterURL: asserts no OCTO_MATTER_URL, asserts both required env names + values, asserts --api-key flag
  • TestDaemonEnvVars_OmitsMatterURL: asserts no OCTO_MATTER_URL key, asserts exact 2-key count, asserts correct values

The len(env) != 2 guard is particularly good — it prevents someone from adding OCTO_MATTER_URL back alongside the existing keys without the test failing.

5. Suggestions (non-blocking)

None. The refactor is clean, the doc comments carry the "why no MATTER_URL" rationale, and the tests lock the contract.

6. R5 Blind-Spot Checklist

  • C1 (dual-path parity): N/A — no symmetric add/remove paths touched
  • C2 (control-flow ordering / nesting): N/A — pure functions with no side effects, single call-site
  • C3 (auth boundary ≠ capability): N/A — no auth/permission changes
  • C4 (auth lifecycle / container cascade): N/A — no auth changes

7. Cross-Round Blocker Recheck

N/A — first review of this PR.


[Octo-Q] verdict: APPROVE + Pure extract-and-test refactor. Behavior is character-for-character identical to base. Regression tests correctly guard the #362 contract (no OCTO_MATTER_URL in daemon-facing instructions). No findings at any severity.

yujiawei
yujiawei previously approved these changes Jun 17, 2026

@yujiawei yujiawei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #405 (octo-server)

Verdict: APPROVED. This is a clean, behavior-preserving refactor plus a regression guard for #362. Build, vet, and the new tests all pass at head c0c2070. No blocking issues; a few non-blocking suggestions to harden the guard are listed below.

1. Verification

  • Behavior-preserving refactor. The inline startBlock fmt.Sprintf and env_vars map were hoisted into buildDaemonStartBlock / daemonEnvVars / daemonInstallCommand. Compared byte-for-byte against the deleted inline code: the format string "export OCTO_SERVER_URL=%s\nexport OCTO_FLEET_URL=%s\nocto-daemon start --api-key %s --api-url %s", all four %s verbs, the argument order (serverURL, fleetURL, apiKey, serverURL), the --api-url value, the install string, and the env-var map keys/values are identical (api_runtime_onboarding.go:171-184). Call sites at :145-148 feed the same upstream variables; runtimeOnboardingResp shape is unchanged. The "No behavior change" claim holds.
  • #362 contract enforced. Start block and env_vars expose exactly OCTO_SERVER_URL + OCTO_FLEET_URL and never OCTO_MATTER_URL. A repo-wide grep confirms there is no other emitter of OCTO_MATTER_URL in any daemon-facing instruction — the only references are the deliberate-exclusion comments and the test.
  • Tests pass. go build ./modules/botfather/ (ok), go vet ./modules/botfather/ (clean), go test -run OmitsMatterURL -v → both new tests PASS.
  • Scope. Exactly 2 files, +104/−20, single commit on top of feat/agent-runtime. No scope creep.

2. Issues

None at P0/P1. The refactor is sound and the contract is correctly guarded.

3. Suggestions (non-blocking)

  • (P2) The guard tests the builders, not the wire. Both tests call buildDaemonStartBlock / daemonEnvVars directly; nothing exercises the runtimeOnboarding HTTP handler or asserts on the serialized JSON. Today this transitively covers the response because the handler calls exactly these builders, but it is a by-convention rather than structural guarantee — a future divergent code path could leak OCTO_MATTER_URL on the wire while both unit tests stay green. A single httptest-level assertion that the serialized response body contains no OCTO_MATTER_URL would make the guard leak-proof. Reasonable to defer.
  • (P2) Asymmetric strictness between the two tests. TestDaemonEnvVars_OmitsMatterURL asserts len(env) == 2 (a true closed-set check), but TestBuildDaemonStartBlock_OmitsMatterURL only checks "contains SERVER+FLEET, not MATTER" — an additive regression (e.g. a third unrelated export) in the start block would pass. Consider asserting the block contains exactly the two expected export lines to match the "expose exactly" intent.
  • (P2 / informational) matter_url JSON field is intentionally retained. The top-level MatterURL field (:34, json matter_url, populated at :143) is still shipped as an informational display value and is correctly out of scope for #362, which bans only the daemon-facing OCTO_MATTER_URL env instruction. This distinction (banned env vs. allowed informational field) is load-bearing and currently documented only in comments — worth keeping in mind so a later "cleanup" neither removes the legitimate field nor "restores symmetry" by re-adding the env.

4. Additional findings

None. The change is well-scoped and the rationale for omitting OCTO_MATTER_URL is now durably documented on the builder doc comment, which is the right home for it.

Jerry-Xin
Jerry-Xin previously approved these changes Jun 17, 2026

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is in scope for octo-server and cleanly guards the BotFather runtime onboarding daemon command contract without changing the response shape.

💬 Non-blocking

🔵 Suggestion — modules/botfather/api_runtime_onboarding_matter_url_test.go: The test comment says the start block must expose “exactly” OCTO_SERVER_URL and OCTO_FLEET_URL, but the assertion only checks presence plus absence of OCTO_MATTER_URL. Consider asserting the full expected start block or counting/parsing exported OCTO_ variables so future additions like OCTO_OTHER_URL are caught.

✅ Highlights

The helper extraction in api_runtime_onboarding.go keeps the response construction readable and gives the OCTO_MATTER_URL exclusion a clear single place to document.

The env_vars regression test correctly pins the exact two-key map contract.

Verification run: go test ./modules/botfather/ -run 'OmitsMatterURL' -count=1 passed.

@liuooo liuooo force-pushed the feat/agent-runtime branch from 89d30b6 to eacb489 Compare June 26, 2026 11:49
@lml2468 lml2468 added stage:review Review phase - QA/security/code review review:running:qa qa-engineer review in progress review:running:security security-engineer review in progress review:running:code code-reviewer review in progress labels Jun 27, 2026
@lml2468

lml2468 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

QA-Engineer Verdict: CHANGES_REQUESTED

Strengths

  • Comprehensive test coverage for bot_provision (9 cases for botToken, 15+ for verify-api-key)
  • Security regression tests (disabled space, account ban, revoked key, cross-space)
  • BC tests (default schema preserved without ?include=context)
  • Fail-secure tests (DB error on context query → empty map, not 500)
  • Source guard tests (no legacy error responses, zh-CN parity)

Issues

  1. No test for mintBot happy path — POST /v1/bot/mint has no dedicated test. bot_api_test.go only covers botToken (GET /v1/bot/:uid/token). mintBot is web-session-auth and needs a test with a mocked session context.
  2. No test for deriveServiceURL or deriveOnboardingURLs — URL derivation has edge cases (IPv6, URLs without port, empty External config) with no test coverage.
  3. PR scope mismatch — Title says "test(botfather): guard daemon onboarding" but diff adds 2839 lines including a whole new module (bot_provision), 2 new endpoints (mint + verify-api-key), and major user module changes. Should be split or retitled.
  4. Test formattingTestAuthVerifyAPIKey_IncludeContext_DBError_FailSecure has misaligned indentation in the RENAME/defer block.

@lml2468 lml2468 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA-Engineer (multica loop): CHANGES_REQUESTED. See detailed verdict comment above. Missing mintBot test, no URL derivation tests, PR scope mismatch.

@lml2468 lml2468 added review:done:qa:changes qa-engineer FAIL and removed review:running:qa qa-engineer review in progress labels Jun 27, 2026
@lml2468

lml2468 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Security-Engineer Verdict: APPROVE (with comments)

Security Posture: STRONG

This PR has gone through multiple security review iterations and the results show.

Positive Findings

  1. Space membership enforcementassertSpaceMember correctly joins space (s.status=1) + space_member (sm.status=1) + user (u.status=1). This closes the soft-deleted-space bypass, the account-ban bypass, and the lingering-membership bypass.
  2. Cross-space attack preventionbotToken joins space_member with callerSpace, preventing api_key from SpaceB minting tokens for bots in SpaceA.
  3. Anti-enumeration — Uniform 404 for cross-space/disabled/nonexistent bot prevents existence probing.
  4. Client ID filteringclient_id='botfather' gate prevents integration-client keys from authenticating daemon paths.
  5. Status=1 filters — Applied to user_api_key (revoked keys), robot (disabled bots), space (disabled spaces), space_member (inactive memberships), and user (banned accounts). Comprehensive.
  6. Rate limiting/v1/bot/:uid/token shares the 1000 req/min/IP "verify" bucket with verify-* endpoints.
  7. Anti-enumeration on verify-api-key — Same error code for unknown key, owner-left-space, and disabled-space.

Concerns

  1. 🔴 configs/tsdd.yaml has hardcoded MinIO credentialsaccessKeyID: "admin" + secretAccessKey: "12345678". Even with the comment "local dev only", committed creds in a public repo get scraped by bots. Resolution: remove from the PR or use env var placeholders.
  2. 🟡 configs/tsdd.yaml external.ip: "localhost" uncommented — dev-local config that should remain commented for the default template.
  3. 🟢 bot_token transmitted in response body — By design, but ensure HTTPS enforcement in deployment. Not a PR-level issue.
  4. 🟢 generateBfToken uses crypto/rand — Correct choice. 16 bytes = 128-bit entropy, sufficient for bot tokens.
  5. 🟢 No JWT/RSA dead code — JWT was removed (Phase 4), no leftover key material.

SBOM/Dependency Diff

  • go.mod changes: adds new module dependencies for bot_provision. No obvious supply-chain risk from the visible diff.

Verdict

APPROVE — security posture is strong. The one 🔴 (hardcoded credentials) should be addressed before merge but does not affect the application code paths being reviewed.

@lml2468 lml2468 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security-Engineer (multica loop): APPROVE (with comments). Strong security posture. Hardcoded MinIO credentials in configs/tsdd.yaml should be addressed before merge.

@lml2468 lml2468 added review:done:security:comment security-engineer CLEARED-WITH-RISK and removed review:running:security security-engineer review in progress labels Jun 27, 2026
@lml2468

lml2468 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Code-Reviewer Verdict: APPROVE (with comments)

Code Quality: GOOD

The codebase is well-structured, thoroughly documented, and shows careful iterative security hardening. Comments explain the "why" at a high level.

Positive Findings

  1. Clean module separationbot_provision is self-contained with clear boundaries (bot_api.go, resolve.go, jwt.go, rand.go). The indirection via readRand/hexEncode vars enables unit testing.
  2. Well-factored error handling — Uses httperr.ResponseErrorL / ResponseErrorLWithStatus consistently, with proper errcode registration.
  3. Comprehensive SQL — All queries use parameterized statements (no SQL injection risk). JOINs are correctly structured.
  4. Good API design — POST /v1/bot/mint for web, GET /v1/bot/:uid/token for daemon. Auth model is clean (session vs api_key Bearer).
  5. Backward-compatible response shapesauthVerifyTokenResp preserves default schema; ?include=context is opt-in.
  6. MintBotOBO correctly handles idempotencycreateBotCoreWithRetry + INSERT IGNORE for space_member.

Concerns

  1. Scope creep — PR title says "test(botfather): guard daemon onboarding" but adds 2839 lines including a whole new module, 2 endpoints, and major user module changes. This should be retitled at minimum, or split into separate PRs. The fix(botfather): /daemon instructs OCTO_MATTER_URL, but the daemon never reads it #362 guard is ~100 lines; the rest is PR-A fleet-split work.
  2. MintBotOBO logged-and-continue failuresspace_member INSERT and AddFriend calls log warnings on failure but proceed. This means the bot could exist without being in the space or having a friend relationship with the owner. Consider at minimum returning a partial-success indicator.
  3. deriveServiceURL fragility — Byte-walking backwards to find : then replacing the suffix works for host:port but breaks for IPv6 ([::1]:8090) or URLs with : in the path. Document the limitation.
  4. runtimeOnboarding uses legacy error responsesc.ResponseErrorWithStatus instead of httperr.ResponseErrorLWithStatus. The bot_provision module correctly uses the new envelope, but this endpoint uses the old pattern. Consider harmonizing.
  5. Test indentationTestAuthVerifyAPIKey_IncludeContext_DBError_FailSecure has misaligned code in the RENAME/defer block.

@lml2468 lml2468 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-Reviewer (multica loop): APPROVE (with comments). Well-structured code. Scope creep (title mismatch) and MintBotOBO partial-success pattern are the main concerns.

@lml2468 lml2468 added review:done:code:comment code-reviewer APPROVE-WITH-NITS and removed review:running:code code-reviewer review in progress labels Jun 27, 2026
@lml2468

lml2468 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Aggregate Verdict: CHANGES_REQUESTED

Verdict Summary

Reviewer Verdict Status
qa-engineer CHANGES_REQUESTED ❌ Changes required
security-engineer APPROVE (with comments) ✅ Cleared with risk notes
code-reviewer APPROVE (with comments) ✅ Approved with nits

Rule Applied

QA-engineer returned changes → per aggregation rule: any changes verdict → CHANGES_REQUESTED.
Security and code reviewers cleared with comments but their approvals cannot override a changes verdict.

Action Required

PR author must address QA-engineer's change requests before this PR can proceed to human merge review.

@lml2468 lml2468 added the review:complete 3 verdicts aggregated, awaiting human merge label Jun 27, 2026
@superalsrk superalsrk force-pushed the feat/agent-runtime branch from eacb489 to ccdf219 Compare June 27, 2026 13:31
Base automatically changed from feat/agent-runtime to main June 27, 2026 17:07
@superalsrk superalsrk dismissed stale reviews from Jerry-Xin, yujiawei, and mochashanyao June 27, 2026 17:07

The base branch was changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review:complete 3 verdicts aggregated, awaiting human merge review:done:code:comment code-reviewer APPROVE-WITH-NITS review:done:qa:changes qa-engineer FAIL review:done:security:comment security-engineer CLEARED-WITH-RISK size/M PR size: M stage:review Review phase - QA/security/code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants