Commit c95bca7

LyaQanYi, Hongzhi Wen, and claude authored
feat(activity): cross-platform OS-signal push channel + agent_router remote guard (#1477)
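* refactor(activity): consolidate the remote-deploy helper + add a remote guard to agent_router

Security tightening and refactoring steps for PR B, paving the way for the upcoming /api/activity_signal endpoint.

## Single-source remote-deploy helper

The `NEKO_ACTIVITY_TRACKER_REMOTE` check used to have two implementations:
- `main_logic/activity/system_signals._force_degraded_from_env` (degrades the activity collector)
- `main_routers/system_router._is_remote_backend_deployment` (rejects the screenshot endpoint)

The logic was identical but maintained in two places, and every new consumer (agent_router, the upcoming activity_signal push endpoint) would only add to the drift risk. The former is renamed to the public `is_remote_backend_deployment()`, with the private alias kept for backward compatibility; system_router now imports it and keeps a module-level alias, so the screenshot router tests need no changes.

## Remote-mode guard for agent_router

The four mutation endpoints `/api/agent/{flags,command,admin/control,tasks/{id}/cancel}` previously had no awareness of remote deployment. In a remote deployment, computer_use/browser_use/openclaw control the server machine rather than the user's machine, so forwarding to the localhost tool_server is both pointless and a security risk (any request that can reach the public backend HTTP can drive the agent on the server).

A new `_remote_backend_block()` helper checks the environment variable at the entry of all four endpoints and returns 501 immediately on a hit, matching the rejection pattern `/api/screenshot` uses in the same environment.

## Scope note

The "agent_router has no CSRF + Origin guard" finding from the original issue #1023 audit is a broader hole: it concerns attacks on a local backend, such as DNS rebinding. A complete fix requires the 15+ frontend agent fetch call sites to start sending `X-CSRF-Token` (following the `secureLocalScreenshotFetch` pattern in `static/app-screen.js`), which is ~200 lines of frontend changes and exceeds PR B's budget. This commit covers only the remote-deployment half of the threat; the CSRF part is left to a follow-up issue.

## Tests

- The existing 35 cases in `tests/unit/test_system_screenshot_router.py` still pass (private alias kept)
- New `tests/unit/test_agent_router_remote_block.py` with 52 cases:
  * 4 endpoints × 2 env variables × 5 truthy values = 40 blocked paths
  * the unset path of the 4 endpoints is not blocked by mistake
  * 6 truthy/falsy boundary cases + direct helper calls + cross-module same-source verification

Step C1 of the path decided in issue #1023.

* feat(activity): frontend OS-signal push endpoint + Electron heartbeat client

The main feature of issue #1023. Completes the ``push_external_system_signal`` push channel left by PR #1015 into a closed loop: the Electron frontend reads OS signals on a 5s heartbeat and POSTs them to the backend, covering Win/Mac/Linux desktops + remote server deployments + mobile shells (partially degraded by platform limits).

## Backend

The new POST /api/activity_signal endpoint accepts five OS-signal fields, ``window_title`` / ``process_name`` / ``idle_seconds`` / ``cpu_avg_30s`` / ``gpu_utilization``, and forwards them to the tracker for the given lanlan_name:

- All fields are optional (except ``lanlan_name``); missing fields are skipped, and a partial snapshot still beats no push
- 400 validation: range (idle ≥ 0, cpu/gpu ∈ [0, 100]) + type
- 404 when lanlan_name is not registered
- 503 when the tracker is not initialised yet (boot race)
- 429 + ``Retry-After`` on a repeat within 5s (``_EXTERNAL_SIGNAL_MIN_INTERVAL`` sits next to the TTL constant in ``tracker.py``; it is coupled to the frontend/backend cadence and stays out of config)
- 500 when the tracker raises

The throttle dict is bucketed per lanlan_name with a cap of 64 (guards against malicious spray; 1-3 in practice).

## Static JS client (``static/app-activity-signal.js``)

5s heartbeat module. Behaviour:

1. Detects whether ``window.nekoActivitySignal.read`` is exposed (the Electron bridge added by the NEKO-PC sibling PR). If not, it logs once and exits, so plain browsers / phone shells / dev runs without NEKO-PC are all safe
2. Each tick pulls camelCase signals from the bridge, validates type/range, converts to snake_case, and POSTs to /api/activity_signal
3. 429 / 404 / 503 are silently ignored (throttle hit + boot race, both expected); other errors stop logging after 3 occurrences to avoid spam
4. ``visibilitychange`` pauses/resumes the heartbeat, so hidden windows don't waste IPC + network
5. lanlan_name is re-resolved every tick (pushes go to the new tracker immediately after a character switch)

Mounted only in ``templates/index.html`` (the always-present desktop pet window), not in chat.html, to avoid two heartbeats throttling each other.

## Supporting changes

- ``tests/unit/test_activity_signal_router.py``: 26 cases
  * happy path (all fields / single field / whitespace strip)
  * validation 400 (missing lanlan_name, non-object body, invalid JSON, each field out of range / wrong type)
  * 404 / 503 / 500
  * throttle 429 (with Retry-After header), independent lanlan_name buckets, recovery after the TTL
  * throttle dict cap test (no unbounded growth under a spray attack)
- ``docs/design/user-activity-tracker.md``: replaces the "HTTP endpoint not yet added" paragraph with the full contract + renderer client + Electron bridge contract

## Cross-repo

The Electron half of the bridge lives in a follow-up PR in the NEKO-PC repo: ``ipcMain.handle('neko:read-activity-signal', ...)`` in ``src/main.js`` + the matching preload. This PR's ``app-activity-signal.js`` is designed to no-op gracefully when the bridge is absent, so the two PRs can merge in either order without breaking anything.

Step C2 of the path decided in issue #1023.

* fix(activity-signal): block NaN/Inf + inFlight race + throttle bridge failures

Two fixes from PR #1477 review feedback, corresponding to CodeRabbit + Codex P2 / Minor. F2 (the endpoint is anonymously writable in remote mode) needs frontend CSRF integration, which is out of scope for PR B; this was already stated in the original PR description and is left to a follow-up, so this commit does not touch it.

## F1: NaN / ±Infinity slip past validation (flagged by both CodeRabbit and Codex)

``_activity_signal_validate_float`` previously compared only ``< lo`` / ``> hi``. ``float('nan') < lo`` is silently False, so ``NaN`` (and ``Infinity`` wherever there is no upper bound) could get into the tracker. Downstream JSON serialisation that runs ``json.dumps`` with ``allow_nan=False`` (logging / state-machine replay, response output) then raises ``ValueError`` and drags the upstream endpoint into a 500.

Fix: add a ``math.isfinite`` guard before the range comparison, returning a 400 with ``"<field> must be finite"``.

9 new tests: three fields (idle/cpu/gpu) × three tokens (NaN / Infinity / -Infinity). TestClient's ``json=`` goes through httpx's ``allow_nan=False`` serialiser, which refuses these tokens, so the tests send raw bytes via ``content=``, matching the actual path of an attacker / buggy client whose payload reaches stdlib ``json.loads``.

## F3: inFlight race + bridge-failure throttling (CodeRabbit)

Two problems:

1. The ``if (inFlight) return`` check ran before ``await readSignalsFromBridge()``, but ``inFlight = true`` was set after the await. With slow IPC (Linux xprop / the macOS Screen Recording prompt), two ticks could both pass the inFlight check and enter fetch concurrently, needlessly tripping the backend's 5s rate limit.
2. The log in the ``readSignalsFromBridge`` catch block was gated on ``consecutiveFailures < THRESHOLD`` but never incremented the counter. When the bridge failed repeatedly, the same warning was logged every 5s and never went quiet.

Fix:
- ``inFlight = true`` moves to the top of the try block, before the bridge read; finally clears it uniformly, and early returns go through finally too
- the ``readSignalsFromBridge`` catch block does ``consecutiveFailures++`` first, then gates the log on the new count; it shares the same 3-then-quiet policy as fetch-side failure throttling

Manual verification: simulating 6 consecutive bridge exceptions yields strictly ≤ 3 warning logs.

## Related

- CodeRabbit / Codex inline comments on PR #1477
- Does not affect the overall architecture decision of issue #1023 (CSRF remains a follow-up)

* fix(activity-signal): Origin-present same-origin gate (CodeRabbit suggestion)

Adds a ~10-line Origin/Referer same-origin check at the entry of push_activity_signal, which blocks browser-side drive-by CSRF with zero frontend changes. This is a compromise hardening within the F2 scope split, proposed in the CodeRabbit review on PR #1477; the follow-up security PR promised in the original PR description will implement the full CSRF guard.

## Decision points

Since 2024, browsers have enforced sending the Origin header on POST, so "Origin present + not in the allowed set" = cross-site browser JS. Catching this class needs no token:

- ``curl`` / Node scripts / Electron main process: no Origin → allowed (original contract)
- same-origin Electron renderer / browser: Origin == ``request.base_url`` → allowed
- malicious page (evil.com) doing a cross-site fetch: Origin == ``https://evil.com`` → 403

The Referer fallback covers the few clients that don't send Origin (already handled by the ``_get_request_origin`` helper).

## Differences from the existing screenshot endpoint

``/api/screenshot`` uses the stricter ``_validate_local_mutation_request`` (CSRF + Origin two-factor; the frontend must send ``X-CSRF-Token``). This endpoint keeps a "no mandatory token" baseline because:

1. Full CSRF requires reworking every frontend fetch call site, ~200 lines of frontend, which is over PR B's budget
2. Blast radius: at worst this endpoint lets an attacker forge tracker soft state that influences proactive-chat content selection, unlike screenshots, which can leak the screen, or the agent, which can drive computer_use
3. The follow-up security hardening PR will unify the CSRF policy of all endpoints in one pass

## Tests

7 new cases:
- no Origin → 200 (curl/Node path)
- same-origin Origin → 200 (Electron renderer / same-origin browser)
- evil.com / attacker.example.com / subdomain spoofing → 403
- Referer-only off-origin → 403 (fallback path)
- unparseable Origin → 200 (falls through to no-Origin; compatibility in the opposite direction from screenshot)

``test_activity_signal_router.py`` now passes 42 cases in total (35 original + 7 new).

## Related

- CodeRabbit F2 thread (3292871371) on PR #1477
- follow-up issue: CodeRabbit agreed to open the unified CSRF/Origin/token hardening PR

* fix(activity-signal): resolveLanlanName reads appState first (Codex F4)

Codex P2 on PR #1477: during the lag window of a character switch, ``window.lanlan_config`` temporarily trails ``window.appState``. ``resolveLanlanName`` used to read only ``lanlan_config``, so at the moment of a switch the heartbeat was pushed to the old lanlan_name (rejected by the backend with 404, or landing in the old tracker), and the new character briefly lost OS-signal coverage.

``static/app-react-chat-window.js:~1442`` in the same project already uses this fix pattern; its comment reads "on a character switch appState updates first, window.lanlan_config may lag" and is tagged "flagged by CodeRabbit Major". This PR applies the same handling.

## Changes

``resolveLanlanName`` priority chain:

1. ``window.appState.lanlan_name`` (first to update on a switch)
2. ``window.lanlan_config.lanlan_name`` (fallback)
3. URL ``?lanlan_name=`` query param
4. empty string → skip this tick

## Tests

5 new vm-sandbox cases, verified manually:
- only appState → uses appState
- only lanlan_config → fallback
- switch lag (both present but different) → uses appState (does not hit the old tracker)
- URL fallback
- an empty appState.lanlan_name is skipped, falling back to lanlan_config

* fix(activity-signal): block Origin=null + empty payload (Codex F5/F6)

Codex P1/P2 on PR #1477 review. Both close bypasses that could still leak into the tracker after the original Origin gate.

## F5 (P1): Origin "null" bypass

Browsers in opaque-origin contexts (sandboxed iframes, file://, extension contexts) send the literal string ``"null"``. ``urlsplit("null")`` parses to an empty scheme + netloc → ``_normalize_origin_value`` returns ``""`` → the Origin gate takes the no-Origin "allowed" branch → a cross-site attack page can inject forged heartbeats via ``<iframe sandbox>``.

Fix: a raw-string guard at the entry of ``push_activity_signal``: an ``Origin`` or ``Referer`` equal to ``"null"`` (case-insensitive) gets an immediate 403. It runs before the normalize call, so it cannot be swallowed into an empty string.

3 new tests: Origin/Referer × ("null"/"NULL").

## F6 (P2): empty payload pollutes tracker state

Previously the client skipped only when ``payload === null``, but when the bridge returns ``{}`` or every field fails type/range validation, the payload is ``{}``. POSTing ``{"lanlan_name": "X"}`` hits the hard-coded defaults of ``push_external_system_signal`` (``idle_seconds=0.0`` / ``cpu_avg_30s=0.0``) + ``os_signals_available=True``, silently overwriting the real state with "idle=0/cpu=0/no window" and polluting proactive-chat classification.

Fix in two layers:

1. Frontend ``static/app-activity-signal.js``: skip the tick when ``Object.keys(payload).length === 0``, saving the HTTP roundtrip + rate-limit quota
2. Backend ``push_activity_signal``: 400 ``"at least one signal field required"`` when all signal fields are None; defence-in-depth that also stops native / malicious callers

8 new tests:
- ``test_lanlan_name_only_payload_rejected_400`` (bare lanlan_name → 400)
- ``test_single_field_payload_accepted`` × 5 (each single field works on the happy path)
- ``test_opaque_origin_null_rejected`` × 3 (F5)
- the 11 existing happy-path tests updated to carry ``idle_seconds: 0`` (the minimal signal field)

4 manual frontend vm-sandbox tests: bridge returns ``{}`` / null / all fields invalid → 0 fetches; bridge returns 1 valid field → normal POST.

## Related

- PR #1477 Codex review on commit 67c8089
- unit tests in ``test_activity_signal_router.py``: all 50 cases now pass

* fix(activity-signal): treat blank strings as missing (CodeRabbit F7)

CodeRabbit Minor on PR #1477: F6's empty-payload guard checked only ``None``, but ``{"lanlan_name":"X","window_title":""}`` / whitespace-only strings pass the str validator, bypass the ``all(None)`` check, and make the tracker record a fake "no foreground window + numeric defaults 0.0" state. That is essentially the same empty-payload pollution F6 set out to block.

Fix: the guard expands to ``v is None or (isinstance(v, str) and not v.strip())``, covering ``None`` + empty string + pure whitespace.

Note: blanks are deliberately not normalised to None inside ``_activity_signal_validate_str``. If the tracker later wants to distinguish "saw the desktop but no title" (``""``) from "no observation at all" (``None``), the upstream data semantics are preserved. The guard treats them as equivalent only when judging whether the whole payload adds up to zero information.

## Tests

8 new cases:
- 6 blank-only payloads rejected with 400 (window_title=""/" "/process_name=""/"\t\n "/both fields empty/both fields whitespace)
- 2 blank + signal pairings still return 200 (a blank alone is not a signal, but paired with idle/cpu the payload is still valid)

All 58 cases in ``test_activity_signal_router.py`` now pass.

* fix(activity-signal): float validator rejects bool (Codex F8)

Codex P2 on PR #1477. ``_activity_signal_validate_float`` coerces via ``float(raw)``, and ``bool`` is a subclass of ``int`` in Python:

    >>> float(True)
    1.0
    >>> float(False)
    0.0

So payloads such as ``{"idle_seconds": true}`` / ``{"cpu_avg_30s": false}`` are silently accepted as 1.0 / 0.0, pass the range check, and write forged telemetry into the tracker. ``idle_seconds=True`` → "user just acted", ``cpu_avg_30s=True`` → "1% usage"; both skew activity classification.

Fix: an ``isinstance(raw, bool)`` guard before ``raw`` reaches ``float()``; a hit returns the same 400 "<field> must be a number" as a type error. The ``isinstance`` check must come before ``float()`` because the latter happily accepts bool.

6 new tests: 3 fields × (True / False), all 400. All 64 cases in ``test_activity_signal_router.py`` now pass.

* fix(activity-signal): float validator catches OverflowError (Codex F9)

Codex P2 on PR #1477. ``float()`` raises ``OverflowError`` on a native Python big int:

    >>> float(10 ** 400)
    Traceback (most recent call last):
      ...
    OverflowError: int too large to convert to float

The original ``except (TypeError, ValueError)`` missed this case, so the request became a 500 instead of a proper 400 validation error. The JSON spec does not limit integer precision, and Starlette's ``json.loads`` faithfully hands over the big int → anyone POSTing an oversized integer could cheaply turn the endpoint into a 500.

Fix: add ``OverflowError`` to the ``except`` tuple. It returns the same 400 "<field> must be a number" as other type errors.

3 new tests: 3 fields × ``10^400`` raw-bytes payload (the ``json=`` helper would have httpx convert it to scientific notation to fit a double, so ``content=`` sends the literal big int directly). All 67 cases in ``test_activity_signal_router.py`` now pass.

* tune(activity-signal): _EXTERNAL_SIGNAL_TTL_SECONDS 30s→15s

With the frontend now unified as the primary source of OS signals, the push pipeline stacks two unsynchronised 5s timers: the NEKO-PC bridge main-process sampler (reads OS signals) and the renderer heartbeat (reads the bridge's cached snapshot, then POSTs). Worst-case data age therefore already approaches ~10-12s with no packet loss. A 30s TTL keeps serving a stale "user active" snapshot for too long after the heartbeat dies; 10s would flap between fresh/degraded after a single dropped packet on a remote link.

15s = 3× the heartbeat: it tolerates ~2 consecutive dropped packets before falling back to the local collector, while halving the "still reporting stale activity after the heartbeat dies" window from 30s. This commit also corrects the old "TTL≫interval (6×)" wording in the _EXTERNAL_SIGNAL_MIN_INTERVAL comment to 3×.

tests/unit/test_activity_signal_router.py + test_activity_tracker_followup.py: all 135 cases pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(activity-signal): sync TTL descriptions to 15s (CodeRabbit)

The previous commit (42427d6) changed _EXTERNAL_SIGNAL_TTL_SECONDS to 15s, but the endpoint docstring + throttle comment in system_router.py and the design doc still said 30s. Corrected one by one:

- system_router.py push_activity_signal docstring: "fresher than ... (30s)" → (15s)
- system_router.py throttle comment: "TTL is 30s ... 5 of every 6" → "TTL is 15s (3× this interval) ... 2 of every 3" (matching the new ratio)
- docs/design/user-activity-tracker.md: "When fresh (≤ 30s)" → (≤ 15s)

Docs/comments only, no behaviour change; unrelated 30s values (the cpu_avg_30s field name, the prefs 30s cache, the activity_guess 30s anti-thrash, etc.) are untouched. All 67 cases pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>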
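The F1, F8, and F9 fixes all land in the same float validator, and the order of the checks matters. The sketch below is not the repository's `_activity_signal_validate_float`; it is a minimal stand-in with an assumed name (`validate_signal_float`), an assumed `(value, error)` return shape, and an assumed "out of range" message, written only to show the ordering the commit message describes.

```python
import math


def validate_signal_float(field, raw, lo=0.0, hi=None):
    """Return (value, error); error is None when the value is acceptable.

    Hypothetical helper: name, return shape and the range message are
    assumptions, only the three guards mirror the commit message.
    """
    if raw is None:
        return None, None  # field absent: nothing to validate
    # F8: bool is a subclass of int, so float(True) == 1.0 would slip through.
    if isinstance(raw, bool):
        return None, f"{field} must be a number"
    try:
        value = float(raw)
    except (TypeError, ValueError, OverflowError):
        # F9: OverflowError covers huge ints such as 10 ** 400.
        return None, f"{field} must be a number"
    # F1: NaN compares False against every bound, so test finiteness first.
    if not math.isfinite(value):
        return None, f"{field} must be finite"
    if value < lo or (hi is not None and value > hi):
        return None, f"{field} out of range"
    return value, None


ok = validate_signal_float("cpu_avg_30s", 27.5, 0.0, 100.0)
rejected_bool = validate_signal_float("idle_seconds", True)
rejected_bigint = validate_signal_float("idle_seconds", 10 ** 400)
rejected_nan = validate_signal_float("cpu_avg_30s", float("nan"), 0.0, 100.0)
```

Putting the `bool` check ahead of `float()` and the finiteness check ahead of the range comparison is the whole point: each later step would otherwise accept the value silently.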
1 parent 0037b58 commit c95bca7
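The Origin gate and the F5 `"null"` guard from the commit message can be illustrated without any web framework. This is a rough sketch under stated assumptions: `headers` is a plain dict with lowercase keys, `normalize_origin` and `origin_gate` are hypothetical names rather than the project's `_normalize_origin_value` / `push_activity_signal`, and the function returns a bare status code instead of a response object.

```python
from urllib.parse import urlsplit


def normalize_origin(value):
    """Reduce an Origin/Referer value to 'scheme://host[:port]', or '' if unusable."""
    try:
        parts = urlsplit(value or "")
    except ValueError:
        return ""
    if not parts.scheme or not parts.netloc:
        return ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"


def origin_gate(headers, base_url):
    """Return 200 to let the request continue, 403 to reject it."""
    # Opaque origins send the literal "null". Reject on the raw string,
    # because normalising it yields "" and would look like "no Origin".
    for name in ("origin", "referer"):
        if headers.get(name, "").strip().lower() == "null":
            return 403
    raw = headers.get("origin") or headers.get("referer") or ""
    origin = normalize_origin(raw)
    if not origin:
        return 200  # no or unparseable Origin: curl, Node, Electron main process
    return 200 if origin == normalize_origin(base_url) else 403


base = "http://testserver/"
no_origin = origin_gate({}, base)
same_origin = origin_gate({"origin": "http://testserver"}, base)
cross_site = origin_gate({"origin": "https://evil.com"}, base)
opaque = origin_gate({"origin": "NULL"}, base)
```

The sketch keeps the "no Origin means allowed" baseline from the commit message, which is why the `"null"` check has to run before normalisation rather than after it.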

9 files changed

Lines changed: 1580 additions & 29 deletions

docs/design/user-activity-tracker.md

Lines changed: 48 additions & 5 deletions
@@ -794,7 +794,7 @@ and to weigh conversation signals more heavily.
 `UserActivityTracker.push_external_system_signal(...)` accepts OS
 signals from outside the backend — designed for a frontend (Electron
 app, browser via WebSocket, mobile shell) to read its local OS state
-and POST it on a heartbeat. When fresh (≤ 30s), pushed signals
+and POST it on a heartbeat. When fresh (≤ 15s), pushed signals
 override the local collector entirely. When stale (heartbeat stops),
 the tracker falls back to the local collector and the degraded marker
 re-appears.
@@ -815,10 +815,53 @@ All fields optional — pass whatever the frontend can read on each
 platform. The push primes `os_signals_available=True` so the AI sees
 non-degraded state.

-The HTTP endpoint to receive these pushes hasn't been added yet — when
-the frontend implementation lands, wire it via something like
-`POST /api/activity_signal/{lanlan_name}` in `system_router.py`.
-Until then, the API surface exists for whoever builds it.
+#### HTTP endpoint
+
+`POST /api/activity_signal` (in `main_routers/system_router.py`) is the
+public surface. Body is a JSON object with `lanlan_name` (required)
+plus any subset of the snake_case fields above. The endpoint enforces:
+
+- 400 on malformed body / out-of-range fields
+- 404 when `lanlan_name` isn't registered
+- 429 when pushed faster than 5s per lanlan_name (matches the heartbeat
+  cadence — `_EXTERNAL_SIGNAL_MIN_INTERVAL` in `tracker.py`). Honour
+  `Retry-After`.
+- 503 if the character's tracker hasn't initialised yet (boot race —
+  retry on the next heartbeat)
+- 500 if the tracker raises
+
+No auth header today — defended by the per-character `lanlan_name`
+lookup + rate limit. A stricter CSRF/Origin guard is tracked for a
+follow-up PR; the threat model write-up lives in issue #1023.
+
+#### Renderer client
+
+`static/app-activity-signal.js` does the 5s heartbeat in the desktop
+pet window. It reads OS signals through the Electron preload bridge
+(`window.nekoActivitySignal.read()` — exposed by the NEKO-PC sibling
+repo), normalises camelCase → snake_case, and POSTs to the endpoint
+above. The module is defensive: when the bridge isn't exposed
+(non-Electron dev runs, mobile browser shell, NEKO-PC older than the
+companion PR), it logs once and stays silent — the backend tracker's
+local collector handles the rest in degraded mode.
+
+#### Electron bridge contract (NEKO-PC side)
+
+The companion PR in NEKO-PC adds an IPC handler that returns:
+
+```js
+{
+  windowTitle: "VS Code — neko", // active-win → activeWindow.title
+  processName: "Code.exe", // active-win → activeWindow.owner.name
+  idleSeconds: 12, // powerMonitor.getSystemIdleTime() (seconds, not ms)
+  cpuAvg30s: 27.5, // os.cpus() rolling diff, [0, 100]
+  gpuUtilization: 65.0, // nvidia-smi (optional — None on AMD/Intel)
+}
+```
+
+The renderer (`app-activity-signal.js`) drops any field that fails
+type/range validation and POSTs the rest. A partial snapshot is still
+better than no push.

 ### What works in fully-degraded remote mode
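The 429 / `Retry-After` rule in the endpoint contract above amounts to a per-`lanlan_name` timestamp table with a size cap. The following is a sketch only: the class name `PushThrottle`, the injectable clock, and the "evict the stalest bucket" policy are assumptions made for illustration; the commit message states the 5s interval and the cap of 64 but not how the cap is enforced.

```python
import math
import time

MIN_INTERVAL = 5.0  # mirrors _EXTERNAL_SIGNAL_MIN_INTERVAL
MAX_BUCKETS = 64    # cap quoted in the commit message


class PushThrottle:
    """Hypothetical per-name throttle; not the code in system_router.py."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = {}  # lanlan_name -> time of the last accepted push

    def check(self, lanlan_name):
        """Return 0 if the push is accepted, else a Retry-After in whole seconds."""
        now = self._clock()
        last = self._last.get(lanlan_name)
        if last is not None and now - last < MIN_INTERVAL:
            return max(1, math.ceil(MIN_INTERVAL - (now - last)))
        if lanlan_name not in self._last and len(self._last) >= MAX_BUCKETS:
            # Assumed policy: drop the stalest bucket so a name spray
            # cannot grow the dict without bound.
            del self._last[min(self._last, key=self._last.get)]
        self._last[lanlan_name] = now
        return 0


fake_now = [0.0]
throttle = PushThrottle(clock=lambda: fake_now[0])
first = throttle.check("a")    # accepted
fake_now[0] = 2.0
second = throttle.check("a")   # too soon: 3 seconds left
other = throttle.check("b")    # separate bucket, accepted
```

A caller would translate a non-zero result into a 429 response carrying that value in `Retry-After`. Injecting the clock keeps the throttle testable without sleeping.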

main_logic/activity/system_signals.py

Lines changed: 32 additions & 8 deletions
@@ -42,17 +42,35 @@
 _IS_WINDOWS = platform.system() == 'Windows'


-def _force_degraded_from_env() -> bool:
+def is_remote_backend_deployment() -> bool:
     """Honour ``NEKO_ACTIVITY_TRACKER_REMOTE`` / ``ACTIVITY_TRACKER_REMOTE``.

-    Set to ``1`` / ``true`` / ``yes`` when the backend is on a different
-    machine from the user — covers the Windows-remote edge case where
-    the local OS APIs would happily report the server's foreground
-    window (since pygetwindow technically works), but those signals
-    are about the server, not the user.
+    Single source of truth for the "is the backend running on a different
+    machine from the user" question. Two unrelated consumers used to keep
+    their own copies of this env-var check and drifted:
+
+    * the activity collector here — flips the OS-signal pipeline into
+      degraded mode (window/idle/CPU/GPU come from frontend push or
+      not at all).
+    * ``main_routers/system_router._is_remote_backend_deployment`` —
+      blocks local-machine operations like ``/api/screenshot`` from
+      accidentally capturing the *server's* desktop and returning it
+      to the user. ``main_routers/agent_router`` follows the same
+      rule for ``computer_use`` / agent commands.
+
+    Both now call into this function. The check itself is intentionally
+    cheap (env lookup) so it's safe to call inline on every request.
+
+    Set to ``1`` / ``true`` / ``yes`` / ``on`` when the backend is on a
+    different machine from the user — covers the Windows-remote edge
+    case where the local OS APIs would happily report the server's
+    foreground window (since pygetwindow technically works), but those
+    signals are about the server, not the user. Same applies to
+    pyautogui screenshots and computer_use commands — they target the
+    backend machine, which is wrong when that machine isn't the user's.

     Default off — most users run backend on their own PC where local
-    OS signals are correct.
+    OS signals / screenshots / computer_use are correct.
     """
     for key in ('NEKO_ACTIVITY_TRACKER_REMOTE', 'ACTIVITY_TRACKER_REMOTE'):
         raw = os.getenv(key, '').strip().lower()
@@ -61,6 +79,12 @@ def _force_degraded_from_env() -> bool:
     return False


+# Legacy private alias — keeps in-flight callers (and tests that patch
+# the private name) working without a sweep. New code calls
+# ``is_remote_backend_deployment`` directly.
+_force_degraded_from_env = is_remote_backend_deployment
+
+
 # ── Public snapshot dataclass ───────────────────────────────────────

 @dataclass(frozen=True, slots=True)
@@ -185,7 +209,7 @@ def __init__(self, *, poll_interval: float = 5.0) -> None:
         # for the case where the backend is a Windows server and the user
         # is on a different machine — local OS APIs would technically
         # work but report data about the server, not the user.
-        env_force_degraded = _force_degraded_from_env()
+        env_force_degraded = is_remote_backend_deployment()
         self._os_signals_available: bool = bool(
             _IS_WINDOWS and self._gw is not None and not env_force_degraded
         )

main_logic/activity/tracker.py

Lines changed: 24 additions & 1 deletion
@@ -72,7 +72,30 @@
 # seconds. After that the tracker falls back to the local collector
 # (which on remote deployments will be in degraded mode) — better to
 # advertise "no signal" than to keep using stale window data.
-_EXTERNAL_SIGNAL_TTL_SECONDS = 30.0
+#
+# 15s = 3× the 5s heartbeat. The push pipeline stacks two unsynchronised
+# 5s timers — the NEKO-PC bridge sampler (reads OS signals) and the
+# renderer heartbeat (reads the bridge's cached snapshot + POSTs) — so
+# worst-case data age can already approach ~10-12s before any loss. 15s
+# therefore tolerates ~2 consecutive dropped pushes before falling back.
+# Shorter (e.g. 10s) would thrash between fresh/degraded on a single
+# drop over a lossy remote link; 30s keeps trusting a stale "user
+# active" snapshot for too long after the heartbeat dies. 15s balances
+# faster stale-detection against fallback thrash.
+_EXTERNAL_SIGNAL_TTL_SECONDS = 15.0
+
+# Minimum interval between accepted external-signal pushes for a given
+# lanlan_name. Tuned together with the frontend heartbeat: the Electron
+# preload pushes every ~5s, so anything more frequent is either a buggy
+# client (re-entering the heartbeat) or spam. Enforced by the
+# ``/api/activity_signal`` endpoint, not the tracker itself — the
+# tracker is happily idempotent and just overwrites the last push.
+#
+# Pairs with TTL above: TTL is the "data freshness" window, this is the
+# "request frequency" cap. TTL is 3× this interval, so the tracker
+# tolerates ~2 consecutive rate-limited/dropped pushes and still has
+# data within the freshness window.
+_EXTERNAL_SIGNAL_MIN_INTERVAL = 5.0


 # ── Break-reminder defaults ─────────────────────────────────────────
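The "15s = 3× heartbeat, tolerates ~2 drops" arithmetic in the comments above can be checked with a few lines. This is an illustrative sketch, not tracker code: `is_push_fresh` and `tolerated_drops` are hypothetical names, and the inclusive `<=` comparison is taken from the design doc's "fresh (≤ 15s)" wording.

```python
TTL_SECONDS = 15.0       # _EXTERNAL_SIGNAL_TTL_SECONDS after this commit
HEARTBEAT_SECONDS = 5.0  # _EXTERNAL_SIGNAL_MIN_INTERVAL / heartbeat cadence


def is_push_fresh(last_push_at, now):
    """True while the last external push is inside the freshness window."""
    return last_push_at is not None and (now - last_push_at) <= TTL_SECONDS


def tolerated_drops():
    """Consecutive heartbeats that can be lost before the data goes stale.

    A push at t=0 stays fresh through t=15, so the heartbeats at 5s and
    10s may be lost as long as the one at 15s arrives.
    """
    return int(TTL_SECONDS // HEARTBEAT_SECONDS) - 1


fresh_at_10s = is_push_fresh(100.0, 110.0)
fresh_at_ttl = is_push_fresh(100.0, 115.0)
stale_after_ttl = is_push_fresh(100.0, 115.5)
drops = tolerated_drops()
```

With the previous 30s TTL the same formula gives 5 tolerated drops, which matches the "5 of every 6" wording the follow-up docs commit replaced with "2 of every 3".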

main_routers/agent_router.py

Lines changed: 44 additions & 0 deletions
@@ -26,6 +26,7 @@
 from .shared_state import get_session_manager, get_config_manager, get_templates
 from config import TOOL_SERVER_PORT, USER_PLUGIN_BASE
 from main_logic.agent_event_bus import publish_session_event
+from main_logic.activity.system_signals import is_remote_backend_deployment

 router = APIRouter(prefix="/api/agent", tags=["agent"])
 logger = get_module_logger(__name__, "Main")
@@ -56,6 +57,37 @@
 }


+def _remote_backend_block() -> JSONResponse | None:
+    """Reject agent mutation when backend is deployed away from the user.
+
+    In remote mode (``NEKO_ACTIVITY_TRACKER_REMOTE=1``) the "computer"
+    that computer_use / browser_use / openclaw would control is the
+    *server's*, not the user's — there's no useful action to take, and
+    silently forwarding the command to a localhost tool_server on the
+    server side is actively dangerous (anyone who can reach the public
+    backend HTTP can drive the agent on the server). Returning 501
+    matches the same-env block on ``/api/screenshot`` in
+    ``main_routers/system_router.py`` so the frontend can surface a
+    uniform "agent unavailable on remote backend" state.
+
+    Threat model context: a deeper CSRF + Origin guard (defending
+    against DNS-rebinding-style attacks on a *local* backend) is
+    deferred to a follow-up — it needs the ~15 frontend agent fetch
+    sites to start sending ``X-CSRF-Token`` first, which doesn't fit
+    PR B's scope. See issue #1023 for the audit + scope decision.
+    """
+    if is_remote_backend_deployment():
+        return JSONResponse(
+            {
+                "success": False,
+                "error": "agent disabled in remote backend deployment "
+                "(NEKO_ACTIVITY_TRACKER_REMOTE)",
+            },
+            status_code=501,
+        )
+    return None
+
+
 async def force_disable_agent_for_character_switch(current_lanlan: str, previous_lanlan: str | None = None) -> bool:
     """After a character switch, force-disable the cat paw so stale global tool-service state does not leak into the new character."""
     names = {
@@ -197,6 +229,9 @@ async def _close_http_client():
 @router.post('/flags')
 async def update_agent_flags(request: Request):
     """Agent flag updates from the frontend, cascaded to the respective session managers."""
+    blocked = _remote_backend_block()
+    if blocked is not None:
+        return blocked
     try:
         data = await request.json()
         _config_manager = get_config_manager()
@@ -276,6 +311,9 @@ async def get_agent_state():
 @router.post('/command')
 async def post_agent_command(request: Request):
     """Unified command entry point: the frontend sends only a command and never toggles the individual switches directly."""
+    blocked = _remote_backend_block()
+    if blocked is not None:
+        return blocked
     t0 = time.perf_counter()
     try:
         data = await request.json()
@@ -521,6 +559,9 @@ async def proxy_task_detail(task_id: str):
 @router.post('/tasks/{task_id}/cancel')
 async def proxy_task_cancel(task_id: str):
     """Cancel a specific task via tool server proxy."""
+    blocked = _remote_backend_block()
+    if blocked is not None:
+        return blocked
     try:
         client = _get_http_client()
         r = await client.post(f"{TOOL_SERVER_BASE}/tasks/{task_id}/cancel", timeout=5.0)
@@ -534,6 +575,9 @@ async def proxy_task_cancel(task_id: str):
 @router.post('/admin/control')
 async def proxy_admin_control(payload: dict = Body(...)):
     """Proxy admin control commands to tool server."""
+    blocked = _remote_backend_block()
+    if blocked is not None:
+        return blocked
     try:
         client = _get_http_client()
         r = await client.post(f"{TOOL_SERVER_BASE}/admin/control", json=payload, timeout=5.0)
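The commit repeats the same three-line guard at the top of four handlers. Purely as an illustration of an alternative shape (not what the commit does), the same short-circuit could be expressed as a decorator. Everything here is hypothetical: `block_when_remote`, the `remote_mode` toggle standing in for the env check, and the plain dict standing in for `JSONResponse(..., status_code=501)`.

```python
import asyncio
import functools


def block_when_remote(is_remote):
    """Build a decorator that short-circuits a handler while is_remote() is true."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            if is_remote():
                # Stand-in for JSONResponse(..., status_code=501).
                return {
                    "status_code": 501,
                    "success": False,
                    "error": "agent disabled in remote backend deployment",
                }
            return await handler(*args, **kwargs)
        return wrapper
    return decorator


remote_mode = {"on": False}  # hypothetical toggle replacing the env lookup


@block_when_remote(lambda: remote_mode["on"])
async def cancel_task(task_id):
    return {"status_code": 200, "task_id": task_id}


local_result = asyncio.run(cancel_task("t1"))
remote_mode["on"] = True
remote_result = asyncio.run(cancel_task("t1"))
```

The inline form used in the diff has the advantage of being visible at the top of each handler and trivially greppable; a decorator trades that for less repetition, so either choice is defensible for four call sites.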
