perf(cli): Optimize serve daemon startup#5785
Conversation
Slim the qwen serve startup path by deferring interactive UI, runtime, web-shell, and settings-heavy imports until after the listener is ready. Add daemon startup timing, preheat status reporting, fast-path settings/env loading, and regression coverage for import boundaries and runtime directory behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Resolve runQwenServe conflicts after upstream serve and LSP changes while preserving the fast-path runtime split. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Thanks for the PR! Template looks good ✓ — all required sections present with bilingual content. On direction: clearly aligned. Serve daemon cold-start latency is a real user problem — every extra second before the listener binds is pure overhead for CI/CD pipelines, scripting workflows, and IDE integrations that spawn On approach: the scope is large (36 files, +6329/-1443) but after examining the diff, the changes form a cohesive unit — every file change serves the goal of breaking the import chain so the fast path avoids loading React/Ink, yargs, the full settings stack, and ACP runtime before the listener binds. The three-pillar structure (fast-path-argv for arg detection/parsing, fast-path-settings for lightweight env/settings loading, fast-path for orchestration) is clean. The deferred runtime pattern in The fast-path arg parser is a parallel implementation of the yargs-based serve command parsing. Any new Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ — 所有必需章节均存在,包含双语内容。 方向:完全对齐。 方案:范围较大(36 文件,+6329/-1443),但检查 diff 后,所有改动构成内聚整体——每个文件变更都服务于打破 import 链。三柱结构设计清晰。 fast-path 参数解析器是 yargs 的并行实现,未来新增 flag 需两处同步更新,这是已知的性能权衡。 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
This PR currently has merge conflicts against 中文这个 PR 目前与 |
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
Follow-up review at 4fc1429 (model: claude-opus-4-8). I re-checked the 11 earlier /review findings against current HEAD: the Critical (fast path not observing runtimeReady) and the bridge-proxy prefer-const lint are confirmed fixed, and the env-key / qwen-dir duplication is genuinely deduped. 3 new suggestions inline below; no blockers.
跟进 review(HEAD 4fc1429):已对照当前 HEAD 复核之前 11 条 /review 意见——Critical(快路径未观察 runtimeReady)与 bridge-proxy prefer-const lint 均已修复,env-key / qwen-dir 重复也已正确去重。下方 3 条新建议,无阻断项。
wenshao
left a comment
There was a problem hiding this comment.
No new high-confidence findings on re-review at 41e268e. Downgraded from Approve to Comment: CI failing (review-pr, Test (windows-latest, Node 22.x), Test (ubuntu-latest, Node 22.x), CodeQL). All 15 R2 findings are low-confidence (terminal-only, not posted as inline comments). R1 inline comments (11) are on stale commits. 125/125 PR tests pass locally; full CLI suite 9,369/9,369 passes.
— qwen3.7-max via Qwen Code /review
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
🔬 Local build & runtime verification — PR #5785 @
|
| Metric | main (a234860) |
PR-5785 (4fc1429) |
Δ |
|---|---|---|---|
external cmd → listening p50 |
1708 ms | 409 ms | −76% · 4.2× faster |
| external min / max | 1695 / 2041 ms | 399 / 425 ms | |
self-reported processToListenMs p50 |
(field absent) | 112 ms | new |
self-reported runQwenServeToListenMs p50 |
(field absent) | 34 ms | new |
Variance is tiny on both sides (PR 399–425 ms). The ratio is even larger than the PR's own dev.js figures because the dist run removes the constant tsx transpile overhead that diluted them — same direction, stronger signal.
2. Functional equivalence (PR head)
- stdout contract preserved:
qwen serve listening on http://127.0.0.1:<port> (mode=http-bridge, workspace=…) - New observability works —
GET /daemon/statusnow returns astartupblock:and stderr emits"startup": { "processStartedAt": "…", "listenerReadyAt": "…", "processToListenMs": 159, "runQwenServeToListenMs": 46, "preheat": { "status": "running" } }
startup timing: processToListenMs=… runQwenServeToListenMs=…. Thepreheatstate confirms ACP runtime is deferred and reported, not blocking listen. GET /capabilities→ HTTP 200.
3. Tests / format
- Test-plan suites (
fast-path,run-qwen-serve,daemon-status,commands/serve,daemon-logger): 146/146 passed (plan cited 137; head has more, all green). prettier --checkon the listed files: all match.git diff --check: no whitespace errors.
4. Notes for merge
- Typecheck blocker is pre-existing & out of scope — confirmed:
BaseTextInput.tsx(ink/dom,CursorContext) is identical onmainand not touched by this PR. Not a regression. - CI was still pending at verification time (macOS/ubuntu/windows tests + CodeQL running) — let it go green before merge.
- Methodology caveat: I measured the
tscdist(unbundled), not the esbuild production bundle — bundling is blocked in this env for both branches alike by@qwen-code/channel-qqbot(not a workspace package; only a lazy dynamic import, never hit byserve) and missingweb-templatesgenerated files. The A/B is apples-to-apples; the shipped single-file bundle should be at least as fast.
Verdict: runtime verification PASS — the ~4× faster pre-listen path reproduces cleanly, the new /daemon/status startup telemetry is correct, the stdout listening contract is intact, and the test/format gates are green. The only merge gates are CI completion and the documented pre-existing TUI typecheck blocker.
🇨🇳 中文版(完整对应)
🔬 本地构建与运行时验证 — PR #5785 @ 4fc1429
对本 PR head 与 origin/main 分别从源码构建出真实 CLI,再对 qwen serve 冷启动做了 A/B 实测,并端到端验证了 daemon。这是一份运行时/性能验证(与此前的代码评审互补),用作合并参考。
环境: macOS arm64(Darwin 25.5.0),Node v22.22.2。每个 worktree:干净 npm ci + node scripts/build.js --cli-only(均 exit 0)。延迟测量基于 tsc 构建出的 packages/cli/dist/index.js,每次用全新临时 workspace,2 次预热 + 12 次正式,安静 CPU 下顺序执行。
结论 — ✅ 启动性能改进可复现,未发现运行时回归。
1. 冷启动 A/B(构建后的 dist/index.js,各 12 次)
| 指标 | main(a234860) |
PR-5785(4fc1429) |
Δ |
|---|---|---|---|
外部命令 → listening p50 |
1708 ms | 409 ms | −76% · 快 4.2× |
| 外部 min / max | 1695 / 2041 ms | 399 / 425 ms | |
自报 processToListenMs p50 |
(无此字段) | 112 ms | 新增 |
自报 runQwenServeToListenMs p50 |
(无此字段) | 34 ms | 新增 |
两侧方差都很小(PR 399–425 ms)。比值比 PR 自己的 dev.js 数据更大,因为 dist 运行去掉了 tsx 转译这一恒定开销(它稀释了原始比值)——方向一致,信号更强。
2. 功能等价性(PR head)
- stdout 契约保持不变:
qwen serve listening on http://127.0.0.1:<port> (mode=http-bridge, workspace=…) - 新观测能力可用 ——
GET /daemon/status现在返回startup块:且 stderr 输出"startup": { "processStartedAt": "…", "listenerReadyAt": "…", "processToListenMs": 159, "runQwenServeToListenMs": 46, "preheat": { "status": "running" } }
startup timing: processToListenMs=… runQwenServeToListenMs=…。preheat状态印证了 ACP runtime 被延后且被上报,不阻塞 listen。 GET /capabilities→ HTTP 200。
3. 测试 / 格式
- 测试计划里的套件(
fast-path、run-qwen-serve、daemon-status、commands/serve、daemon-logger):146/146 通过(计划写的是 137,当前 head 更多,全绿)。 - 对列出文件
prettier --check:全部符合;git diff --check:无空白错误。
4. 合并注意事项
- typecheck 阻塞是既有问题、且不在本 PR 范围内 —— 已确认:
BaseTextInput.tsx(ink/dom、CursorContext)在main上完全相同,且本 PR 未改动它。不是回归。 - CI 在验证时仍pending(macOS/ubuntu/windows 测试 + CodeQL 在跑)—— 合并前等其转绿。
- 方法学说明: 我测的是
tsc的dist(未打包),不是 esbuild 生产 bundle —— 在本环境里 bundle 对两个分支同样被@qwen-code/channel-qqbot(非 workspace 包;只是个惰性 dynamic import,serve永不触发)和缺失的web-templates生成文件挡住。A/B 是同口径对比;最终单文件 bundle 只会更快或持平。
结论:运行时验证通过 —— pre-listen 路径约 4× 提速可干净复现,新的 /daemon/status 启动遥测正确,stdout listening 契约完好,测试/格式门禁全绿。唯一的合并前置项是 CI 跑完,以及那个有文档记录的、既有的 TUI typecheck 阻塞。
Verified locally by building real CLIs from source on both branches; A/B on built dist. — claude-opus-4-8 via Claude Code
|
@qwen-code /triage |
Code ReviewIndependent proposal (before reading diff): To optimize Comparison with the diff: The PR's approach matches and exceeds this proposal. The three-pillar design ( Reuse check: The parallel arg parser in Security review ( No critical blockers found. The architecture is sound, the security-sensitive trust checking is correctly reimplemented, and the fallback-to-full-parser pattern provides a safe degradation path. Minor: Tests / Format
Real-Scenario Testing (tmux)PR branch —
|
|
Stepping back to look at the whole picture: this PR does exactly what it promises. The motivation is clear (serve daemon cold-start is a real pain point), the implementation is well-architected (three-pillar fast-path with graceful fallback), and the results are reproducible — our own tmux test confirmed the listener comes up in ~421ms process-to-listen with the stdout contract fully preserved, and a collaborator's rigorous 12-run A/B on built dist showed 4.2× faster. Going back to my independent proposal: the PR's approach matches it point-for-point and adds thoughtful details I wouldn't have thought of — the Is the code straightforward? Yes — each pillar has a clear single responsibility, the fast-path falls back to the full path on any unrecognized input, and there's no speculative abstraction. The parallel arg parser and settings loader are inherent to the approach (you can't import the heavy modules without defeating the purpose), and the ~721-line Is every change necessary? Yes — all 36 files serve the goal. The UI/startup refactor files ( The one thing worth flagging for future maintenance: the dual arg parser means any new Verdict: approve. The PR ships a real, measurable user improvement with a clean architecture, thorough tests, and no security concerns. 中文说明整体审视:这个 PR 完全兑现了承诺。动机清晰(serve daemon 冷启动是真实痛点),实现架构良好(三柱 fast-path 配合优雅降级),结果可复现——我们的 tmux 测试确认 listener 在 ~421ms process-to-listen 内启动,stdout 契约完全保持,协作者的 12 次 A/B 构建后测试显示 4.2× 提速。 回到独立方案:PR 的方案逐点匹配,并添加了我不会想到的精心设计—— 代码是否简洁直接?是——每个柱有清晰的单一职责,fast-path 对未识别输入降级到完整路径,没有投机性抽象。并行参数解析器和 settings 加载器是方案固有的,~721 行的 每个改动都是必要的吗?是——36 个文件都服务于目标。唯一的非装饰性问题是 未来维护值得指出的一点:双重参数解析器意味着 serve 命令新增 结论:批准。 PR 交付了真实的、可量化的用户改进,架构清晰,测试充分,无安全顾虑。 — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
| function parsePositiveIntegerEnv(raw: string | undefined): number | undefined { | ||
| if (raw === undefined || raw === '') return undefined; | ||
| const trimmed = raw?.trim(); | ||
| if (!trimmed || !/^\d+$/.test(trimmed)) return Number.NaN; |
There was a problem hiding this comment.
[Critical] parsePositiveIntegerEnv returns Number.NaN (not undefined) for malformed env values such as "abc", "-1", or "1.5". The callers use ??= to apply defaults (e.g., options.rateLimitPrompt ??= parsePositiveIntegerEnv(env['QWEN_SERVE_RATE_LIMIT_PROMPT'])), but NaN is neither null nor undefined, so ??= does NOT short-circuit — it assigns NaN to the option. Downstream, getRateLimitValidationError rejects with "must be a positive integer" and process.exit(1)s, even when the operator expects fallback to the built-in default.
| if (!trimmed || !/^\d+$/.test(trimmed)) return Number.NaN; | |
| if (!trimmed || !/^\d+$/.test(trimmed)) return undefined; |
And similarly on line 116:
| if (!trimmed || !/^\d+$/.test(trimmed)) return Number.NaN; | |
| return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : undefined; |
— qwen3.7-max via Qwen Code /review
|
Qwen Code review did not complete successfully: Qwen review timed out after 85 minutes. See workflow logs. |
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Build failure root cause — it's a
|
| } | ||
| } | ||
|
|
||
| export function setUpCloudShellEnvironment(envFilePath: string | null): void { |
There was a problem hiding this comment.
[Suggestion] setUpCloudShellEnvironment (the single-file-path variant) appears to be dead code. It is exported and re-exported from settings.ts, but grep across the entire packages/ tree finds zero call sites. The active variant is setUpCloudShellEnvironmentFromFiles (multi-file). Consider removing this dead export or marking it @deprecated if backward compatibility is needed for external consumers.
— qwen3.7-max via Qwen Code /review
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Qwen Code review did not complete successfully: Qwen review timed out after 85 minutes. See workflow logs. |
wenshao
left a comment
There was a problem hiding this comment.
Follow-up review of the latest changes (HEAD 095188d0) with claude-opus-4-8. The earlier Critical findings look addressed in 97bb0d29/095188d0 (fast path now awaits runtimeReady via waitForServeRuntimeOrExit; --experimental-lsp parsed; prefer-const lint clear; path-helper dedup into path-comparison.ts; trust-rule caching). Locally ESLint, tsc, and the serve/config test suites are all green here — the red Test legs appear to be the known unrelated main SDK-bundle-size regression rather than this PR. Two minor follow-up notes inline.
— claude-opus-4-8[1m] via Qwen Code /qreview
wenshao
left a comment
There was a problem hiding this comment.
No review findings on this incremental commit. Downgraded from Approve to Comment: CI failing (Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x), Test (ubuntu-latest, Node 22.x)).
R5 (095188d) addresses 4 prior R4 suggestions:
failRuntimeStartupnow writes todaemonLog.error()(observability)resolveRuntimeStartupTimeoutMsenv var parsing is now tested (6 parametrized cases)- Bootstrap catch-all 503 handler for unknown routes is now tested
- Rate limit env var activation and discard paths are now tested
- Trust-rule evaluation refactored to pre-compute path variants (
CachedTrustRule), avoiding redundantgetPathComparisonVariants/fs.realpathSynccalls per check
Two low-confidence suggestions for human review (terminal-only):
buildTrustedFolderRulesTRUST_PARENT branch has no fast-path-specific test (logic unchanged from pre-existing code tested intrustedFolders.test.ts)resolveRuntimeStartupTimeoutMsdirect-override branches (positive, zero, negative, Infinity, NaN) are untested — only the env-var fallback path is covered
— qwen3.7-max via Qwen Code /review
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
CI failures (Test on all platforms, CodeQL) are pre-existing — root cause is the SDK bundle exceeding its 128KB size cap on main, unrelated to this PR. ESLint and TypeScript pass cleanly; all 165 PR-specific tests pass.
— qwen3.7-max via Qwen Code /review
| } | ||
| forceFlushMetrics() | ||
| (coreRuntimePromise | ||
| ? coreRuntimePromise.then((core) => core.forceFlushMetrics()) |
There was a problem hiding this comment.
[Critical] The shutdown chain gates the force-close timer (line 2196) and all subsequent cleanup on coreRuntimePromise.then(core => core.forceFlushMetrics()). If the dynamic import('@qwen-code/qwen-code-core') stalls — realistic on cold NFS, partial deploys, or ESM resolution issues — this promise never resolves, the .catch().then() chain blocks, and the force-close timer is never armed. The daemon becomes a zombie: TCP listener still open, bootstrap /health returning 200, but every real route returning 503. Signal handlers (SIGINT/SIGTERM) enter this same blocked close promise, so they also hang.
The runtimeStartupTimeoutMs (120s) correctly rejects runtimeStarting, triggering waitForServeRuntimeOrExit → handle.close() — but handle.close() enters this same blocked chain, so the exit never happens.
| ? coreRuntimePromise.then((core) => core.forceFlushMetrics()) | |
| // Fire-and-forget: never block shutdown on core being loaded. | |
| void (coreRuntimePromise | |
| ? coreRuntimePromise.then((core) => core.forceFlushMetrics()) | |
| : Promise.resolve() | |
| ).catch((flushErr) => { | |
| daemonLog.warn( | |
| `pre-shutdown metrics flush failed: ${ | |
| flushErr instanceof Error ? flushErr.message : String(flushErr) | |
| }`, | |
| ); | |
| }); | |
| // Proceed immediately to runtime wait + cleanup + force timer. | |
| Promise.resolve() | |
| .then(async () => { | |
| await waitForRuntimeStartingForShutdown(runtimeStarting, daemonLog); |
— qwen3.7-max via Qwen Code /review
What this PR does
This PR makes
qwen servereach the HTTP listener earlier by adding a slim serve fast path, deferring interactive UI, React/Ink, full settings, web-shell, and ACP runtime work until after the listener is ready or until the relevant mode needs it. It also adds daemon startup observability so/daemon/statusand stderr report process-to-listen timing, runQwenServe-to-listen timing, and ACP preheat state without changing the stdout listening contract.Why it's needed
Cold-start latency for
qwen servewas dominated by modules that are not needed before the daemon can bind a port, including interactive UI and full runtime setup. Moving those imports and runtime work off the pre-listen path improves daemon responsiveness while keeping the deeper first-session ACP initialization out of scope. The added timing fields make future regressions diagnosable from both local logs and/daemon/status.Reviewer Test Plan
How to verify
Run
cd packages/cli && npx vitest run src/serve/fast-path.test.ts src/serve/run-qwen-serve.test.ts src/serve/daemon-status.test.ts src/commands/serve.test.ts src/serve/daemon-logger.test.tsand expect all 137 tests to pass. Runcd packages/cli && npx prettier --check src/serve/daemon-logger.ts src/serve/run-qwen-serve.ts src/serve/fast-path-settings.ts src/serve/run-qwen-serve.test.ts src/serve/fast-path.test.ts src/gemini.tsx src/ui/startInteractiveUI.tsx src/config/settings.ts src/config/environment.ts src/config/default-theme-names.tsand expect Prettier to report all matched files use the configured style. Rungit diff --cached --checkbefore commit, orgit diff HEAD^..HEAD --checkafter commit, and expect no whitespace errors. Optionally runnode scripts/dev.js serve --port 0 --hostname 127.0.0.1 --no-open --no-webrepeatedly and confirm the stdout line remainsqwen serve listening on ...while stderr includes startup timing.Evidence (Before & After)
Earlier local baseline for external command to listener ready was approximately 3.55s p50. After this change, a local 5-run sample of
node scripts/dev.js serve --port 0 --hostname 127.0.0.1 --no-open --no-webproduced runs[1068, 1246, 1092, 2364, 2729]ms, p501246ms; daemon stderr reported process-to-listen p50 around383msand runQwenServe-to-listen p50 around68ms.Tested on
Environment (optional)
macOS Darwin 25.4.0 arm64, Node v22.22.3, npm 10.9.8.
Risk & Scope
runQwenServecallers and for settings/runtime directory resolution.npm run typecheckis still blocked by an existing unrelatedBaseTextInput.tsxissue whereink/domandink/components/CursorContextcannot be resolved andcursorCtxis inferred asunknown; this PR does not fix that TUI build blocker. FirstPOST /sessionACP deep optimization is also out of scope./daemon/status.Linked Issues
N/A
中文说明
What this PR does
这个 PR 通过增加轻量的 serve fast path,让
qwen serve更早到达 HTTP listener;交互 UI、React/Ink、完整 settings、web-shell 和 ACP runtime 工作会延迟到 listener ready 之后或对应模式真正需要时再加载。同时补齐 daemon 启动观测,让/daemon/status和 stderr 暴露 process-to-listen、runQwenServe-to-listen 以及 ACP preheat 状态,并保持 stdout listening 行不变。Why it's needed
qwen serve冷启动耗时主要被 listen 前不需要的模块拖慢,包括交互 UI 和完整 runtime 初始化。把这些 import 与 runtime 工作移出 pre-listen 路径,可以提升 daemon 绑定端口的响应速度,同时不把更深的首个 session ACP 初始化混入本 PR。新增 timing 字段也让后续启动回退可以从本地日志和/daemon/status直接定位。Reviewer Test Plan
How to verify
运行
cd packages/cli && npx vitest run src/serve/fast-path.test.ts src/serve/run-qwen-serve.test.ts src/serve/daemon-status.test.ts src/commands/serve.test.ts src/serve/daemon-logger.test.ts,预期 137 个测试全部通过。运行cd packages/cli && npx prettier --check src/serve/daemon-logger.ts src/serve/run-qwen-serve.ts src/serve/fast-path-settings.ts src/serve/run-qwen-serve.test.ts src/serve/fast-path.test.ts src/gemini.tsx src/ui/startInteractiveUI.tsx src/config/settings.ts src/config/environment.ts src/config/default-theme-names.ts,预期 Prettier 报告所有匹配文件符合格式。提交前运行git diff --cached --check,或提交后运行git diff HEAD^..HEAD --check,预期没有 whitespace 错误。也可以重复运行node scripts/dev.js serve --port 0 --hostname 127.0.0.1 --no-open --no-web,确认 stdout 仍输出qwen serve listening on ...,stderr 包含 startup timing。Evidence (Before & After)
此前本地 external command 到 listener ready 的基线约为 3.55s p50。变更后,本地 5 次运行
node scripts/dev.js serve --port 0 --hostname 127.0.0.1 --no-open --no-web得到[1068, 1246, 1092, 2364, 2729]ms,p50 为1246ms;daemon stderr 自报 process-to-listen p50 约383ms,runQwenServe-to-listen p50 约68ms。Tested on
Environment (optional)
macOS Darwin 25.4.0 arm64,Node v22.22.3,npm 10.9.8。
Risk & Scope
runQwenServe调用方以及 settings/runtime 目录解析的兼容性。npm run typecheck仍被既有且无关的BaseTextInput.tsx问题阻塞:ink/dom和ink/components/CursorContext无法解析,cursorCtx被推断为unknown;本 PR 不修复该 TUI build blocker。首个POST /session的 ACP 深度优化也不在本 PR 范围内。/daemon/status。Linked Issues
N/A