feat(memory): confirm auto-generated skills before persisting#5616
feat(memory): confirm auto-generated skills before persisting#5616LaZzyMan wants to merge 19 commits into
Conversation
E2E test reportRan the feature end-to-end against the bundled CLI ( Mock setup
Backend — headless (
|
| Scenario | Result |
|---|---|
autoSkillConfirm=true (default) |
skill staged to .qwen/pending-skills/auto-skill-e2e-demo/SKILL.md; .qwen/skills/ stays empty ✅ |
autoSkillConfirm=false |
skill persisted directly to .qwen/skills/...; .qwen/pending-skills/ empty — old behavior preserved ✅ |
The background skill-review runs to completion even in headless mode (managed-skill-extractor shows up in the run's API stats; read_file 23/23 + write_file 1/1 succeed).
TUI — interactive (tmux)
The dialog auto-opens when the session goes idle after a skill-review produces pending skills:
╭──────────────────────────────────────────────
│ Auto-generated skill — keep it? (1/1)
│
│ auto-skill-e2e-demo
│ Demo skill captured during the e2e run.
│
│ › 1. Keep this skill
│ 2. Discard this skill
│ 3. Keep all remaining
│ 4. Discard all remaining
│
│ Esc to decide later
╰──────────────────────────────────────────────
| Action | Result |
|---|---|
| Keep (Enter) | dialog closes; skill moved into .qwen/skills/auto-skill-e2e-demo/; pending cleared ✅ |
Discard (2 + Enter) |
dialog closes; staged dir deleted; both .qwen/skills/ and pending empty ✅ |
| Esc (decide later) | dialog closes; footer shows ⚠ 1 skill(s) pending review; skill stays staged ✅ |
| After Esc, next idle cycle | dialog does not re-open (per-batch dismiss via dismissedTaskIdRef); footer hint persists ✅ |
Footer hint while pending + dismissed:
YOLO mode (shift + tab to cycle) · 1 task done ⚠ 1 skill(s) pending review
Verdict
The full pipeline works end-to-end: threshold → background skill-review → staging outside the loader's scan path → idle dialog / busy-or-deferred footer → accept (move into library) / reject (delete) / defer (keep staged, no nag).
Notes
- Model responses are mocked so the trigger is deterministic and reproducible; the rest is the real CLI.
- Pending skills live in in-memory task records (documented limitation), so a session restart does not re-surface them — the staged files remain under
.qwen/pending-skills/.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Also missing: .qwen/pending-skills/ should be added to .gitignore alongside the existing .qwen/skills/auto-skill-*/ entry, since the staging directory contains transient files that should not be committed.
- stage only newly-created skills, never agent-edited pre-existing ones, so Discard can't delete a skill the user already confirmed - re-read pendingSkills after the await in resolvePendingSkill so concurrent Keep-all/Discard-all removes every entry, not just the last - surface accept/reject fs failures (try/catch + log + .catch) instead of silently swallowing them - remount SkillReviewDialog per task via key so its snapshot never goes stale across consecutive skill-review batches - skip redundant skillReviewPending updates with a signature compare - remove the unreachable openSkillReviewDialog action - add debug logging to the pending-skills module - ignore .qwen/pending-skills/ explicitly in .gitignore
|
Addressed all of review round 1 in 506d327. Review (.gitignore): Added Per-thread outcomes:
Tests: new pending-skills staging-filter cases + the Keep-all concurrency test; the manager mock now uses |
Manual e2e verification — auto-skill confirmation flow ✅I ran the unchecked Manual e2e item from the Test Plan locally against the real TUI — the actual Build: HEAD Results
EvidenceThe real dialog (idle, Staging (dialog open, before deciding) — confirms unconfirmed skills are invisible to Keep → live, Discard → deleted, Esc → footer: A/B with Notes
中文版(合并参考)手动 e2e 验证 —— 自动技能确认流程 ✅我在本地用真实 TUI(即真正的 构建: HEAD 结论
证据真实弹窗(空闲, 暂存状态(弹窗打开、尚未决定)—— 证明未确认的技能对 Keep → 上线、Discard → 删除、Esc → 底部提示: A/B: 说明
|
|
Thanks for the PR! Template: The PR uses custom section names ( On direction: This solves a real user problem — issue #5263 asks exactly for this. One-off operations like refactors generate throwaway skills that clutter the library, and a confirmation gate before persisting is the right answer. The design of staging pending skills as a sibling of On approach: 30 files / +1269 looks big, but most of it is mechanical config plumbing (adding The Moving on to code review. 🔍 中文说明感谢贡献! 模板: PR 使用了自定义章节名(Summary、Design notes、Test Plan),与模板要求的标题(What this PR does、Why it's needed、Reviewer Test Plan)不一致。部分模板章节完全缺失:Risk & Scope、Linked Issues、中文说明。不过内容都在,只是换了标题,所以不阻塞——下次请使用标准模板标题方便审查者快速定位。 方向: 解决了真实用户问题(#5263)。一次性操作生成的技能不应悄悄入库,在持久化前加确认是正确的做法。pending-skills 作为 方案: 30 个文件 / +1269 行看起来很多,但大部分是机械性的配置管道(在 cli → acp → desktop → vscode 各层添加 上一轮审查提出的 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code Review (2a)No critical blockers or AGENTS.md violations found. The core staging logic in The config plumbing ( The Minor observations (not blocking):
Testing (2b)Unit tests — all pass ✓BuildFull tmux real-scenarioCLI starts and responds. The @wenshao's manual e2e verification already covers the full idle-dialog → accept/reject flow end-to-end. 中文说明代码审查 (2a)未发现关键阻塞问题或 AGENTS.md 违规。
配置管道 (
非阻塞性观察:
测试 (2b)单元测试 — 全部通过 ✓构建完整 tmux 实际场景CLI 正常启动和响应。 — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM. The staging-as-sibling-directory approach cleanly solves the auto-discovery problem. 562 tests pass, the concurrent "Keep all" race is correctly handled, and the 30-file diff is mostly mechanical config plumbing — every file serves the stated goal. The pre-existing artifact-tool build failure is unrelated.
Approved.
— Qwen Code · qwen3.7-max
|
This is a well-executed feature PR. The motivation is clear (issue #5263 — one-off operations shouldn't silently clutter the skill library), the design is sound (staging as a sibling directory to avoid Going back to my independent proposal: the PR's approach matches or exceeds it in every area. The SkillReviewDialog with per-skill Keep/Discard plus bulk Keep-all/Discard-all is better than a simple yes/no confirmation prompt I would have built. The snapshot-on-mount pattern to prevent index shifting, the The 30-file / +1269-line diff looks intimidating but is mostly mechanical config plumbing that every new memory setting requires. The actual new logic is ~130 lines. Every file in the diff serves the stated goal — no drive-by refactors, no scope creep. 562 unit tests pass across all affected files. The concurrent "Keep all" race condition is specifically tested and correctly handled. The maintainer's manual e2e verification confirms the full dialog → accept/reject flow works end-to-end. The only nits are cosmetic (template headings) and a pre-existing build failure in unrelated artifact-tool code. Neither blocks the merge. Verdict: Approve. ✅ 中文说明这是一个执行良好的功能 PR。动机明确(#5263——一次性操作不应悄悄堆积技能库),设计合理(暂存为兄弟目录以避免 回顾我的独立方案:PR 的方案在每个方面都匹配或超越了我的设想。带有逐技能保留/丢弃加批量全部保留/全部丢弃的 SkillReviewDialog 比我会构建的简单是/否确认提示更好。挂载时快照防止索引偏移、订阅中的 30 文件 / +1269 行的 diff 看起来吓人,但大部分是每个新内存设置都需要的机械性配置管道。实际新逻辑约 130 行。diff 中的每个文件都服务于声明的目标——没有顺手重构,没有范围蔓延。 所有受影响文件的 562 个单元测试通过。并发的"全部保留"竞争条件被专门测试并正确处理。维护者的手动端到端验证确认了完整的对话 → 接受/拒绝流程正常工作。 唯一的小问题是装饰性的(模板标题)和一个无关的预先存在的构建失败。都不阻塞合并。 结论:批准。 ✅ — Qwen Code · qwen3.7-max |
- acceptPendingSkill: when the staged dir is gone, no-op only if the skill is already in the skills root; otherwise throw so resolvePendingSkill keeps it pending and logs, preventing silent data loss - fall back to the agent's systemMessage for progress text when staging yields zero pending (a pre-existing-skill edit is still a durable change) - log the no-task / no-target early returns in resolvePendingSkill - replace internal tracker references in an AppContainer comment
…cases - parseDescription: keep an empty description empty instead of spilling onto the next YAML line - namespace staged dirs under the task id so a later same-named batch can't clobber a still-deferred earlier one - track Esc-dismissed batches in a Set (not a single value) and only mark a batch dismissed on Esc, so a partially-failed Keep-all can reopen for the unresolved skills - document the in-place updateRecord invariant the accept/reject race fix relies on - add the missing license header to pending-skills.test.ts
|
Addressed review rounds 2 & 3 in f30856c and 3cfc6be. Round 2 (f30856c)
Round 3 (3cfc6be)
Multi-batch handling is now hardened against data loss (collision / dismiss-overwrite / Keep-all orphan). Surfacing all coexisting pending batches at once remains a separate UX enhancement. Full |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Code Review Summary
Reviewed by 9 parallel agents + reverse audit round.
🔴 Critical
useDialogClose.ts:172 — Ctrl+C reopens the skill-review dialog instead of deferring it. The handler calls closeSkillReviewDialog() (which only closes) instead of dismissSkillReviewDialog() (which records the taskId in the dismissed set). After Ctrl+C, the auto-open effect immediately reopens the dialog.
🟡 Suggestion
manager.ts:1148 — resolvePendingSkill catch-and-rethrow path is untested. The error-rethrow path correctly preserves the invariant (skill stays in pendingSkills on failure), but no test exercises this path.
✅ Clean
- Security: No issues — fs operations scoped to project root
- Performance: Acceptable for typical batch sizes
- Test coverage: Thorough coverage of staging, accept/reject, concurrent Keep-all, remount, and dismiss semantics
- Build: 869/869 tests pass; tsc/eslint errors are all pre-existing branch divergence
wenshao
left a comment
There was a problem hiding this comment.
[Critical] Ctrl+C handler in useDialogClose.ts:172 calls closeSkillReviewDialog() instead of dismissSkillReviewDialog() — dialog auto-reopens after Ctrl+C
The handler comment says "same semantics as Esc 'decide later'" and "closeSkillReviewDialog records the dismissed batch so it isn't immediately reopened." After this PR separated close (just setIsSkillReviewDialogOpen(false)) from dismiss (Set.add(taskId) + close), the Ctrl+C handler still calls closeSkillReviewDialog — which no longer records the dismissed batch. Pressing Ctrl+C closes the dialog but the auto-open effect reopens it on the next idle transition because the taskId is not in skillReviewDismissedTaskIdsRef.
Fix: add dismissSkillReviewDialog to DialogCloseOptions and call it from the handler instead of closeSkillReviewDialog.
— qwen3.7-max via Qwen Code /review
…alog - parseDescription: strip a matching pair of surrounding quotes so a `description: "..."` frontmatter value isn't rendered with literal quotes - useDialogClose: Ctrl+C on the skill-review dialog now calls dismissSkillReviewDialog (records the batch as dismissed) instead of plain close, matching Esc — otherwise the idle effect immediately reopened it
|
Addressed review round 4 in 6a1875c.
Full |
| // The staged dir must also be gone | ||
| const stagedPath = path.join( | ||
| projectRoot, | ||
| '.qwen', |
There was a problem hiding this comment.
[Suggestion] Vacuous staged-dir assertion after taskId namespacing
stagedPath is built as .qwen/pending-skills/auto-skill-foo, but since commit 3cfc6be0c the actual staged directory is .qwen/pending-skills/<taskId>/auto-skill-foo. The non-namespaced path was never created, so fs.access(stagedPath) always rejects — this assertion passes regardless of whether rejectPendingSkill actually deleted anything.
The metadata check (remaining.toHaveLength(0)) below is correct, but the filesystem assertion gives false confidence about on-disk cleanup.
| '.qwen', | |
| // The staged dir must also be gone — derive the actual path from task metadata | |
| const pendingSkills = updated?.metadata?.['pendingSkills'] as Array<{ stagedManifestPath: string }> | undefined; | |
| const stagedDir = pendingSkills?.[0] | |
| ? path.dirname(pendingSkills[0].stagedManifestPath) | |
| : path.join(projectRoot, '.qwen', 'pending-skills', taskId, 'auto-skill-foo'); | |
| await expect(fs.access(stagedDir)).rejects.toThrow(); |
— qwen3.7-max via Qwen Code /review
Summary
Resolves #5263. Auto-generated skills (created by the background skill-review agent after tool-heavy sessions) are now reviewed by the user before they enter the skill library, instead of being persisted unconditionally. One-off operations (refactors, migrations) that cross the tool-call threshold no longer silently clutter the skill library.
memory.autoSkillConfirm(defaulttrue). When off, the previous auto-persist behavior is fully preserved. Toggleable via/memoryandsettings.json..qwen/pending-skills/— a sibling of.qwen/skills/that the skill loader never scans — instead of being left live.SkillReviewDialoglets the user Keep / Discard each skill (or Keep-all / Discard-all). Busy session, or after Esc ("decide later"): a footer hint shows the pending count..qwen/skills/; reject deletes the staged copy.Design notes
MemoryManager.runSkillReview, gated byconfirmBeforePersist(fromConfig.getAutoSkillConfirmEnabled()), so the background-agent execution path is untouched.loadSkillsFromDirloads everySKILL.mdunder.qwen/skills/with no prefix filter — unconfirmed skills must be invisible until accepted.Known limitation / follow-up
Pending skills are tracked in in-memory task records, so they are not re-surfaced after a session restart (the staged files remain on disk under
.qwen/pending-skills/). Cross-session recovery is out of scope for this change.Test Plan
SkillReviewDialoginteraction tests (keep / discard / keep-all / discard-all / empty)/memorytoggle navigation + persistencepackages/coresuite green (12651 passed); full typecheck green