Extend skill candidate create/read flows with optional summary, usage notes, and structured pre/post conditions. Add release promotion metadata fields for upgrade lineage and change context, including parent release ID, upgrade reason, and change summary. Propagate these fields through Bay API models/services, SDK types and client methods, MCP handlers/tool schemas, and integration/unit tests.
Document the project's two-layer self-iteration model separating execution evidence from versioned release decisions. Expand lifecycle guidance to show optional metadata fields for candidate creation (`summary`, `usage_notes`, `preconditions`, `postconditions`) and release promotion (`upgrade_of_release_id`, `upgrade_reason`, `change_summary`) to improve explainability and auditability.
Introduce soft-delete support across Bay skill lifecycle APIs, service logic, and persistence models with delete metadata fields (is_deleted, deleted_at, deleted_by, delete_reason). Add DELETE endpoints for candidates and releases with guardrails: active releases cannot be deleted, and candidates referenced by active releases cannot be deleted. Exclude deleted records from get/list and related lifecycle queries to keep APIs consistent. Expose delete operations in shipyard-neo-sdk and shipyard-neo-mcp, including optional delete reasons in DELETE request bodies, and add integration/unit tests covering end-to-end behavior and tool output.
Make `delete_release` and `delete_candidate` accept an optional `SkillDeleteRequest` so clients can call DELETE endpoints without sending a body. Update shipyard-neo docs to reflect cleanup operations in the lifecycle flow and document delete tools plus optional candidate/release metadata fields.
Update version metadata in pyproject.toml for bay, gull, ship, shipyard-neo-mcp, and shipyard-neo-sdk. Regenerate uv.lock entries to keep package versions aligned.
Soft-delete no longer rejects active releases; it now deactivates them as part of deletion so active lookups skip deleted records. Update integration and unit tests to cover the new behavior and align SDK/docs wording with server-side semantics.
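The deactivate-on-delete behavior described in this commit can be sketched roughly as follows. This is a minimal in-memory illustration, not the Bay service code: the `Release` dataclass, `soft_delete_release`, and `find_active` are hypothetical names, and treating a second delete as a no-op is an assumption. Only the metadata field names (`is_deleted`, `deleted_at`, `deleted_by`, `delete_reason`) come from the commit messages.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Release:
    """Hypothetical in-memory stand-in for a persisted release row."""
    id: str
    is_active: bool = False
    is_deleted: bool = False
    deleted_at: Optional[datetime] = None
    deleted_by: Optional[str] = None
    delete_reason: Optional[str] = None


def soft_delete_release(release: Release, actor: str,
                        reason: Optional[str] = None) -> Release:
    """Mark a release deleted; an active release is deactivated, not rejected."""
    if release.is_deleted:
        return release  # assumption: deleting twice is a no-op
    release.is_active = False
    release.is_deleted = True
    release.deleted_at = datetime.now(timezone.utc)
    release.deleted_by = actor
    release.delete_reason = reason  # optional, mirrors the optional request body
    return release


def find_active(releases: list[Release]) -> Optional[Release]:
    """Active lookups skip deleted records."""
    return next((r for r in releases if r.is_active and not r.is_deleted), None)
```

Because the flag flip and the deactivation happen in one step, an active lookup cannot observe a release that is both active and deleted.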
Document valid `create_skill_payload` formats and note that top-level scalar JSON values are not accepted. Explain `payload_ref` usage for reusable storage and candidate attachment, and add replay behavior details:
- browser replay is supported via the skill run endpoint and requires a JSON object payload with a non-empty `commands` array
- python/shell currently have no release-based replay endpoints
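The browser replay requirement above (a JSON object with a non-empty `commands` array) could be checked with something like the sketch below. The function name `validate_browser_replay_payload` and the error messages are hypothetical; the real validation lives server-side and may differ.

```python
from typing import Any


def validate_browser_replay_payload(payload: Any) -> dict:
    """Return the payload if it is replayable, otherwise raise ValueError."""
    # Top-level scalars and arrays are rejected: replay needs a JSON object.
    if not isinstance(payload, dict):
        raise ValueError("browser replay payload must be a JSON object")
    commands = payload.get("commands")
    # `commands` must be present, be an array, and contain at least one entry.
    if not isinstance(commands, list) or not commands:
        raise ValueError("payload must contain a non-empty 'commands' array")
    return payload
```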
Ensure candidate promotion_release_id does not reference deleted or missing releases. When deleting a release, clear promotion pointers on matching candidates in the same transaction. Also sanitize candidates on read (get/list) to clean up historical dangling references and keep API responses consistent.
Allow `create_payload` to take a JSON string and normalize it to an object/array before sending the request. Raise a `ValueError` for invalid or non-object/array JSON payloads to fail fast, and add client tests for both success and failure paths.
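A minimal sketch of the normalization this commit describes, assuming a standalone helper (the name `normalize_payload` is hypothetical; the SDK may inline this logic inside `create_payload`):

```python
import json
from typing import Union

JsonContainer = Union[dict, list]


def normalize_payload(payload: Union[str, dict, list]) -> JsonContainer:
    """Accept a JSON string, object, or array; always return an object or array."""
    if isinstance(payload, str):
        try:
            payload = json.loads(payload)
        except json.JSONDecodeError as exc:
            # Fail fast on the client instead of sending a bad request.
            raise ValueError(f"payload is not valid JSON: {exc}") from exc
    # Valid JSON can still be a top-level scalar (e.g. "42"), which is rejected.
    if not isinstance(payload, (dict, list)):
        raise ValueError("payload must be a JSON object or array")
    return payload
```

For example, `normalize_payload('{"commands": ["open"]}')` yields a dict, while `normalize_payload("42")` and `normalize_payload("not json")` both raise `ValueError` before any request is made.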
Promote requests from the test extra to core ship dependencies and add tenacity, cachetools, tqdm, orjson, python-slugify, and tomli to the runtime set. Regenerate uv.lock and requirements.txt to reflect the new dependency graph, and update python-sandbox and shipyard-neo skill docs to list the expanded web and utility libraries.
Enqueue a warmup hook on successful sandbox creation so runtime startup can begin without delaying API response completion. Skip warmup when returning an idempotency cache hit, and add unit tests covering warmup scheduling and idempotent create paths.
Introduce a warm pool system to reduce sandbox cold-start latency by pre-warming instances and claiming them on create when available. Add global and per-profile warm pool configuration, startup/shutdown lifecycle hooks, an in-process bounded warmup queue with workers, and a periodic scheduler to replenish and rotate warm instances. Update the sandbox creation flow to check idempotency first, attempt a warm claim before normal create, and enqueue warmup work through the queue (with a background-task fallback when the queue is unavailable). Extend the sandbox model/manager with warm pool state and atomic claim logic, exclude warm pool instances from user listing and GC expiry/idle scans, and add unit tests for claim behavior, manager methods, and queue lifecycle/statistics.
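The create ordering described above (idempotency check, then warm claim, then normal create) might look roughly like this. It is a single-threaded toy under stated assumptions: `SandboxStore` and its fields are hypothetical, the real claim is atomic at the persistence layer, and enqueuing warmup only for cold-created sandboxes is an assumption rather than something the commit message states.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SandboxStore:
    """Hypothetical in-memory model of the sandbox create flow."""
    idempotency_cache: dict = field(default_factory=dict)
    warm_pool: dict = field(default_factory=dict)      # profile -> list of warm ids
    warmup_queue: list = field(default_factory=list)   # ids waiting for warmup
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create(self, profile: str,
               idempotency_key: Optional[str] = None) -> tuple[str, str]:
        # 1. Idempotency first: a cache hit returns early and skips warmup.
        if idempotency_key is not None and idempotency_key in self.idempotency_cache:
            return self.idempotency_cache[idempotency_key], "cache_hit"

        # 2. Try to claim a pre-warmed instance for this profile.
        pool = self.warm_pool.get(profile, [])
        if pool:
            sandbox_id, source = pool.pop(0), "warm_claim"
        else:
            # 3. Fall back to a normal create and schedule warmup work.
            sandbox_id, source = f"sbx-{next(self._ids)}", "cold_create"
            self.warmup_queue.append(sandbox_id)

        if idempotency_key is not None:
            self.idempotency_cache[idempotency_key] = sandbox_id
        return sandbox_id, source
```

Checking the idempotency cache before touching the pool means a retried request never consumes a second warm instance.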
Align bay, gull, ship, shipyard-neo-sdk, and shipyard-neo-mcp versions and lockfiles for the 0.2.0 release. Update Bay API v1 docs to clarify sandbox create behavior with idempotency cache, warm pool claim fallback, and warmup queue semantics.
Patch `utcnow` in warm pool manager unit tests to control `warm_ready_at` and `warm_rotate_at` values directly. This removes timing tolerance and sleep-based ordering, reducing flakiness and making readiness/claim assertions exact and repeatable.
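The clock-patching approach can be sketched with `unittest.mock.patch.object`. The `Clock` and `WarmPoolEntry` classes below are hypothetical stand-ins so the example is self-contained; the actual tests patch the `utcnow` helper used by the warm pool manager.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import patch


class Clock:
    """Hypothetical time source; the real code uses a `utcnow` helper."""

    @staticmethod
    def utcnow() -> datetime:
        return datetime.now(timezone.utc)


class WarmPoolEntry:
    """Hypothetical entry that stamps readiness and rotation times on creation."""

    def __init__(self, rotate_after: timedelta) -> None:
        now = Clock.utcnow()  # looked up at call time, so a patch takes effect
        self.warm_ready_at = now
        self.warm_rotate_at = now + rotate_after


def build_entry_at(fixed_now: datetime, rotate_after: timedelta) -> WarmPoolEntry:
    """Create an entry with the clock pinned to `fixed_now`."""
    with patch.object(Clock, "utcnow", return_value=fixed_now):
        return WarmPoolEntry(rotate_after)
```

With the clock pinned, assertions can use exact equality on `warm_ready_at` and `warm_rotate_at` instead of a tolerance window or `sleep` calls.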
Sorry @w31r4, your pull request is larger than the review limit of 150000 diff characters
Summary of Changes
This pull request introduces significant enhancements to the skill evolution control plane, focusing on completing the evolution loop, fixing signal pollution, and ensuring compatibility across various components. Additionally, it implements a warm pool feature to reduce sandbox startup latency, improving overall system performance and reliability.
Code Review
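This PR makes a series of important improvements to the skill evolution control plane, building a complete loop from goal definition, mutation, and evaluation through to automatic promotion. The main changes are:
- Completed the evolution loop: introduces `SkillGoal`, `RubricGenerator`, and `GoalConditionedEvaluator`, so that evolution is driven by human intent and candidate skills are promoted automatically based on evaluation results.
- Fixed scheduling and signal issues: scheduler deduplication and the ownership check on `record_outcome` prevent duplicate triggering and signal pollution.
- Full-stack compatibility: the API, SDK, and MCP all support the new list format for `preconditions`/`postconditions` on evolution candidates, improving backward compatibility.
- Introduced the warm pool mechanism: pre-warming sandbox instances significantly reduces cold-start latency.
Overall, these changes are well designed, with clear module boundaries and thorough unit and integration tests, and they are an important step for the project in intelligence and robustness. Code quality is high; there are a few minor suggestions, mainly about more precise exception handling and the completeness of the demo script.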
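As a rough illustration of the ownership check on `record_outcome` mentioned in the review (the PR description names the triple `(release_id, owner, skill_key)`), a guard could look like the sketch below. `ReleaseRecord`, `check_outcome_ownership`, and the choice of `PermissionError` are all hypothetical; the service may use a different error type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRecord:
    """Hypothetical view of a stored release."""
    release_id: str
    owner: str
    skill_key: str


def check_outcome_ownership(release: ReleaseRecord, release_id: str,
                            owner: str, skill_key: str) -> None:
    """Reject an outcome whose identifiers do not all match the stored release."""
    expected = (release.release_id, release.owner, release.skill_key)
    claimed = (release_id, owner, skill_key)
    if claimed != expected:
        # Refusing the write keeps foreign outcomes out of this release's signals.
        raise PermissionError(
            f"outcome {claimed} does not belong to release {expected}"
        )
```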
if isinstance(preconditions, list):
    print(f" Pre: " + "\n ".join(preconditions))
if isinstance(postconditions, list):
    print(f" Post: " + "\n ".join(postconditions))
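Since the PR lets `preconditions`/`postconditions` be a list, a dict, or null, a demo printer that covers all three shapes might look like this sketch. `format_conditions` is a hypothetical helper, not code from the PR, and the `(none)` placeholder is an illustrative choice.

```python
from typing import Optional, Union

Conditions = Optional[Union[list, dict]]


def format_conditions(label: str, conditions: Conditions) -> str:
    """Render conditions in either accepted shape as a printable block."""
    if not conditions:
        return f"{label}: (none)"
    if isinstance(conditions, list):
        items = [str(item) for item in conditions]
    elif isinstance(conditions, dict):
        items = [f"{key}: {value}" for key, value in conditions.items()]
    else:
        raise TypeError(f"unsupported conditions type: {type(conditions).__name__}")
    # Continuation lines are indented by two spaces under the label.
    return f"{label}: " + "\n  ".join(items)
```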
    return None
try:
    parsed = json.loads(raw)
except Exception:
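The review summary asks for more precise exception handling. One way to narrow a broad `except Exception` around `json.loads` is sketched below; the function name `parse_json_or_none` is hypothetical and the surrounding logic of the real function is not shown in the excerpt.

```python
import json
from typing import Any, Optional


def parse_json_or_none(raw: Optional[str]) -> Optional[Any]:
    """Parse a JSON string, returning None for empty or malformed input."""
    if not raw:
        return None
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed text; TypeError: input is not str/bytes.
        # Any other exception propagates instead of being silently swallowed.
        return None
```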
except Exception:
    pass
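Background
This set of changes does three things around the skill evolution control plane:
It also covers a legacy-data compatibility corner case: if a historical record only had `rubric_json` written and `rubric_summary` was never backfilled, the summary is now written automatically on a cache hit, so that `/v1/skills/goals` returns consistent results.
Main changes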
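1. Complete the goal-driven loop for skill evolution
- `SkillMutationAgent` now reads the skill goal during mutate and injects the goal into the mutation prompt.
- The `RubricGenerator` and `GoalConditionedEvaluator` modules handle rubric generation and mutation evaluation respectively, through an OpenAI-compatible HTTP interface.
- The rubric is cached in `SkillGoal.rubric_json`, and `rubric_summary` is kept in sync. When the goal text changes, the old cache is cleared and the rubric is regenerated on the next round.
- When `passed=true` and `score >= auto_promote_threshold`, the candidate is automatically promoted to canary.
- The responsibilities of `llm.enabled` and the outer `evolution.enabled` are clearly separated: the former controls the LLM-driven evolution logic, the latter controls the scheduling loop.
2. Fix duplicate scheduler triggering and outcome signal pollution
- The scheduler checks whether a `system:evolution` candidate already covers the current failure window, so the same batch of failures is not mutated repeatedly across cycles.
- `record_outcome` now validates that `(release_id, owner, skill_key)` are consistent, preventing an execution outcome from being written to a release or skill it does not belong to.
3. Make API / SDK / MCP compatible with evolution candidates
- `preconditions`/`postconditions` are extended from the original dict-only form to accept `list[str] | dict[str, Any] | null`.
- The `create_skill_candidate` tool schema, handler, and regression tests are updated accordingly; neither the create path nor the read path truncates the list format to null any more.
4. Tests and regression
- Added a regression test for the case where `rubric_json` exists but `rubric_summary` is missing, to ensure historical data self-heals.
Verification results
- `pkgs/bay`: 550 passed, 191 skipped
- `shipyard-neo-sdk`: 46 passed
- `shipyard-neo-mcp`: 52 passed
- `ruff check` passed
Scope of impact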
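Suggested reviewer focus
- Whether rubric caching/invalidation in `SkillMutationAgent` works with the evaluator as expected.
- Whether the dual-format compatibility of `preconditions`/`postconditions` matches what consumers expect.
Reviewer guide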
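The diff of this PR against `main` is large because the current branch is a feature branch stacked on top of earlier commits. `Files changed` therefore shows the extraction pipeline, skill evolution, API/SDK/MCP compatibility, and some older branch-baseline changes all at once.
If a reviewer mainly wants to confirm whether the skill evolution/skill extraction logic in this PR is mergeable, it is better not to read `Files changed` from the top down, but to review in the following order:
Recommended review order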
Extraction pipeline main line
- Commit `3e02d60`.
- `pkgs/bay/app/services/skills/extraction/*`, `pkgs/bay/app/services/skills/scheduler.py`, `pkgs/bay/tests/unit/managers/test_extraction_strategies.py`, `pkgs/bay/tests/unit/managers/test_browser_learning_scheduler.py`
Skill Evolution API and base flow
- Commit `592b27d`.
- `pkgs/bay/app/api/v1/skills.py`, `pkgs/bay/app/services/skills/evolution/*`, `pkgs/bay/app/services/skills/service.py`, `pkgs/bay/tests/integration/core/test_skill_evolution_api.py`
Hardening and loop completion
- Commits `23de36d`, `9f068d2`, `a255127`.
- `pkgs/bay/app/services/skills/evolution/agent.py`, `pkgs/bay/app/services/skills/evolution/llm.py`, `pkgs/bay/app/services/skills/evolution/scheduler.py`, `shipyard-neo-sdk/shipyard_neo/types.py`, `shipyard-neo-mcp/tests/test_server.py`
- `record_outcome` ownership check, API/SDK/MCP compatibility with the list/dict/null formats, self-healing of old `rubric_json` caches.
Reviewer FAQ
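Why is the rubric not generated immediately at `declare_goal`?
`declare_goal` stays a pure control-plane write with no hard dependency on the LLM. The rubric is generated, and cached, only when the evolve/mutate path is actually entered, so with `llm.enabled=false` declaring a goal does not trigger any extra external dependency.
Why do `preconditions`/`postconditions` need to accept `list | dict | null`?
Why does scheduler deduplication decide based on whether the latest mutation is later than the latest failure?
Can automatic promotion misjudge?
`passed=true` alone is not enough; `score >= auto_promote_threshold` is also required. A candidate below the threshold stays in `DRAFT`; it is not rejected automatically and can still be reviewed manually.
Tests worth a close look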
- `pkgs/bay/tests/unit/managers/test_skill_mutation_agent.py`
- `pkgs/bay/tests/unit/managers/test_evolution_scheduler.py`
- `pkgs/bay/tests/unit/managers/test_skill_evolution_service.py`
- `pkgs/bay/tests/integration/core/test_skill_evolution_api.py`
- `shipyard-neo-sdk/tests/test_skills_and_history.py`
- `shipyard-neo-mcp/tests/test_server.py`
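The auto-promotion rule from the FAQ above can be sketched as a small decision function. `Evaluation` and `next_stage` are hypothetical names, the stage strings are illustrative, and keeping a failed (`passed=false`) candidate in `DRAFT` is an assumption; the description only states the outcome for candidates below the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical evaluator result."""
    passed: bool
    score: float


def next_stage(evaluation: Evaluation, auto_promote_threshold: float) -> str:
    """Promote to canary only when both conditions hold; otherwise stay in draft."""
    if evaluation.passed and evaluation.score >= auto_promote_threshold:
        return "CANARY"
    # Not auto-rejected: the candidate remains available for manual review.
    return "DRAFT"
```

With a hypothetical threshold of `0.8`, a candidate scoring `0.79` with `passed=True` stays in `DRAFT`, while one scoring exactly `0.8` is promoted.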