feat(agent_platform): kernel skills + MCP builder playbooks#66211
feat(agent_platform): kernel skills + MCP builder playbooks#66211daniloc wants to merge 7 commits into
Conversation
…books The agent-builder concierge bundled 17 "skills" that were ALSO served by the MCP `agent-resolve-resource` tool (both generated from the same dir), so the agent had two copies of the same guidance and agent.md told it the bundled copy was "primary" — the wrong one. Resolve by giving each piece of content exactly one home, chosen by what it is: - **Builder playbooks (13)** — reusable knowledge about the authoring tools. Moved to `services/mcp/playbooks/` (MCP-owned), served via `agent-resolve-resource` to any consumer with a live, scope-aware tool surface. No longer bundled. The concierge fetches them like any other MCP consumer — it builds agents, so it needs build knowledge the same way Code does. - **Kernel skills (4)** — the concierge's own runtime behaviour, coupled to the implementation (`focus_*` client tools, client-kind modes, safety persona, fleet-audit workflow): safety-and-boundaries, using-the-console-ui, working-outside-the-console, auditing-the-fleet. Stay code-bundled in the concierge — they must move in lockstep with the runtime and can't drift in a per-account DB, so they are NOT store skills. This complements the skill-store work: store = team content (skill_refs, DB, per-account); MCP playbooks = platform docs (code, served to builders); kernel = the concierge's architecture-coupled behaviour (code, bundled). Each piece of content has one canonical home; nothing is inlined-and-duplicated. agent.md rewritten to distinguish "fetch the <id> playbook" (the 13) from "load skills/<id>" (the 4 kernel), with a framing section up top. `services/mcp/playbooks/README.md` captures the three-home rationale. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The tool's top-level description still called the playbooks "the same skills the agent concierge loads" — stale after the split. They are builder playbooks the concierge (and any consumer) fetches; the concierge's kernel skills are bundled separately. Reword to match, and note the response includes the live tool surface. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Kernel skills — the concierge's own code-locked operator behaviour (safety, console-UI, working-outside-the-console, fleet audit) — are now injected into an agent's bundle at freeze from backend code (products/agent_platform/backend/kernel_skills/), materialized alongside the store-resolved skill_refs and merged into the derived spec.skills[]. They cannot be authored through any API: there is no skills field on an author endpoint, so skill_refs -> the llma-skill store stays the only author path into a bundle's skills/. Kernel content must move in lockstep with the platform and be identical across accounts, so it can't live in the DB (drift) and can't be author-authored. (The skill-store cutover had made bundle skills store-only, removing the prior inline path.) Data-driven: each kernel skill is a folder whose SKILL.md frontmatter declares its description and an `agents` mapping (`*` = every agent, or a slug list). The freeze exempts kernel ids from the legacy-orphan guard and the sweep, with a kernel-vs-store alias collision check. Also addresses review findings: single-line <=280 descriptions with a load-time parity guard (the janitor derives spec.skills[].description from only the first frontmatter line, so a folded one would silently truncate the model's load signal), loud validation of malformed kernel folders, regenerated services/mcp/schema/tool-inputs.json, and doc path fixes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BY6C34DmGGmL4sgx5Wajn6
|
Size Change: 0 B Total Size: 64 MB ℹ️ View Unchanged
|
|
Reviews (1): Last reviewed commit: "feat(agent_platform): inject platform ke..." | Re-trigger Greptile |
The freeze view imports the new logic/kernel_skills registry; add it to the import-linter "presentation must use facade" allowlist alongside the existing views→logic.* entries so `lint-imports` passes. (No new boundary — same pattern as the other agent_platform presentation→logic imports.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BY6C34DmGGmL4sgx5Wajn6
|
⏭️ Skipped snapshot commit because branch advanced to The new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
|
Addresses review findings on the kernel-injection path: - Scope the blast radius: kernel_skills_for(slug) now only fully validates the folders that target the agent being frozen, and skips dot/underscore cruft dirs (e.g. __pycache__). A malformed folder for one agent can no longer raise a 500 that breaks freeze for every other tenant. The strict whole-set _all_kernel_skills() check stays (exercised by a unit test) so a bad folder still fails CI before merge. - Assert every injected kernel id is present in the sealed spec before the draft→ready flip — a 2xx put_skill that didn't materialize would otherwise ship a live agent silently missing a kernel skill (e.g. safety-and-boundaries). - Validate `agents` slugs against the resource-id regex and reject "*" mixed with specific slugs (a typo silently reaches nobody; a stray wildcard silently broadcasts). Reject CR line endings — checked on the raw bytes the janitor reads, since read_text() strips them. Tests cover the loader rejections, the runtime scoping resilience, and the updated freeze materialization invariant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BY6C34DmGGmL4sgx5Wajn6
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
|
⏭️ Skipped snapshot commit because branch advanced to The new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
|
…playbooks-from-mcp
dmarticus
left a comment
There was a problem hiding this comment.
Solid refactor overall — the three-homes split (kernel / playbook / store) is a real improvement, the freeze-time injection plumbing is thoughtful, and the test coverage on the new logic is substantive. Requesting changes because three findings put users into states with no programmatic recovery; the fixes look small.
Blockers
1. Post-seal materialization invariant can permanently wedge a sealed bundle (backend/presentation/views.py:2469)
The new missing_kernel check turns any drift between the kernel set at original-seal time and the retry attempt into a stuck draft:
- Attempt 1 seals the bundle; the response is lost or the client times out.
- Between then and the retry, the kernel set for this slug grows (or a kernel is renamed).
- Attempt 2:
put_skill409s withsealed-bundle→ caught by_is_sealed_bundle_conflict→ falls through to idempotentjanitor.freeze, which re-derivesderived_specfrom sealed bytes that cannot contain the newer kernel id. kernel_ids - materialized_ids = {missing}→ APIException → row stays draft.
Every subsequent retry repeats the same failure (the bundle is permanently sealed; put_skill can no longer rewrite it). The retry path is no longer the idempotent operation the docstring promises. Either gate the invariant on whether we actually wrote a kernel skill this turn (skip on the 409 fall-through path), or treat 'kernel skill missing from a previously sealed bundle' as an acceptable post-condition for re-freeze.
2. Legacy-orphans recovery message is circular (backend/presentation/views.py:2393)
The error says: "branch a fresh draft from the current live revision (which no longer carries it)". But the current live revision does still carry the de-designated kernel id in its sealed spec.skills[] — a fresh draft branched from live inherits the same orphan and hits the identical guard. skills[] is server-pinned on spec writes (test_spec_write_preserves_server_skills), so there's no API path that lets the user remove the inline entry either.
Either the orphan guard needs to tolerate a stale kernel id when the rest of the bundle agrees, or there needs to be an explicit 'drop stale kernel entry' operation.
3. Cross-team clone_from of agent-builder bricks freeze (backend/presentation/views.py:2386)
A normal (non-allowlisted) team that clone_froms an agent-builder revision lands an opaque server-minted slug. kernel_skills_for(opaque_slug) returns [], so the carried inline kernel entries (no source_version_id) fall into legacy_orphans → 400. There's no live revision in the destination team to "branch from", and skills[] is unwritable. The fork is permanently unfreezable.
Per the agents: mapping the kernel skills can target a slug — but the clone path doesn't rename, so this is structurally bricked unless clone_from filters out non-applicable kernel-id inline entries from the carried spec, or the orphan guard treats 'inline entry that was a kernel id for the source's slug' as silently droppable.
Worth fixing in the same review
_agents_ofis locale-dependent (backend/logic/kernel_skills.py:121).md.read_text()uses the process locale. Every shipped kernel SKILL.md description contains UTF-8 em dashes; underLC_ALL=CorPOSIX(ASCII-default) the very first folder iteration raisesUnicodeDecodeError, taking down freeze for every agent — defeating the docstring's "malformed folder for one agent can't take down freeze for another" invariant. Passencoding="utf-8"explicitly (and considerread_bytes().decode("utf-8")for consistency with_load_skill).- Test mock doesn't pin the call arg (
backend/tests/test_freeze_skill_refs.py:187). All four new tests domock_kernel.return_value = [...]and never assert whatkernel_skills_forwas called with. Swapping the call site atviews.py:2366to passrevision.id(or anything else) keeps every test green while silently disabling kernel injection. Addmock_kernel.assert_called_with(self.application.slug)per test. auditing-the-fleet/SKILL.mdlines 19–20 still say bare "Load it" fordebugging-sessionsandediting-agents-safely— both are playbooks now, not bundled kernel skills. Directly contradictsagent.md's rule "Never recite … from memory — fetch the playbook" and "Builder playbooks are not in your bundle — don't look forskills/<id>files for them." Line 99 of the same file uses the correct "fetch the … playbook" phrasing, so the contract is clear — 19/20 were left in pre-refactor form._skill_dirs()raises on missing dir (backend/logic/kernel_skills.py:187).iterdir()on a missing path raisesFileNotFoundError, propagated uncaught. Any deploy variant that ships the Django code withoutkernel_skills/500s every freeze across every team, including agents the platform never designated. Anis_dir()guard at the top of_skill_dirsdegrades to empty list.yaml.safe_load(block)is unguarded (backend/logic/kernel_skills.py:153). Broken YAML raisesyaml.YAMLError, notValueError. Wrap to re-raise asValueErrorto match the rest of the malformation contract (and the parameterized rejection tests).- Concurrent-freeze race produces a wrong error (
backend/presentation/views.py:2471). When process A wins the conditional UPDATE and flips state to ready, process B's post-seal check can raise "revision left in draft. Retry the freeze" — but the row is already ready, and the user's retry hits the early "Cannot freeze a ready revision" guard. The bundle is correctly sealed under A's intent; the user is bricked by misleading error text.
Nits / follow-ups (not blocking)
- Post-seal invariant is structurally gated on
derived_spec is not None— if a future janitor response ever omitsderived_spec, the check fails open (skips), not closed. Worth inverting. - Frozen
spec.skills[]carries kernel entries with nosource_version_idand no kernel marker — indistinguishable from legacy inline skills for any downstream consumer that discriminates on those fields. _RESOURCE_ID_REis the third Python copy of the janitor'sRESOURCE_ID_REGEX(the others live atpresentation/views.py:193andpresentation/serializers.py:277). The module's own docstring warns about exactly this drift surface; worth lifting to one constant.- Altitude question: kernel injection in Django ships kernel bytes over HTTP every freeze and leaves the janitor kernel-unaware. Any future bundle-producing path that doesn't go through Django's freeze (a janitor-side authoring API, a CLI seeder, future tooling) will silently omit kernel skills the platform considers code-locked. Worth thinking about whether
deriveSpecis the more natural home long-term.
Happy to pair on any of these.
Problem (written by Danilo)
Migrating to the skills store left the agent builder in an awkward place. It had duplicate skills to reason about, since its playbooks now lived in two places, and its prompting wasn't helping at all.
We need a way to left the agent builder reference the same playbooks via MCP that MCP consumers do, while still allowing its agent-specific guidance (how to use focus tools, for example) to be scoped properly.
Changes (written by robot)
Splits the concierge's instructional content into three clearly-owned homes:
products/agent_platform/backend/kernel_skills/), merged with store-resolvedskill_refs. Code-locked, identical across accounts, never in the DB.services/mcp/playbooks/, served on demand viaagent-resolve-resourcewith a live, scope-aware tool surface.skill_refs(unchanged).The load-bearing constraint: any agent can carry kernel skills, but no outside author can author them. There is no
skillsfield on any author endpoint, soskill_refs→ the store stays the only author path into a bundle'sskills/. Kernel content must move in lockstep with the platform and be identical per account, so it can't live in the DB (drift) and can't be author-supplied.Kernel skills are fully data-driven: each is a folder whose
SKILL.mdfrontmatter declares itsdescriptionand anagentsmapping (["*"]= every agent / the shared baseline, or a slug list). Adding one is "drop a folder." At freeze the injected kernel ids are exempted from the legacy-orphan guard and the bundle sweep, with a kernel-vs-store alias collision check.How did you test this code?
I'm an agent (Claude Code) — no manual UI testing. Automated checks I actually ran:
test_freeze_skill_refs.py(new: injection alongside store refs, sweep exemption, collision rejection, legacy-guard exemption) + newtest_kernel_skills.py(registry folder-scan,agentsmapping, single-line/≤280 parity).mypyclean on changed files.tool-inputs.jsonregen.example-agent-builderbundle wiring case (43 files / 289 tests green on an isolated run).@posthog/load-skill) and an MCP playbook (agent-resolve-resource) in one turn, and that the frozenspec.skills[].descriptionlands full rather than truncated.Regressions these tests catch: the freeze tests fail if the injection / sweep-exemption / collision-check are reverted; the registry test fails if a kernel description is folded or >280 chars — which would silently truncate the model's skill-load signal (the janitor derives the description from only the first frontmatter line).
Automatic notifications
Docs update
Internal in-tree docs only: the example-bundle README and a new
services/mcp/playbooks/README.mddescribing the three homes. No public docs affected.🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Built with Claude Code (Opus 4.8). Skill invoked:
/qa-team(multi-agent review of the branch before this PR)./django-migrationsand/improving-drf-endpointswere loaded while working in the freeze view, though no migration was ultimately needed (the design is pure freeze logic + content).Design path worth flagging for review — the gate placement changed twice: an app-level
allow_kernel_skillsflag was rejected (every agent should be able to carry kernel content; the gate belongs on authoring, not the app), as was a privileged internal write endpoint (no inbound internal auth exists on the authoring views) and akernel_skillsauthor spec field (authors must not be able to author them). The landing point — server-side injection at freeze from backend code — keeps the store-only boundary intact for authors while letting the platform inject. The/qa-teampass then caught a real bug: the janitor derivesspec.skills[].descriptionfrom only the first frontmatter line, so the folded multi-line descriptions were silently truncated (the model's load signal forsafety-and-boundarieslost its trigger clause). Fixed with single-line ≤280 descriptions plus a load-time parity guard, and a CI-blocking un-regeneratedtool-inputs.jsonwas regenerated.🤖 Generated with Claude Code
https://claude.ai/code/session_01BY6C34DmGGmL4sgx5Wajn6