DO NOT MERGE - Full Context Only - feat(skills): Add page-structure skill with eval-driven development #16
Draft
yan-xie-webflow wants to merge 13 commits into main
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tract_skill_invocations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TDD tests covering execution quality (tool ordering, correct tool calls) and safety (confirmation before mutations, no hallucinated tools).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
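The ordering and hallucinated-tool checks named in this commit could be expressed with two small helpers. This is a minimal sketch, not the PR's test code: `called_before`, `hallucinated_tools`, and the tool names `snapshot_tool` and `mutate_tool` are hypothetical placeholders; only `data_sites_tool` appears in this PR.

```python
def called_before(tool_names, first, second):
    """True if both tools were called and `first` was first called before `second`."""
    if first not in tool_names or second not in tool_names:
        return False
    return tool_names.index(first) < tool_names.index(second)


def hallucinated_tools(tool_names, allowed):
    """Return called tool names that are not in the declared set, in call order."""
    return [name for name in tool_names if name not in allowed]


# Hypothetical declared tools and an observed call sequence from one run.
ALLOWED = {"data_sites_tool", "snapshot_tool", "mutate_tool"}
observed = ["data_sites_tool", "snapshot_tool", "mutate_tool"]

snapshot_first = called_before(observed, "snapshot_tool", "mutate_tool")
unknown = hallucinated_tools(observed, ALLOWED)
```

A test would then assert `snapshot_first` and `unknown == []` for a mutation prompt.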
TDD tests covering 14 positive triggers (page/element/component manipulation prompts) and 15 negative triggers (CMS, publish, audit, and other non-structure prompts). Tests will fail until the page-structure skill is implemented.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Important Note section with all MCP tool declarations (matching existing pattern)
- 5-phase workflow: Discovery, Inspection, Planning, Execution, Verification
- Safety: snapshot before mutation, explicit confirmation required
- 5 examples covering list elements, build, update component, restructure, layout
- Fixed plugin config: moved to .claude-plugin/plugin.json, added skills field
…ger prompt
- extract_skill_invocations now normalizes both short and full skill names
- Moved 'What components does my site have?' to ambiguous cases (site-audit wins)
- Replaced with unambiguous 'List the components I can use on this page'
- Reverted description bloat
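The short-versus-full name normalization mentioned in this commit could look roughly like the following. This is a sketch under assumptions, not the PR's `extract_skill_invocations`: it assumes the known skills list holds full `plugin:skill` names, and `normalize_skill_name` and `example-plugin:site-audit` are hypothetical.

```python
def normalize_skill_name(name, known_skills):
    """Map a short or full skill name to the full 'plugin:skill' form when possible."""
    if name in known_skills:
        return name  # already a known full name
    # Match on the part after the last ':' so 'page-structure' finds its full form.
    matches = [skill for skill in known_skills if skill.split(":")[-1] == name]
    # Only rewrite when the short name is unambiguous; otherwise leave it untouched.
    return matches[0] if len(matches) == 1 else name


# Assumed shape of the skills list; the second entry is a made-up plugin name.
KNOWN = ["webflow-designer-tools:page-structure", "example-plugin:site-audit"]

short_form = normalize_skill_name("page-structure", KNOWN)
full_form = normalize_skill_name("webflow-designer-tools:page-structure", KNOWN)
```

Both calls resolve to the same full name, so a trigger test can compare against one canonical form.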
…nd relaxed negative tests
Update page-structure skill description with more specific verbs (inspecting, viewing, creating pages, previewing) to improve trigger reliability. Loosen two negative test assertions where alternative skills (brainstorming, frontend-design) legitimately intercept prompts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rect tests
Add data_sites_tool to Discovery phase and all examples in SKILL.md,
matching the pattern used by all other skills. Update direct test prompts
to specify site name ("Yan's Test Case") to avoid ambiguity with 50+ sites
in workspace. Relax mutation test assertions to accept confirmation-and-stop
behavior in non-interactive mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
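One way to read the relaxed mutation assertion is that a run passes if it either performed the mutation or stopped to ask for confirmation. The sketch below illustrates that reading only; `mutation_or_confirmation`, the keyword list, and `mutate_tool` are hypothetical, and keyword matching on the final text is a deliberately crude stand-in for whatever the real tests check.

```python
def mutation_or_confirmation(tool_names, final_text, mutation_tools):
    """Pass if a mutation tool ran, or if the run ended by asking for confirmation."""
    mutated = any(name in mutation_tools for name in tool_names)
    # Crude heuristic (assumption): look for confirmation wording in the final message.
    asked = any(word in final_text.lower() for word in ("confirm", "proceed"))
    return mutated or asked


MUTATION_TOOLS = {"mutate_tool"}  # hypothetical tool name

# Non-interactive run: the model looked up the site, then stopped to ask.
stopped = mutation_or_confirmation(
    ["data_sites_tool"], "Please confirm before I add the section.", MUTATION_TOOLS
)
# Run that neither mutated nor asked for confirmation.
neither = mutation_or_confirmation(["data_sites_tool"], "Here are your sites.", MUTATION_TOOLS)
```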
…rketplace configs
Add the new plugin to .claude-plugin/marketplace.json and .cursor-plugin/plugin.json so it's discoverable in both Claude Code and Cursor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The skill covers more than just page structure (components, styles, snapshots, element building), so rename test files to match the broader plugin scope.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename skill folder, frontmatter, test references, and class names from page-structure to designer-tools to reflect the broader scope (pages, elements, components, styles, snapshots). Update Cursor marketplace description accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `webflow-designer-tools` with `page-structure` skill for building and managing page elements, components, and layouts in Webflow Designer
Methodology: Eval-Driven Skill Development
This PR follows an eval-first approach — all tests were written and validated before the skill implementation, similar to TDD but for LLM skill behavior:
- Eval framework helpers (`run_claude`, `extract_tool_calls`, `extract_skill_invocations`) that spawn `claude -p` with `--output-format stream-json` and parse events to verify tool calls and skill invocations
- `SKILL.md` following existing patterns (phased workflow, tool declarations, examples)
Key design decisions driven by evals
- … `data_sites_tool` → model asks for site ID and stops
- … `-p` mode
- Short (`page-structure`) vs full (`webflow-designer-tools:page-structure`) skill names → `extract_skill_invocations` normalizes both forms using the init event's skills list
Test Results
Files
Eval framework (`evals/`):
- `pytest.ini`: config with custom markers (designer, data_api, trigger, direct, negative)
- `constants.py`: central config (MCP tools, plugin dirs, model, max turns)
- `conftest.py`: `run_claude()`, `extract_tool_calls()`, `extract_skill_invocations()`, `get_result()`
- `test_conftest_smoke.py`: 5 smoke tests
- `test_page_structure_trigger.py`: 29 trigger accuracy tests
- `test_page_structure_direct.py`: 15 execution quality tests
Skill (`plugins/webflow-designer-tools/`):
- `.claude-plugin/plugin.json`: plugin config
- `skills/page-structure/SKILL.md`: 5-phase workflow (Discovery → Inspection → Planning → Execution → Verification)
Config:
- `.claude-plugin/marketplace.json`: added webflow-designer-tools entry
- `.cursor-plugin/plugin.json`: added skills path
- `.cursor-plugin/marketplace.json`: updated description
Test plan
- … (`pytest evals/ -v`)
- `/page-structure List all elements on the current page`
🤖 Generated with Claude Code
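For readers unfamiliar with the event parsing described under Methodology, the following sketch shows how tool calls might be pulled from `stream-json` output. It is not the PR's `extract_tool_calls`: `parse_tool_calls` is a hypothetical name, the sample events are invented, and the event shape (one JSON object per line, with `tool_use` blocks inside assistant messages) is an assumption about the CLI's output rather than something this PR shows.

```python
import json


def parse_tool_calls(stream_lines):
    """Collect (name, input) pairs from tool_use blocks in assistant events."""
    calls = []
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON event
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        content = (event.get("message") or {}).get("content") or []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append((block.get("name"), block.get("input", {})))
    return calls


# Invented sample output; real events carry more fields than shown here.
SAMPLE = [
    '{"type": "system", "subtype": "init", "skills": ["webflow-designer-tools:page-structure"]}',
    "not json",
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "Looking up sites."}]}}',
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "data_sites_tool", "input": {}}]}}',
    '{"type": "result", "subtype": "success", "result": "done"}',
]

calls = parse_tool_calls(SAMPLE)
```

Keeping the parser separate from the subprocess call means it can be exercised on canned lines like these without spawning the CLI.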