Add WebSkills support to demos and evals CLI by MiguelsPizza · Pull Request #49 · GoogleChromeLabs/webmcp-tools

MiguelsPizza · 2026-03-02T18:14:01Z

This PR adds WebSkills support across the demos and the evals CLI. Pages declare skills in <script type="agent-context"> blocks, which agents can discover and read progressively.
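As a rough illustration of the discovery step, the sketch below collects the raw text of every <script type="agent-context"> block in an HTML string. The helper name `findAgentContextBlocks` and the sample payload are hypothetical, and the payload format inside the block is not described in this PR, so the text is returned unparsed. A real agent would more likely query the DOM than use a regular expression.

```typescript
// Hypothetical helper (not part of this PR): returns the trimmed text of
// each <script type="agent-context"> block found in an HTML string.
function findAgentContextBlocks(html: string): string[] {
  const pattern =
    /<script\b[^>]*\btype\s*=\s*["']agent-context["'][^>]*>([\s\S]*?)<\/script\s*>/gi;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    blocks.push(match[1].trim());
  }
  return blocks;
}

// Placeholder page; the block content here is made up for illustration.
const sampleHtml = `
<html>
  <head>
    <script type="agent-context">
      name: pizza-maker
    </script>
    <script type="text/javascript">console.log("ignored");</script>
  </head>
</html>`;

const found = findAgentContextBlocks(sampleHtml);
console.log(found.length, found);
```

Scripts with any other `type` are skipped, so ordinary page JavaScript never reaches the agent through this path.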

- All three demos (french-bistro, pizza-maker, react-flightsearch) embed <script type="agent-context"> skill blocks with inline reference documents
- --skill flag on runevals loads a SKILL.md for benchmark comparison (baseline vs skill-augmented)
- Multi-step agent loop handles read_site_context disclosure calls using agent-skills-ts-sdk
- Extracted loadSkillFromFile() into evaluator/skill.ts for reuse
- Deterministic eval date via WEBMCP_EVAL_DATE env var
- Pizza-maker and travel skill eval test sets with SKILL.md files and reference docs
- Updated README with --skill flag docs and benchmark mode
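To make the deterministic eval date concrete, here is a minimal sketch of how a pinned date could be resolved from the environment. Only the variable name WEBMCP_EVAL_DATE comes from this PR; the helper `resolveEvalDate`, the accepted date format (an ISO date string), and the fallback to the wall clock are assumptions.

```typescript
// Hypothetical helper; the parsing in this PR may differ.
// The environment is passed in so the function is easy to test.
function resolveEvalDate(
  env: Record<string, string | undefined>,
  now: () => Date = () => new Date(),
): Date {
  const raw = env["WEBMCP_EVAL_DATE"];
  if (raw === undefined || raw.trim() === "") {
    return now(); // no pinned date: fall back to the current time
  }
  const parsed = new Date(raw);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`WEBMCP_EVAL_DATE is not a valid date: ${raw}`);
  }
  return parsed;
}

// A date-only ISO string is interpreted as midnight UTC.
const pinned = resolveEvalDate({ WEBMCP_EVAL_DATE: "2026-03-03" });
console.log(pinned.toISOString());
```

In a CLI, `process.env` would be passed as the first argument. Pinning the date keeps relative prompts such as "next Friday" stable across runs.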

Extension submodule updated: beaufortfrancois/model-context-tool-inspector#9 (must merge first)

Eval results (gemini-2.5-flash, 2026-03-03)

Single-call skill evals (4 runs each):

| Scenario | Baseline | With Skill | Delta |
| --- | --- | --- | --- |
| Pizza | 87.5% | 100.0% | +12.5 pts |
| Travel | 54.2% | 58.3% | +4.2 pts |

Results vary from run to run because model output is stochastic.
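The deltas are differences in percentage points. The sketch below shows one way to compute them from unrounded pass rates, which explains why the Travel delta can read +4.2 even though subtracting the rounded percentages gives 4.1. The counts used (13 of 24 and 14 of 24) are hypothetical, chosen only because they reproduce the reported percentages; the PR does not state the underlying counts.

```typescript
// Pass rate as a percentage, e.g. 7 of 8 -> 87.5.
function passRate(passed: number, total: number): number {
  if (total <= 0) throw new Error("total must be positive");
  return (passed / total) * 100;
}

// Delta in percentage points, computed before any rounding.
function deltaPoints(baselineRate: number, skillRate: number): number {
  return skillRate - baselineRate;
}

// Hypothetical counts, not taken from the PR.
const baselineRate = passRate(13, 24);
const skillRate = passRate(14, 24);

console.log(
  baselineRate.toFixed(1),
  skillRate.toFixed(1),
  deltaPoints(baselineRate, skillRate).toFixed(1),
);
```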

Test plan

- `cd evals-cli && npm run build && npm test`
- Pizza skill benchmark: `node dist/bin/runevals.js --tools=examples/pizza-maker/schema.json --evals=examples/pizza-maker/skill-evals.json --skill=examples/pizza-maker/skill/SKILL.md`
- Travel skill benchmark: `node dist/bin/runevals.js --tools=examples/travel/schema.json --evals=examples/travel/skill-evals.json --skill=examples/travel/skill/SKILL.md`
- Verify demo HTML skill blocks are well-formed

google-cla · 2026-03-02T18:14:24Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

- All three demos embed <script type="agent-context"> skill blocks
- --skill flag on runevals for benchmark comparison
- Multi-step agent loop handles read_site_context disclosure calls
- Extracted loadSkillFromFile() into evaluator/skill.ts
- Deterministic eval date via WEBMCP_EVAL_DATE env var
- Pizza-maker and travel skill eval test sets with SKILL.md files
- Drop resource names from disclosure prompt entries
- Type safety improvements (Record<string,unknown> over any)
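The multi-step loop can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in this PR: it does not use agent-skills-ts-sdk, the model is a stub, and every name except read_site_context (the types, `runAgentLoop`, the `menu.md` resource, the `set_size` tool) is hypothetical. The idea shown is that disclosure calls are answered and fed back to the model until it emits a real tool call, which is what gets scored.

```typescript
// Hypothetical turn types; only the name read_site_context is from the PR.
type Disclosure = { kind: "disclosure"; resource: string };
type ToolCall = { kind: "tool_call"; name: string; args: Record<string, unknown> };
type Model = (transcript: string[]) => Disclosure | ToolCall;

function runAgentLoop(
  model: Model,
  resources: Map<string, string>,
  maxSteps = 5,
): ToolCall {
  const transcript: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);
    if (turn.kind === "tool_call") {
      return turn; // a real tool call ends the loop
    }
    // Answer the read_site_context call and let the model continue.
    const body = resources.get(turn.resource) ?? `unknown resource: ${turn.resource}`;
    transcript.push(`read_site_context(${turn.resource}) -> ${body}`);
  }
  throw new Error(`no tool call after ${maxSteps} steps`);
}

// Stub model: reads one reference document, then calls a tool.
const stubModel: Model = (transcript) =>
  transcript.length === 0
    ? { kind: "disclosure", resource: "menu.md" }
    : { kind: "tool_call", name: "set_size", args: { size: "large" } };

const result = runAgentLoop(stubModel, new Map([["menu.md", "Sizes: small, large"]]));
console.log(result);
```

The step cap guards against a model that keeps requesting documents without ever acting.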

MiguelsPizza mentioned this pull request Mar 2, 2026

Add WebSkills discovery and read_site_context support beaufortfrancois/model-context-tool-inspector#9

Draft

4 tasks

MiguelsPizza marked this pull request as draft March 2, 2026 18:19

MiguelsPizza force-pushed the webmcp-skills branch from 5a261ab to 9fe188a March 3, 2026 00:10

MiguelsPizza force-pushed the webmcp-skills branch from fe3dd48 to 1200c11 March 3, 2026 21:34

MiguelsPizza wants to merge 1 commit into GoogleChromeLabs:main from MiguelsPizza:webmcp-skills
