DO NOT MERGE: Evaluate figma implementer subagent (WEB-2442) by Marcosld · Pull Request #1560 · Telefonica/mistica-web

Marcosld · 2026-06-03T09:41:48Z

Does using a specialized subagent improve the `mistica-react` skill in any way?

A/B experiment — 6 headless Claude Code runs, identical prompt, same Figma design.
Model: claude-opus-4-8 (all runs) · n = 3 per arm.

TL;DR

Dimension	Verdict
Output quality (visual fidelity)	Tie. Both arms produce high-fidelity, near-pixel-faithful pages (~4.3–4.4 / 5). The subagent did not produce visibly better screens.
Mistica compliance	Tie (both near-perfect). Every run used Mistica primitives almost exclusively — 0 raw HTML / 0 inline styles in 4 of 6 runs. The skill is what enforces this, in both arms.
Total token volume	≈ Tie (−3%). Total tokens are dominated (~99.5%) by input-side context (mostly cache reads), which is near-equal across arms. The subagent does not consume dramatically fewer tokens overall, although at first glance it seemed to, because the main loop delegates the work to the subagent.
Output tokens	−40% with subagent (27.5k vs 46k), real and reproducible. But output is <1% of token volume; see the decomposition below. This is probably because the subagent's system prompt asks it to be brief, whereas the top-level agent is usually verbose.
Orchestrator turns	−92% to −95%, but this is largely a measurement artifact: `num_turns` counts only the main loop. Total agentic work (deduped messages) is actually +22% higher in the subagent arm.
Cost / latency	Modest subagent edge: ~10% cheaper ($4.95 vs $5.51) and ~14% faster (632 s vs 735 s). ~82% of the cost gap is attributable to the output-token difference. The speed improvement may be due to faster MCP and skill loading, since both are specified in the subagent's system prompt.

Conclusion: the subagent does not make the result better (the mistica-react skill does the heavy lifting in both arms), and it neither does less total work nor uses dramatically fewer total tokens. Its two real, verified effects are: (1) it keeps the orchestrator transcript ~90% leaner by relocating the work into a child context (context hygiene), and (2) the subagent generates ~52% less text per turn than the verbose top-level agent, which yields a modest ~10% cost / ~14% latency edge at equal quality. Its value here is orchestration hygiene plus lower output verbosity, not output fidelity and not a large compute saving. We do not think it is worth using a subagent as part of the mistica-react plugin, given the loss of context (for example, when iterating over an implementation) and the almost non-existent improvements.

Methodology

Identical prompt (byte-for-byte) in all 6 runs:
Implement this design from Figma using Mistica @https://www.figma.com/design/puOwn8pBJCrMYksvXeCiJO/AI-Test---Figma-MCP-2-code?node-id=1-67393&m=dev
Isolation: 6 separate workspaces outside the main repo (so no parent .claude/agents leaks in). Each workspace is a copy of the project plus .claude (with the skill symlinked), with node_modules symlinked to the main repo.
- with-1/2/3: .claude/agents/figma-mistica-implementer.md present.
- no-1/2/3: .claude/agents removed — skill only.
Delegation: natural auto-delegation (identical prompt; no nudge). Detected per run from the event stream.
Runs: headless `claude -p --output-format stream-json --verbose --model opus`. Metrics (`duration_ms`, `num_turns`, `total_cost_usd`, per-model token usage) were read directly from the terminal `result` event.
Rendering: each implementation booted on a Vite dev server and screenshotted full-page at the design's native 1368px width via Playwright/Chromium; console + Vite-overlay errors captured.
Compliance: static analysis of generated .tsx (excluding the identical main.tsx boilerplate).
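The metric extraction and delegation check described above can be sketched in TypeScript as follows. This is a minimal sketch, not the script used for the experiment: it assumes one JSON object per line, a terminal event with `type: "result"` carrying the `duration_ms`, `num_turns` and `total_cost_usd` fields, and that a delegation shows up as an assistant `tool_use` block (tool name assumed to be `Task` or `Agent`, depending on the Claude Code version) whose `input.subagent_type` is the agent name. The helpers `parseRunMetrics` and `delegatedTo` are hypothetical.

```typescript
// Hypothetical helpers for a Claude Code stream-json log; the event shapes are assumptions.
interface RunMetrics {
  durationMs: number;
  numTurns: number; // orchestrator turns only
  totalCostUsd: number;
}

// Loose shape of one stream-json line: only `type` is relied on everywhere.
type StreamEvent = { type?: string; [key: string]: unknown };

// Assumed shape of a tool_use content block inside an assistant message.
type ToolUseBlock = { type?: string; name?: string; input?: { subagent_type?: string } };

// Parse newline-delimited JSON, skipping blank lines.
function parseEvents(ndjson: string): StreamEvent[] {
  return ndjson
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as StreamEvent);
}

// Read duration, turns and cost from the last `result` event in the stream.
function parseRunMetrics(ndjson: string): RunMetrics {
  const result = parseEvents(ndjson).filter((e) => e.type === "result").pop();
  if (result === undefined) throw new Error("no terminal result event found");
  return {
    durationMs: Number(result["duration_ms"]),
    numTurns: Number(result["num_turns"]),
    totalCostUsd: Number(result["total_cost_usd"]),
  };
}

// True if any assistant message launched the named subagent.
// Assumption: the launching tool is called "Task" or "Agent".
function delegatedTo(ndjson: string, agentName: string): boolean {
  return parseEvents(ndjson).some((event) => {
    if (event.type !== "assistant") return false;
    const content = (event["message"] as { content?: unknown } | undefined)?.content;
    const blocks: unknown[] = Array.isArray(content) ? content : [];
    return blocks.some((raw) => {
      const block = raw as ToolUseBlock;
      return (
        block.type === "tool_use" &&
        (block.name === "Task" || block.name === "Agent") &&
        block.input?.subagent_type === agentName
      );
    });
  });
}
```

For a with-* run, `delegatedTo(log, "figma-mistica-implementer")` would be expected to return true, and false for a no-* run. Note that `num_turns` in the result event counts only the orchestrator loop.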

Sanity checks passed: all 6 runs exited 0; all loaded the mistica-react skill; all 3 with-* runs delegated to figma-mistica-implementer and no no-* run did; all 6 compiled and rendered with zero console/overlay errors.

Original design (baseline)

Per-run results

Run	Figma calls	Wall (s)	Output tok	Cache-read tok
with-1	15	527	29,556	6,861,084
with-2	12	718	28,194	5,467,758
with-3	15	651	24,794	5,485,159
no-1	15	713	46,727	4,858,799
no-2	16	727	43,302	6,710,654
no-3	14	767	48,185	6,741,214

Mean wall time is lower in the with-* runs. However, the run with the fewest cache-read tokens (no-1) belongs to the no-* arm, which points to considerable intra-group variability.
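The arm-level comparisons can be reproduced directly from this table. A small TypeScript sketch (values copied from the rows above; helper names are illustrative) that computes per-arm means, the relative change of the with-* arm, and the run with the fewest cache reads:

```typescript
// Per-run figures copied from the table above.
interface Run {
  id: string;
  figmaCalls: number;
  wallS: number;
  outputTok: number;
  cacheReadTok: number;
}
type Metric = Exclude<keyof Run, "id">;

const runs: Run[] = [
  { id: "with-1", figmaCalls: 15, wallS: 527, outputTok: 29_556, cacheReadTok: 6_861_084 },
  { id: "with-2", figmaCalls: 12, wallS: 718, outputTok: 28_194, cacheReadTok: 5_467_758 },
  { id: "with-3", figmaCalls: 15, wallS: 651, outputTok: 24_794, cacheReadTok: 5_485_159 },
  { id: "no-1", figmaCalls: 15, wallS: 713, outputTok: 46_727, cacheReadTok: 4_858_799 },
  { id: "no-2", figmaCalls: 16, wallS: 727, outputTok: 43_302, cacheReadTok: 6_710_654 },
  { id: "no-3", figmaCalls: 14, wallS: 767, outputTok: 48_185, cacheReadTok: 6_741_214 },
];

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Mean of one metric over the runs whose id starts with the arm prefix ("with-" or "no-").
const armMean = (prefix: string, metric: Metric): number =>
  mean(runs.filter((r) => r.id.startsWith(prefix)).map((r) => r[metric]));

// Relative change of the with-* mean vs the no-* mean (negative = lower with the subagent).
const relDelta = (metric: Metric): number => armMean("with-", metric) / armMean("no-", metric) - 1;

// Run with the fewest cache-read tokens across both arms.
const minCacheRead = runs.reduce((best, r) => (r.cacheReadTok < best.cacheReadTok ? r : best));

for (const metric of ["wallS", "outputTok", "cacheReadTok"] as const) {
  console.log(metric, `${(relDelta(metric) * 100).toFixed(1)}%`);
}
console.log("fewest cache reads:", minCacheRead.id);
```

With these values the with-* arm comes out at roughly −14% wall time, −40% output tokens and −3% cache reads, and no-1 is the run with the fewest cache reads.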

Token economics — decomposed and verified

This is the core of the "why fewer tokens?" question. Three findings:

1. Total token volume is ≈ equal (−3%). Tokens are ~99.5% input-side (context fed in each step, mostly cache reads) and ~0.5% output (text generated):

Run	Input-side	Output	Output % of all tokens
with-1 / 2 / 3	7.10M / 5.66M / 5.69M	29,556 / 28,194 / 24,794	0.41% / 0.50% / 0.43%
no-1 / 2 / 3	5.09M / 6.92M / 6.95M	46,727 / 43,302 / 48,185	0.91% / 0.62% / 0.69%

Both arms read the same Mistica docs + Figma payloads once and re-read their growing context a comparable number of times, so the bulk (cache reads) is near-equal. The subagent doesn't avoid that context — it just holds it in a child session instead of the parent (same volume, different container).
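The input-side vs output split can be checked from the rows of the token table. A minimal sketch, using the rounded input-side values as printed there (so results agree to about two significant figures):

```typescript
// Per-run token figures from the table above (input-side values are the rounded millions shown there).
interface Arm {
  inputSide: number[];
  output: number[];
}

const withArm: Arm = { inputSide: [7.1e6, 5.66e6, 5.69e6], output: [29_556, 28_194, 24_794] };
const noArm: Arm = { inputSide: [5.09e6, 6.92e6, 6.95e6], output: [46_727, 43_302, 48_185] };

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Fraction of each run's tokens that is generated output.
const outputShares = (arm: Arm): number[] => arm.output.map((o, i) => o / (arm.inputSide[i] + o));

// All tokens in an arm: input-side context plus generated output.
const totalTokens = (arm: Arm): number => sum(arm.inputSide) + sum(arm.output);

// Relative difference in total token volume (with-* vs no-*).
const totalDelta = totalTokens(withArm) / totalTokens(noArm) - 1;

console.log(outputShares(withArm).map((s) => `${(s * 100).toFixed(2)}%`).join(" / "));
console.log(outputShares(noArm).map((s) => `${(s * 100).toFixed(2)}%`).join(" / "));
console.log(`total tokens: ${(totalDelta * 100).toFixed(1)}%`);
```

This reproduces the output-share column and a total-volume difference of about −3%.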

2. The −40% output-token gap is driven by verbosity per turn, not by less work. Despite producing +22% more total messages, the with-* runs emit −40% output, because the subagent generates ~533 tokens/turn vs the top-level agent's ~1,116 (the two clusters don't overlap: max-with 539 < min-no 902). Causes: the subagent runs under a rigid 5-step workflow and its output is treated as a return value (terse, little narration), whereas in the no-* runs the work is done by the top-level conversational agent (more planning prose, running commentary, a long final user-facing summary).
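One way to see why more messages can still mean less output: total output tokens $O$ equal the number of messages $M$ times the mean output per message $r$, so the ratio between arms factors into two ratios. Plugging in the +22% message count and the ~533 vs ~1,116 tokens/turn quoted above (treating a deduplicated message as a turn, which is an approximation):

```math
\frac{O_{\text{with}}}{O_{\text{no}}}
  = \frac{M_{\text{with}}}{M_{\text{no}}} \cdot \frac{r_{\text{with}}}{r_{\text{no}}}
  \approx 1.22 \times \frac{533}{1116}
  \approx 1.22 \times 0.478
  \approx 0.58
```

That is roughly −42%, close to the measured −40%; the small difference is likely because the inputs are averages of per-run figures rather than exact totals.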

3. Cost composition is dominated by cache reads; output is where the arms differ. Shares are exact (uniform 3.00× discount, CHECK 4):

Cost component	WITH share	NO share
Cache-read	59.9%	55.1%
Cache-write	23.7%	21.7%
Output	13.9%	21.0%
Input (uncached)	2.5%	2.3%

The mean output-token gap (18,557 tokens) is worth $0.46 at the realized output price — i.e. ~82% of the $0.56 total cost gap. The remaining ~$0.10 is cache-read run-to-run variance (note with-1 has the highest cache-read of all six → it cost $5.60 despite low output). So the ~10% cost edge is real but modest, and almost entirely an output-verbosity effect.
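The attribution arithmetic in this paragraph can be written out explicitly. The report does not state the per-token output price, so the sketch assumes $25 per million output tokens (flagged in the code), which reproduces the quoted ~$0.46. With the unrounded gap the share comes out near 83%, in line with the ~82% obtained from the rounded $0.46.

```typescript
// Arm means and cost figures quoted in the report.
const meanOutputWith = (29_556 + 28_194 + 24_794) / 3;
const meanOutputNo = (46_727 + 43_302 + 48_185) / 3;
const meanCostWithUsd = 4.95;
const meanCostNoUsd = 5.51;

// ASSUMPTION: the report does not state the output price. $25 per million output tokens
// is used because it reproduces the quoted ~$0.46 for the ~18,557-token gap.
const OUTPUT_USD_PER_MTOK = 25;

const outputGapTok = meanOutputNo - meanOutputWith; // ≈ 18,557 tokens
const outputGapUsd = (outputGapTok * OUTPUT_USD_PER_MTOK) / 1e6; // ≈ $0.46
const costGapUsd = meanCostNoUsd - meanCostWithUsd; // ≈ $0.56
const shareExplainedByOutput = outputGapUsd / costGapUsd; // ≈ 0.83
const relativeCostSaving = costGapUsd / meanCostNoUsd; // ≈ 0.10

console.log(
  outputGapTok.toFixed(0),
  outputGapUsd.toFixed(2),
  shareExplainedByOutput.toFixed(2),
  relativeCostSaving.toFixed(2),
);
```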

Mistica primitive compliance

Both arms are excellent and effectively tied — the skill enforces primitive usage regardless of the subagent:

Raw HTML elements: with-* = 0 / 0 / 0; no-* = 2 / 0 / 0 (only no-1 slipped in two <span>s).
Inline styles: with-* = 0 / 1 / 0; no-* = 2 / 0 / 0.
Hardcoded hex colors: 0 across all 6 runs.
Hardcoded px: only with-2 (6 occurrences); all others 0.
Every run composes from @telefonica/mistica components (MainNavigationBar, NavigationBreadcrumbs, Hero, Chip, GridLayout, MediaCard, Checkbox, RadioGroup, InfoRating, Text*, etc.) and pulls colors/spacing from skinVars tokens.

The subagent arm is marginally cleaner (0 raw HTML, more explicit skinVars usage), but the difference is within noise — both pass the "no raw divs / no raw styles" bar.
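The report does not include its static-analysis script. A rough, regex-based TypeScript approximation of the four counts above might look like this; the tag list and patterns are assumptions, and walking the TSX syntax tree (for example with the TypeScript compiler API) would be more robust than matching text.

```typescript
// Hypothetical, regex-based approximation of the compliance counts; not the report's script.
interface ComplianceCounts {
  rawHtml: number;
  inlineStyles: number;
  hexColors: number;
  hardcodedPx: number;
}

// Assumed list of intrinsic HTML tags to flag; extend as needed.
const RAW_TAGS = [
  "div", "span", "p", "a", "img", "button", "section", "header", "footer", "nav", "main",
  "ul", "ol", "li", "h1", "h2", "h3", "h4", "h5", "h6", "input", "label", "form",
];

const count = (source: string, pattern: RegExp): number => (source.match(pattern) ?? []).length;

function complianceCounts(tsx: string): ComplianceCounts {
  // Opening tags only: "<span>" or "<div className=...>" match; "</span>" and "<Box>" do not.
  const rawHtml = new RegExp(`<(?:${RAW_TAGS.join("|")})(?=[\\s/>])`, "g");
  return {
    rawHtml: count(tsx, rawHtml),
    // JSX inline style props such as style={{ ... }}.
    inlineStyles: count(tsx, /\bstyle=\{/g),
    // #rgb, #rgba, #rrggbb and #rrggbbaa literals.
    hexColors: count(tsx, /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g),
    // Numeric pixel literals such as 16px or 1.5px.
    hardcodedPx: count(tsx, /\b\d+(?:\.\d+)?px\b/g),
  };
}
```

Running it over each generated `.tsx` file (excluding `main.tsx`, as in the methodology) and summing per run would give counts comparable to the ones listed, modulo the heuristics' false positives, such as a `#abc` fragment inside a URL.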

Visual fidelity

Scored 1–5 against the baseline screenshot (layout, hero, sidebar, grid, pagination, type/colour). All six are strong; the arms are statistically indistinguishable.

Run	Fidelity	Notes
with-1	4.5	All sections faithful; kept literal nav placeholders (matches the screenshot); hero slightly taller; ratings rendered as filled dots.
with-2	4.0	Cleanest engineering (split into `components/`), but customized nav to real labels ("Tienda/Móvil/…", "Lo quiero") — semantically nicer yet less literal vs the screenshot; hero smaller/top-right.
with-3	4.3	Faithful; ratings as real stars; good grid + pagination.
no-1	4.5	Near-identical to with-1; faithful across the board; ratings as dots.
no-2	4.2	Faithful; hero rendered as a wide 3-image panorama (aspect differs from baseline); ratings as stars.
no-3	4.5	Faithful; best hero match (green bg, phones + face); ratings as stars; full 1–5 pagination.
WITH avg	4.27
NO avg	4.40	(marginally higher — within subjective noise)
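With three runs per arm, a quick way to sanity-check "within noise" is to compare the gap between the arm means with the spread inside each arm. A minimal sketch using only the scores in the table (this is a rough comparison, not a significance test):

```typescript
// Fidelity scores from the table above (subjective 1-5 ratings).
const withScores = [4.5, 4.0, 4.3];
const noScores = [4.5, 4.2, 4.5];

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Sample standard deviation (n - 1 denominator).
function sampleSd(xs: number[]): number {
  const m = mean(xs);
  const squared = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(squared / (xs.length - 1));
}

const gap = mean(noScores) - mean(withScores);
console.log(`means: ${mean(withScores).toFixed(2)} vs ${mean(noScores).toFixed(2)}, gap ${gap.toFixed(2)}`);
console.log(`within-arm SD: ${sampleSd(withScores).toFixed(2)} (with) / ${sampleSd(noScores).toFixed(2)} (no)`);
```

Here the ~0.13-point gap between arms is smaller than the within-arm standard deviation of the with-* runs (about 0.25), which is consistent with, though not proof of, the "within subjective noise" reading.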

Screenshot gallery

WITH-subagent	NO-subagent
with-1	no-1
with-2	no-2
with-3	no-3

Interpretation

The skill is the quality driver, not the subagent. Both arms loaded mistica-react, and both produced near-pixel-faithful, primitive-compliant pages. Removing the subagent did not degrade output quality.
The subagent does NOT do less work or use dramatically fewer tokens. Total token volume is within 3%, and the subagent arm runs more total agentic messages (+22%). The headline "−95% turns" is an artifact of num_turns measuring only the orchestrator loop (verified, CHECK 2).
What the subagent actually changes is two-fold: (a) it keeps the orchestrator transcript ~90% leaner (work relocated into a child context), and (b) the subagent emits ~52% less text per turn than the verbose top-level agent. (b) is the entire source of the −40% output-token gap.
Cost is modest and output-driven. Cache reads (~55–60% of cost) dominate and are near-equal across arms; the ~10% cost edge is ~82% explained by the output-token gap (≈$0.46 of $0.56), the rest cache-read noise. Latency edge ~14%.
Architecture varied within both arms (monolithic App.tsx vs split component files) — run-to-run variance, not a subagent effect.

Recommendation

If the goal is better-looking / more compliant output, the subagent is not justified on this evidence — invest in the skill.
The subagent's defensible value is orchestration hygiene (a ~90% leaner main transcript) plus a modest ~10% cost / ~14% latency saving from lower output verbosity — not a large compute reduction. Worth keeping for batch/CI Figma→code where a clean orchestrator context and small per-job savings compound.
The leaner-orchestrator benefit should scale up on larger / multi-screen designs (where a bloated single context hurts more) — a worthwhile follow-up to test in the future, ideally with larger n to tighten the cost/latency estimates.

Marcosld · 2026-06-03T09:42:15Z

@@ -0,0 +1,128 @@
+---


This is the evaluated subagent

github-actions · 2026-06-03T09:44:26Z

Size stats

	master	this branch	diff
Total JS	16.2 MB	16.2 MB	0 B
JS without icons	2.07 MB	2.07 MB	0 B
Lib overhead	92.5 kB	92.5 kB	0 B
Lib overhead (gzip)	19.9 kB	19.9 kB	0 B

github-actions · 2026-06-03T09:47:53Z

Deploy preview for mistica-web ready!

Project:	`mistica-web`
Status:	✅ Deploy successful!
Preview URL:	https://mistica-athkv550q-flows-projects-65bb050e.vercel.app
Latest Commit:	`d7c20bf`

Deployed with vercel-action

github-actions · 2026-06-03T09:52:43Z

Accessibility report

❌ 55 problems detected

welcome--welcome [O2-new] (1 violations)

welcome--welcome [Movistar-new] (1 violations)

welcome--welcome [Vivo-new] (1 violations)

welcome--welcome [Blau] (1 violations)

components-accordions--boxed-accordion-story [Vivo-new] (1 violations)

components-accordions--boxed-accordion-story [Blau] (1 violations)

components-accordions--boxed-accordion-story [Movistar-new] (1 violations)

components-accordions--boxed-accordion-story [O2-new] (1 violations)

components-badge--default [Movistar-new] (1 violations)

components-badge--default [O2-new] (1 violations)

components-buttons--primary-button [Vivo-new] (1 violations)

components-buttons--danger-button [Movistar-new] (1 violations)

components-buttons--secondary-button [Blau] (1 violations)

components-buttons--danger-button [Vivo-new] (1 violations)

components-buttons--icon-button-story [Blau] (1 violations)

components-carousels-carousel--with-carousel-context-and-outside-controls [O2-new] (1 violations)

components-carousels-centeredcarousel--default [Vivo-new] (1 violations)

components-carousels-centeredcarousel--with-controls [Movistar-new] (1 violations)

components-carousels-centeredcarousel--with-controls [O2-new] (1 violations)

components-carousels-slideshow--with-carousel-context [O2-new] (1 violations)

components-checkbox--uncontrolled [Movistar-new] (1 violations)

components-checkbox--uncontrolled [O2-new] (1 violations)

components-checkbox--uncontrolled [Vivo-new] (1 violations)

components-chip--multiple-selection [Vivo-new] (1 violations)

components-headers-header--default [Movistar-new] (1 violations)

components-input-fields-autocomplete--controlled [O2-new] (1 violations)

components-input-fields-cvvfield--controlled [Blau] (1 violations)

components-input-fields-phonenumberfieldlite--uncontrolled [O2-new] (1 violations)

components-input-fields-searchfield--uncontrolled [O2-new] (1 violations)

components-input-fields-textfield--controlled [Movistar-new] (1 violations)

components-input-fields-searchfield--uncontrolled [Vivo-new] (1 violations)

components-modals-drawer--default [Movistar-new] (1 violations)

components-modals-drawer--default [O2-new] (1 violations)

components-popover--default [O2-new] (1 violations)

components-primitives-video--default [Blau] (1 violations)

components-progress-bars--progress-bar-story [Movistar-new] (1 violations)

components-radio-button--controlled [Vivo-new] (1 violations)

components-radio-button--uncontrolled [Vivo-new] (1 violations)

components-radio-button--uncontrolled [Blau] (1 violations)

components-radio-button--uncontrolled [O2-new] (1 violations)

components-switch--uncontrolled [Vivo-new] (1 violations)

components-text--text-wrapping [O2-new] (1 violations)

components-text--text-wrapping [Movistar-new] (1 violations)

components-timer--text-timer-story [Vivo-new] (1 violations)

patterns-loading--brand-loading-screen-story [Blau] (1 violations)

layout-align--default [Movistar-new] (1 violations)

layout-inline--wrap [O2-new] (1 violations)

community-advanceddatacard--default [Movistar-new] (1 violations)

private-components-inside-portals--default [Vivo-new] (1 violations)

private-components-inside-portals--default [O2-new] (1 violations)

private-deprecated-card-stories-nakedcard--default [Vivo-new] (1 violations)

private-fixedfooter--default [Blau] (1 violations)

private-image-image-sizes--default [Blau] (1 violations)

private-tooltip--moving-target [Vivo-new] (1 violations)

private-tooltip--moving-target [Blau] (1 violations)

ℹ️ You can run this locally by executing `yarn audit-accessibility`.

Copilot

Pull request overview

Adds a dedicated “figma-mistica-implementer” agent definition intended to delegate Figma→Mistica React implementation work into a specialized subagent context.

Changes:

Introduces a new agent prompt/spec for implementing Figma designs with @telefonica/mistica.
Defines a required 5-step workflow (load skill → extract Figma → map → implement → verify) plus output requirements.


+---
+name: 'figma-mistica-implementer'
+description:
+  "Use this agent when you need to translate a Figma design into production-ready React code using the
+  @telefonica/mistica component library. This agent should be invoked whenever a user provides a Figma design
+  URL and wants it implemented as code, or when a design needs to be converted into Mistica-compliant
+  components. <example>\\nContext: The user wants to implement a Figma design into code using Mistica.\\nuser:
+  \"Here's the design for the new login screen: https://figma.com/file/abc123/login-screen. Can you implement
+  it?\"\\nassistant: \"I'm going to use the Agent tool to launch the figma-mistica-implementer agent to
+  translate this Figma design into Mistica-compliant React code.\"\\n<commentary>\\nSince the user provided a
+  Figma URL and wants it implemented, use the figma-mistica-implementer agent to extract the design via Figma
+  MCP and build it with @telefonica/mistica.\\n</commentary>\\n</example>\\n<example>\\nContext: The user
+  shares a Figma frame and asks for a component.\\nuser: \"Build this card component from Figma using our
+  design system: https://figma.com/file/xyz789/card\"\\nassistant: \"Let me use the Agent tool to launch the
+  figma-mistica-implementer agent to build a visually accurate, Mistica-compliant implementation of this
+  card.\"\\n<commentary>\\nThe user wants a Figma design implemented with the Telefonica design system, so the
+  figma-mistica-implementer agent is the right choice.\\n</commentary>\\n</example>\\n<example>\\nContext: The
+  user pastes a Figma node link mid-conversation while building a feature.\\nuser: \"Now add the settings
+  panel — here's the design: https://figma.com/file/def456/settings?node-id=12-345\"\\nassistant: \"I'll use
+  the Agent tool to launch the figma-mistica-implementer agent to implement the settings panel from this Figma
+  node using Mistica.\"\\n<commentary>\\nA Figma design link was provided for implementation; delegate to the
+  figma-mistica-implementer agent.\\n</commentary>\\n</example>"


Evaluate verifier subagent

d7c20bf


Marcosld requested review from aweell, ieduardogf and yceballost June 3, 2026 10:00

Marcosld marked this pull request as ready for review June 3, 2026 10:01

Copilot AI review requested due to automatic review settings June 3, 2026 10:01

Copilot started reviewing on behalf of Marcosld June 3, 2026 10:01

Copilot AI reviewed Jun 3, 2026

Marcosld wants to merge 1 commit into master from WEB-2442
