Context
Agent evals on `gpt-5.5` after the work in #225 (plugin-format skills, `mur pack-local`, `dotnet new reactorapp`) put `reactor-calc` at $3.30 / 226 s / 393 K tokens / 11 turns (5-run mean), with first build success 5/5 and CV down from 40 % to 8 %. The Reactor / WinUI gap closed to 1.00× tokens, 0.62× wall on calc and 1.20× tokens, 0.72× wall on kanban: Reactor is now faster than XAML at equal cost on calc and at about 1.11× cost on kanban.
The remaining ceiling is HTML at $1.68 / 88 s for calc and $1.98 / 170 s for kanban. The Reactor → HTML gap is ~3× tokens, ~2.6× wall on calc and ~4× tokens, ~2.8× wall on kanban.
| eval | wall (CV) | turns (CV) | tokens (CV) | cost USD | LoC | first build |
| --- | --- | --- | --- | --- | --- | --- |
| html-calc | 88 s (33 %) | 5.6 | 131 K | $1.68 | 305 | n/a |
| html-kanban | 170 s (27 %) | 6.6 | 183 K | $1.98 | 703 | n/a |
| reactor-calc | 226 s (8 %) | 11.0 | 393 K | $3.30 | 163 | 5/5 ✓ |
| reactor-kanban | 471 s (21 %) | 16.8 | 738 K | $5.04 | 277 | 5/5 ✓ |
| winui-xaml-calc | 363 s (62 %) | 11.2 | 394 K | $3.30 | 334 | n/a |
| winui-xaml-kanban | 651 s (31 %) | 13.6 | 615 K | $4.56 | 590 | n/a |
The structural overhead in Reactor is ~270 K tokens per kanban run vs HTML's ~130 K. That's the remaining gap.
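The ratios quoted in this section can be re-derived from the table. A minimal sketch, assuming the per-eval means are transcribed as below; the `EvalRow` shape and the `ratio` helper are illustrative names, not part of the eval harness:

```typescript
// Mean per-run figures copied from the results table.
interface EvalRow {
  wallSeconds: number;
  tokensK: number;
  costUsd: number;
}

const results: Record<string, EvalRow> = {
  "html-calc": { wallSeconds: 88, tokensK: 131, costUsd: 1.68 },
  "html-kanban": { wallSeconds: 170, tokensK: 183, costUsd: 1.98 },
  "reactor-calc": { wallSeconds: 226, tokensK: 393, costUsd: 3.3 },
  "reactor-kanban": { wallSeconds: 471, tokensK: 738, costUsd: 5.04 },
  "winui-xaml-calc": { wallSeconds: 363, tokensK: 394, costUsd: 3.3 },
  "winui-xaml-kanban": { wallSeconds: 651, tokensK: 615, costUsd: 4.56 },
};

// Ratio of one eval to a baseline eval on a single metric, rounded to two decimals.
function ratio(evalName: string, baselineName: string, metric: keyof EvalRow): number {
  const row = results[evalName];
  const baseline = results[baselineName];
  if (!row || !baseline) {
    throw new Error(`unknown eval: ${evalName} or ${baselineName}`);
  }
  return Math.round((row[metric] / baseline[metric]) * 100) / 100;
}

// Reactor vs WinUI XAML on kanban: tokens, wall, cost.
const kanbanVsWinui = [
  ratio("reactor-kanban", "winui-xaml-kanban", "tokensK"),
  ratio("reactor-kanban", "winui-xaml-kanban", "wallSeconds"),
  ratio("reactor-kanban", "winui-xaml-kanban", "costUsd"),
];

// Reactor vs HTML on kanban: tokens and wall.
const kanbanVsHtml = [
  ratio("reactor-kanban", "html-kanban", "tokensK"),
  ratio("reactor-kanban", "html-kanban", "wallSeconds"),
];
```

Keeping the table as data makes it cheap to recompute every ratio when a new batch lands, instead of editing the prose numbers by hand.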
Where the gap actually lives (event-log evidence)
A typical Reactor-kanban run breaks down as:
| category | turns | tokens (est.) |
| --- | --- | --- |
| skill load | 1 | 5 K |
| `dotnet new` + scaffold inspect | 2-3 | 25 K |
| Sample-app reads (drag, dialog, flyout, context) | 3-5 | 80 K |
| `reactor.api.txt` ripgreps | 4-8 | 60 K |
| Apply-patch implementation | 3-4 | 80 K |
| Build + fix cycles | 2-4 | 150 K |
| total | ~17 | ~400 K |
HTML's equivalent: ~6 turns, ~130 K tokens, no scaffold, no API exploration, no build cycles.
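To see which categories dominate, the estimated breakdown can be summed and turned into shares. This is a rough sketch over the estimates listed above; the `Category` type and `sharePercent` helper are illustrative names only:

```typescript
// Estimated tokens (in thousands) per category for a typical Reactor-kanban run.
interface Category {
  name: string;
  tokensK: number;
}

const breakdown: Category[] = [
  { name: "skill load", tokensK: 5 },
  { name: "dotnet new + scaffold inspect", tokensK: 25 },
  { name: "sample-app reads", tokensK: 80 },
  { name: "reactor.api.txt ripgreps", tokensK: 60 },
  { name: "apply-patch implementation", tokensK: 80 },
  { name: "build + fix cycles", tokensK: 150 },
];

// Sum of all category estimates.
const totalK = breakdown.reduce((sum, c) => sum + c.tokensK, 0);

// Share of the total for one category, as a percentage with one decimal.
function sharePercent(name: string): number {
  const category = breakdown.find((c) => c.name === name);
  if (!category) {
    throw new Error(`unknown category: ${name}`);
  }
  return Math.round((category.tokensK / totalK) * 1000) / 10;
}

const buildShare = sharePercent("build + fix cycles");
const explorationShare =
  sharePercent("sample-app reads") + sharePercent("reactor.api.txt ripgreps");
```

On these estimates, build and fix cycles are the largest single category, and sample reads plus API ripgreps together are nearly as large.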
Proposals (ranked by predicted impact × feasibility)
[ ] 1. Bigger, richer dotnet new reactorapp template — predicted ~−25 % tokens
Today the template generates a 12-line counter. Make it a multi-component app with the shapes the agent needs:
- `App` root with `UseReducer` + a typed `record` state
- A `Component<TProps>` child
- A `.Provide(Ctx, ...)` example
- `// REPLACE WITH YOUR LOGIC` comments at all the right places
Rationale: The agent reads the scaffold once (cheap, single `view`) and gets all structural patterns in-workspace. WinUI rarely loads its design skill body because the 30+ files of MVVM scaffolding from `dotnet new winui-mvvm` are the documentation. Net est: save ~3-4 turns × ~30 K context, i.e. up to ~120 K tokens per kanban run.
First step: Edit `tools/Templates/templates/WinUIApp-CSharp/` to add a `Components/` directory, a `Models.cs` with a record state, and inline-comment guidance.
[ ] 2. Inline a generated cheatsheet into the workspace at scaffold time — predicted −10-15 % tokens
When `dotnet new reactorapp` runs, drop `_reactor-api-cheatsheet.md` (top 80% factories/modifiers/hooks, ~100 lines) alongside `Program.cs`. Auto-include in the csproj as `` so build ignores it.
Rationale: Cost-of-context per turn is the dominant lever. Moving signatures from "ripgrep'd 6-8× per session" to "one tiny file read at scaffold time" is a 2-3× compression of that overhead. Generate from the same source as `reactor.api.txt` to avoid drift.
First step: Add cheatsheet emission to `tools/Reactor.SignaturesGen/Program.cs` (already writes `reactor.api.txt` to two paths; add a third for the cheatsheet) + reference from the template.
[ ] 5. Make mur check the default verification, not dotnet build — predicted −10-20 % tokens
Currently the agent runs `dotnet build`, which dumps 1.5-3 K tokens of MSBuild output per build. `mur check` returns ~50-150 tokens with skill-file pointers. With 2-4 build cycles per session and ~17 turns, that is ~150 K cache-read tokens saved per run (~9 K less context per turn × 17 turns).
Two-tier: `mur check` first, `dotnet build` only if `check` is clean but a deeper compile error is suspected.
First step: Update the eval prompt (`evals/lib/flavor-reactor.ts`) and the `reactor-build-and-check` skill to lead with `mur check`.
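The two-tier order can be sketched as a small decision function. This is a hypothetical illustration, not the harness code: the `Runner` abstraction, the `suspectDeeperError` flag, and the assumption that `mur check` signals problems through a non-zero exit code are all made up for the example.

```typescript
// Result of running one shell command; this shape is hypothetical.
interface CommandResult {
  exitCode: number;
  output: string;
}

// Injected so the logic can be exercised without spawning real processes.
type Runner = (command: string) => CommandResult;

interface Verification {
  ok: boolean;
  commandsRun: string[];
  output: string;
}

// Tier 1: always run the cheap check. Tier 2: fall back to the full build
// only when the check is clean and a deeper compile error is still suspected.
function verify(run: Runner, suspectDeeperError: boolean): Verification {
  const commandsRun = ["mur check"];
  const check = run("mur check");
  if (check.exitCode !== 0) {
    // Assumption: a non-zero exit code means the check found problems.
    return { ok: false, commandsRun, output: check.output };
  }
  if (!suspectDeeperError) {
    return { ok: true, commandsRun, output: check.output };
  }
  commandsRun.push("dotnet build");
  const build = run("dotnet build");
  return { ok: build.exitCode === 0, commandsRun, output: build.output };
}
```

The point of the ordering is that the verbose MSBuild output only enters the context on the rare path where the short check cannot explain the failure.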
Recommended cut: implement #1 + #2 + #5 together — independent and complementary. Predicted cumulative: kanban tokens 738 K → ~480 K, cost $5.04 → $3.30. Even at 2× pessimism, ~600 K is reachable — putting Reactor solidly under WinUI's 615 K for the first time.
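The cost and pessimism figures above follow from simple arithmetic. A minimal sketch, assuming cost scales linearly with tokens (an assumption of this example, not a measured property) and reading "2× pessimism" as halving the predicted savings:

```typescript
// Current reactor-kanban means and the predicted token count after #1 + #2 + #5.
const baselineTokensK = 738;
const baselineCostUsd = 5.04;
const predictedTokensK = 480;

// Assumption: cost per run is proportional to tokens per run.
function projectCostUsd(tokensK: number): number {
  return (baselineCostUsd * tokensK) / baselineTokensK;
}

// Divide the predicted savings by a pessimism factor and subtract from baseline.
function pessimisticTokensK(pessimismFactor: number): number {
  const predictedSavingsK = baselineTokensK - predictedTokensK;
  return baselineTokensK - predictedSavingsK / pessimismFactor;
}

const projectedCost = projectCostUsd(predictedTokensK); // about $3.28
const halfSavings = pessimisticTokensK(2); // 609 K
```

Under these assumptions the projected cost lands near the quoted $3.30, and the half-savings case stays just below WinUI's 615 K.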
[ ] 3. Defer-everything skill loading — predicted −5-10 % tokens, +5 % build-failure risk
Drop the always-loaded `reactor-getting-started` body. Keep only a 200-token stub that points the agent at `skill reactor-getting-started` on demand. Cache reads are paid every turn; a 5 K-token always-loaded skill costs 85 K cache reads per kanban run.
Risk: First-build success rate may regress if the agent guesses API names without skill reference. Don't ship if first-build OK rate drops below 90 % — A/B 5×N batch first.
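The ship gate can be written down as a one-line check. A minimal sketch with a hypothetical `passesGate` helper; the 90 % threshold comes from the risk note above, everything else is illustrative:

```typescript
// True when the first-build success rate meets the threshold (default 90 %).
function passesGate(successes: number, runs: number, threshold: number = 0.9): boolean {
  if (runs <= 0 || successes < 0 || successes > runs) {
    throw new Error("invalid run counts");
  }
  return successes / runs >= threshold;
}

// Largest number of first-build failures a batch of `runs` can absorb.
function maxFailuresAllowed(runs: number, threshold: number = 0.9): number {
  let failures = 0;
  while (failures < runs && passesGate(runs - (failures + 1), runs, threshold)) {
    failures += 1;
  }
  return failures;
}

const singleBatch = maxFailuresAllowed(5); // one batch of 5 runs
const doubleBatch = maxFailuresAllowed(10); // two batches of 5 runs
```

With a single batch of 5, one failed first build already drops the rate to 80 %, so at least two batches are needed before the gate can tolerate any failure.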
[ ] 4. "Generate then port" — skill-directed two-pass authoring — medium win, +1-2 turns of cheaper turns
Skill text directs the agent: for any non-trivial UI, sketch the component tree in JSX-like pseudocode in your head, then translate component-by-component to Reactor C# (1 line of pseudocode → 1 line of Reactor). The Rosetta-stone table is already there; emphasize the pseudocode-first workflow so React priors carry the design phase.
First step: Add to `reactor-getting-started`: a "## Authoring workflow — design in React, write in C#" section with one worked 5-line React → 8-line Reactor example.
[ ] 6. C# / DSL ergonomics — small win on output tokens; longer-term lever
a. Components-as-records: `record App() : Component { override Render() => ... }` — saves ~15 chars per component, marginal but compounds.
b. Verify implicit `using static Microsoft.UI.Reactor.Factories` is in the template's `GlobalUsings.cs` and document it in the skill (so the agent doesn't redundantly add it). Documentation only — do this now.
c. `UseState`-as-property syntax (long-term framework work): `[Stateful] partial class App { State Count = 0; }` — needs source generator + analyzer support. Defer.
Honest ceiling
Hard floor is set by: one build cycle (~50-80 K tokens HTML doesn't pay), some skill content load on first attempt (~30-60 K one-time), occasional ripgrep of less-common patterns (~10-30 K). Lower-bound estimate with all 6 ideas: kanban ~250-300 K tokens (vs HTML's 183 K). ~1.5× HTML is about as close as we can realistically land while keeping correctness checks.
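As a sanity check on these ranges, the named overheads can be summed and the lower-bound estimate compared with HTML. This sketch only restates the figures above; the `RangeK` type and helper names are illustrative, and the summed overhead is not itself the floor estimate:

```typescript
// A low/high range in thousands of tokens.
interface RangeK {
  low: number;
  high: number;
}

// Overheads the paragraph names as unavoidable for Reactor.
const floorOverheads: RangeK[] = [
  { low: 50, high: 80 }, // one build cycle
  { low: 30, high: 60 }, // one-time skill content load
  { low: 10, high: 30 }, // occasional ripgrep of less-common patterns
];

// Add ranges component-wise.
function sumRanges(ranges: RangeK[]): RangeK {
  return ranges.reduce(
    (acc, r) => ({ low: acc.low + r.low, high: acc.high + r.high }),
    { low: 0, high: 0 },
  );
}

const htmlKanbanTokensK = 183;

// Ratio of a token count to HTML kanban, rounded to two decimals.
function ratioToHtml(tokensK: number): number {
  return Math.round((tokensK / htmlKanbanTokensK) * 100) / 100;
}

const overhead = sumRanges(floorOverheads); // 90 K to 170 K
const floorRatio = { low: ratioToHtml(250), high: ratioToHtml(300) };
```

The 250-300 K estimate corresponds to roughly 1.4× to 1.6× HTML, consistent with the ~1.5× figure.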
Why "generate HTML, then port" doesn't pencil out: HTML pass + port pass ≈ 380 K tokens / $5.50 / 340 s — tokens improve, cost ties, and you carry porting-fidelity risk. The mental version of this (idea #4) captures most of the benefit without the second pass.
Aim: match WinUI XAML on cost (done — at 1.00× / 1.11×), be 1.5×–2× HTML as the realistic ceiling.
Pointers