docs: Add UI performance optimizations design document #801
swaroopvarma1 wants to merge 1 commit into
Survey of vercel-labs/json-render, google/A2UI v0.9+v0.10, tambo-ai/tambo, constrained decoding (XGrammar/OpenAI strict), and prompt caching. Maps each external technique to a tiered set of optimisations for our SpecStream pipeline — from free wins (catalog lru_cache, Anthropic cache_control, validate_props short-circuit) through wire compaction and per-prop streaming, up to the structural state+elements split that the modern frameworks all converge on. https://claude.ai/code/session_01PXYLybfLwxL3jrR3mPvoY9
Walkthrough
A new documentation file outlines a draft optimization plan to reduce Buddy Widget UI generation latency through tiered improvements: token economy, per-prop streaming, constrained generation, and caching strategies. The plan includes a phased rollout sequence, measurement metrics for TTFT/latency/token efficiency, and explicit guarantees that the generic SpecStream contract remains intact.
Changes
UI Performance Optimizations RFC
Estimated Code Review Effort: 🎯 1 (Trivial) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 5
🧹 Nitpick comments (5)
docs/widget/UI_PERF_OPTIMIZATIONS.md (5)
37-42: 💤 Low value
Add language specifier to code block.
The fenced code block starting at line 37 should specify a language (likely `jsonl` or `json`) for proper syntax highlighting: change the opening ``` to ```jsonl for the block that begins with {"op":"add","path":"/root","value":"dashboard"}. As per coding guidelines, fenced code blocks should have a language specified.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 37 - 42, The fenced code block containing the JSON Patch lines (the objects with "op":"add", "path":"/root", etc.) needs a language specifier for syntax highlighting; update the block that currently starts with ``` to ```jsonl (or ```json if you prefer) so the snippet is rendered with proper JSON/JSONL highlighting—locate the block containing the JSON Patch entries and replace the opening backticks with ```jsonl.
224-227: 💤 Low value
Verify effort estimate for D1 caching implementation.
The "30 minutes" effort estimate for adding `@lru_cache` seems optimistic. While adding the decorator is trivial, properly validating that:
- The `frozenset` keying correctly handles all template allowlist variations
- Cache hit rates are acceptable in production
- Cache invalidation works correctly when templates change
- Memory usage is bounded appropriately
...typically requires more than 30 minutes of work, including testing and validation.
Consider revising to "2-4 hours" to account for proper testing and validation.
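
A minimal sketch of what the D1 validation items cover, assuming the primitives section is rendered from the template allowlist. `render_primitives_section` below is a hypothetical stand-in (the real signature in `ui_prompt.py` is not shown in this review), and `on_templates_changed` is a hypothetical hook for template updates. `maxsize` bounds memory, normalising the allowlist to a `frozenset` makes ordering and duplicates irrelevant to the key, and `cache_clear()` is the invalidation path.

```python
from functools import lru_cache


# Hypothetical stand-in for ui_prompt.render_primitives_section.
@lru_cache(maxsize=64)  # bounded, so memory use stays capped
def render_primitives_section(allowlist: frozenset[str]) -> str:
    # Sort so the output is deterministic regardless of set iteration order.
    lines = [f"- {name}" for name in sorted(allowlist)]
    return "## Available primitives\n" + "\n".join(lines)


def primitives_for(template_allowlist: list[str]) -> str:
    # Different orderings or duplicate entries normalise to the same cache key.
    return render_primitives_section(frozenset(template_allowlist))


def on_templates_changed() -> None:
    # Hypothetical hook: drop cached sections when templates are updated.
    render_primitives_section.cache_clear()
```

`render_primitives_section.cache_info()` exposes hits, misses, and current size, which is enough to check hit rates in a production-like replay before trusting the cache.
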
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 224 - 227, Update the effort estimate for the D1 caching change from "30 minutes" to "2-4 hours" and annotate that this time covers adding and verifying the `@lru_cache` usage, validating the frozenset keying for all template allowlist permutations, measuring cache hit rates in production-like scenarios, implementing and testing cache invalidation when templates change, and confirming memory usage bounds; reference the D1 caching change, `@lru_cache` decorator, frozenset keying, and template allowlist in the note so reviewers know what validation tasks are expected.
275-288: ⚡ Quick win
Clarify cumulative latency improvements and phase dependencies.
The "Latency win" column lists per-phase improvements, but it's unclear whether these are:
- Additive (Phase 2's 30-40% reduction + Phase 3's TTFR improvement = ~X total)
- Independent (each measured against baseline)
- Multiplicative (Phase 4's 5-10× reduction applies to the output remaining after Phase 2's 30-40% reduction)
Consider adding a "Cumulative TTFT target" column showing the expected total latency after each phase completes. For example:
- Baseline: ~3s TTFT
- After Phase 0: ~2.6s (-400ms)
- After Phase 2: ~1.8s (additional 30% reduction on output tokens)
- After Phase 4: ~600ms (5-10× reduction on lists)
Also, explicitly note dependencies: Phase 5 (A2) is listed as "stacked on phase 4," but what about Phase 3's widget-side changes? Does Phase 4 depend on Phase 3, or are they independent?
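
A small sketch of why the composition rule matters, using the reviewer's illustrative numbers (not measurements). It assumes every phase compounds on whatever latency the earlier phases left; `cumulative` and the `"abs"`/`"frac"`/`"factor"` kinds are hypothetical labels for a fixed saving, a fractional cut, and an N-times speed-up.

```python
def cumulative(baseline_s: float,
               phases: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Apply each phase to the latency remaining after the earlier phases."""
    t = baseline_s
    out = []
    for name, kind, value in phases:
        if kind == "abs":        # fixed saving, in seconds
            t -= value
        elif kind == "frac":     # fractional cut, e.g. 0.30 for -30%
            t *= 1 - value
        elif kind == "factor":   # N-times speed-up
            t /= value
        else:
            raise ValueError(f"unknown kind: {kind}")
        out.append((name, t))
    return out


# The reviewer's example figures: -400ms, then -30%, then a 5x cut.
steps = [("Phase 0", "abs", 0.4), ("Phase 2", "frac", 0.30), ("Phase 4", "factor", 5.0)]
for name, t in cumulative(3.0, steps):
    print(f"{name}: ~{t:.2f}s")
```

Under full compounding, Phase 4 lands near 0.36s rather than the ~600ms in the example; ~600ms only follows if the 5-10× cut applies to part of the remaining latency (the list-rendering output). That is the assumption the "Cumulative TTFT target" column would need to state.
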
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 275 - 288, The table's "Latency win" entries are ambiguous about whether improvements are additive, independent, or multiplicative; update the rollout table to add a "Cumulative TTFT target" column that computes expected total latency after each phase (use the Baseline ~3s TTFT example and show numeric outcomes after Phase 0, Phase 1, Phase 2, Phase 3, Phase 4, etc.), and explicitly annotate phase dependencies (e.g., mark Phase 5/A2 as "requires Phase 4", specify whether Phase 4 depends on Phase 3's widget-side changes or is independent, and note which phases compound vs. measured against baseline). Ensure you update the Phase rows (Phase 0..6) and add short dependency flags like "stacked on", "independent", or "requires" next to unique identifiers such as D1, D2, D3, A1, A2, A3, B1, B2, C1, C2 so readers can see cumulative effects and prerequisite relationships.
136-140: ⚡ Quick win
Quantify the token reduction claim with concrete examples.
The claim "5-10× output token reduction on list-rendering turns" is compelling but needs supporting calculation. Consider adding a worked example showing:
- Current: 8 full Tile ops with token count
- Proposed: 1 template + 1 data array with token count
- Actual reduction ratio
This would strengthen the case for prioritizing A1 and help validate the effort estimate.
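
A rough worked example of the comparison the reviewer asks for, assuming `set_data` is emitted server-side from the tool result (as in the "Massive optimisation" paragraph), so only the template counts as LLM output. The Tile prop names, the `repeat`/`$item.*`/`set_data` op shapes, and the 4-characters-per-token proxy are all illustrative assumptions, not the doc's actual schema or tokenizer.

```python
import json

# Hypothetical tile data as a tool might return it.
items = [
    {"title": f"Product {i}", "subtitle": "Ships tomorrow",
     "image": f"https://example.com/img/{i}.png", "cta": "View details"}
    for i in range(1, 9)
]


def flat_ops(rows):
    """Current form: the LLM writes one full add op per tile."""
    return [
        json.dumps({"op": "add", "id": f"p{i}", "type": "Tile",
                    "parent": "root", "props": row})
        for i, row in enumerate(rows, start=1)
    ]


def template_ops():
    """A1 form: the LLM writes one templated Tile bound to $item fields."""
    props = {key: f"$item.{key}" for key in ("title", "subtitle", "image", "cta")}
    return [json.dumps({"op": "add", "id": "tpl", "type": "Tile", "parent": "root",
                        "repeat": "items", "props": props})]


def set_data_op(rows):
    """Emitted server-side from the tool result, so it costs no LLM output tokens."""
    return json.dumps({"op": "set_data", "path": "items", "value": rows})


def approx_tokens(lines):
    # Crude proxy of ~4 characters per token; use the provider's tokenizer
    # for real numbers.
    return sum(len(line) for line in lines) // 4


current = approx_tokens(flat_ops(items))
proposed = approx_tokens(template_ops())
print(f"flat: ~{current} tokens, template: ~{proposed} tokens, "
      f"ratio: {current / proposed:.1f}x")
```

With these assumptions the ratio comes out close to the item count (8 here), because the template op is about the size of a single tile op. The shared Carousel container op is identical in both forms and is left out.
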
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 136 - 140, Add a short worked example that quantifies the "5-10× output token reduction" claim by showing concrete token counts for both formats: (1) current approach: 8 full Tile ops (show per-Tile token count and total), and (2) proposed approach: 1 template op + 1 data array (show template token count, data-array token count and total), then compute the reduction ratio; place this example in the "Massive optimisation" paragraph that discusses emitting the `set_data` op server-side and reference the carousel / list-rendering scenario and the `set_data`/`$item`/`repeat` symbols so readers can see the assumptions behind the calculation.
124-132: 💤 Low value
Add language specifier to code block.
The fenced code block starting at line 124 should specify a language (likely `jsonl` or `json`): change the opening ``` to ```jsonl for the block that begins with <ui_stream> followed by {"op":"add","id":"root","type":"Carousel"}. As per coding guidelines, fenced code blocks should have a language specified.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 124 - 132, The fenced code block that begins with the <ui_stream> operations is missing a language specifier; update the opening triple-backtick for that block to include a language (e.g., "jsonl" or "json") so the block reads ```jsonl (or ```json) before the <ui_stream> line to satisfy the coding guideline for specifying fenced-code languages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md`:
- Around line 308-320: Add explicit contract tests to the rollout measurement
plan (Section 5) that validate the fallback behaviors described in Section 6:
create tests for A1 ensuring the LLM can still emit flat-op form for
heterogeneous lists and one-off UI elements; for A2 ensure speculative emission
is overridden by LLM final ops including edge cases like identical id/different
type and partial property overrides; for C1/C2 validate constrained generation
schemas permit every case allowed by the existing healer (no false negatives);
and for D4 add a short-circuit validation test that confirms background Pydantic
validation still catches all issues. Include these as a "Contract Tests"
subsection and reference the specific guarantees (A1, A2, C1, C2, D4) so they
run in CI during each phase rollout.
- Around line 143-148: Update the speculative UI emission proposal to mitigate
jarring UX by: add a rate-limit/suppression threshold (e.g. skip speculative
emit if refinement < ~200ms) around the emission logic that currently uses
ToolUiHint.examples[0]; surface a clear visual indicator/placeholder state for
speculative content so users know it's a refinement in progress; implement
interaction guards that debounce or disable actions on speculative elements and
queue user actions to be re-applied after the LLM <ui_stream> arrives; ensure
the widget op-merge semantics for ids follow the described replace/add/remove
behavior and tag emitted ops with a "speculative" flag so analytics can compute
a refinement delta metric (how often final <ui_stream> changes ids or values).
- Around line 293-304: Add explicit baseline measurements and new user-perceived
and cache-effectiveness metrics to the Measurement plan: record p50/p95/p99
baselines for each existing metric (ttft_ms, ttfui_ms, ttlui_ms,
llm_output_tokens, cache_read_input_tokens, ui_op_dropped) before Phase 0, and
add speculative_refinement_delta_ms, speculative_change_rate, and
layout_shift_score to capture user-perceived impact of speculative emits; also
add cache_hit_rate and cache_miss_rate alongside cache_read_input_tokens for
relative cache effectiveness, and expand the SSE tagging proposal to include
per-request flags for which specific optimizations (e.g., D1, D2, D3, D4) are
active in addition to the overall optimisation phase tag.
- Around line 194-202: Summarize and mitigate vendor lock-in for the C1
"render_ui" tool-call approach: assess parity of tool-call streaming semantics
across providers (Anthropic partial_json vs OpenAI/Gemini), list required
fallback behavior for the existing <ui_stream> marker path, and estimate added
maintenance/migration cost if Anthropic changes; update docs in
UI_PERF_OPTIMIZATIONS.md (section C1) to either (a) define a clear provider
abstraction strategy with interfaces/feature-detection to switch between
render_ui tool-call and <ui_stream> marker handling, or (b) explicitly mark C1
as Anthropic-only with rationale and ongoing maintenance estimate, and adjust
the "Effort: 1 week" estimate accordingly.
- Around line 13-27: Update the UI_PERF_OPTIMIZATIONS doc to correct the
inaccuracies: mention that template/builder.py calls _splice_ui_primitives(...)
at the noted location and mcp/__init__.py:_maybe_inject_ui_instructions injects
_ui_instructions/_ui_examples (already present), change the UiStreamExtractor
carry buffer description to reference _CARRY_MAX = max(len("<ui_stream>"),
len("</ui_stream>")) - 1 instead of a fixed "16-char" value, clarify that
validate_props (Pydantic) is applied to "add" ops while "replace" only undergoes
weaker checks in parse_op_line with stronger validation deferred to the widget,
and make the cost table estimates explicit by labeling timings as approximate
(or add profiling methodology/conditions and a "last verified" timestamp) rather
than asserting exact seconds.
---
Nitpick comments:
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md`:
- Around line 37-42: The fenced code block containing the JSON Patch lines (the
objects with "op":"add", "path":"/root", etc.) needs a language specifier for
syntax highlighting; update the block that currently starts with ``` to ```jsonl
(or ```json if you prefer) so the snippet is rendered with proper JSON/JSONL
highlighting—locate the block containing the JSON Patch entries and replace the
opening backticks with ```jsonl.
- Around line 224-227: Update the effort estimate for the D1 caching change from
"30 minutes" to "2-4 hours" and annotate that this time covers adding and
verifying the `@lru_cache` usage, validating the frozenset keying for all template
allowlist permutations, measuring cache hit rates in production-like scenarios,
implementing and testing cache invalidation when templates change, and
confirming memory usage bounds; reference the D1 caching change, `@lru_cache`
decorator, frozenset keying, and template allowlist in the note so reviewers
know what validation tasks are expected.
- Around line 275-288: The table's "Latency win" entries are ambiguous about
whether improvements are additive, independent, or multiplicative; update the
rollout table to add a "Cumulative TTFT target" column that computes expected
total latency after each phase (use the Baseline ~3s TTFT example and show
numeric outcomes after Phase 0, Phase 1, Phase 2, Phase 3, Phase 4, etc.), and
explicitly annotate phase dependencies (e.g., mark Phase 5/A2 as "requires Phase
4", specify whether Phase 4 depends on Phase 3's widget-side changes or is
independent, and note which phases compound vs. measured against baseline).
Ensure you update the Phase rows (Phase 0..6) and add short dependency flags
like "stacked on", "independent", or "requires" next to unique identifiers such
as D1, D2, D3, A1, A2, A3, B1, B2, C1, C2 so readers can see cumulative effects
and prerequisite relationships.
- Around line 136-140: Add a short worked example that quantifies the "5-10×
output token reduction" claim by showing concrete token counts for both formats:
(1) current approach: 8 full Tile ops (show per-Tile token count and total), and
(2) proposed approach: 1 template op + 1 data array (show template token count,
data-array token count and total), then compute the reduction ratio; place this
example in the "Massive optimisation" paragraph that discusses emitting the
`set_data` op server-side and reference the carousel / list-rendering scenario
and the `set_data`/`$item`/`repeat` symbols so readers can see the assumptions
behind the calculation.
- Around line 124-132: The fenced code block that begins with the <ui_stream>
operations is missing a language specifier; update the opening triple-backtick
for that block to include a language (e.g., "jsonl" or "json") so the block
reads ```jsonl (or ```json) before the <ui_stream> line to satisfy the coding
guideline for specifying fenced-code languages.
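
For the measurement-plan comment (p50/p95/p99 baselines per metric before Phase 0, plus `cache_hit_rate` / `cache_miss_rate` next to `cache_read_input_tokens`), a minimal sketch of the two computations. It assumes each request is logged as a dict keyed by the metric names listed there; counting any request with `cache_read_input_tokens > 0` as a hit is one possible definition, not one the doc fixes.

```python
from statistics import quantiles


def percentiles(samples: list[float]) -> dict[str, float]:
    """p50/p95/p99 of one metric (e.g. ttft_ms) over a baseline window."""
    cuts = quantiles(samples, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def cache_rates(requests: list[dict]) -> dict[str, float]:
    """Treat a request as a cache hit if any input tokens were read from cache."""
    total = len(requests)
    if total == 0:
        return {"cache_hit_rate": 0.0, "cache_miss_rate": 0.0}
    hits = sum(1 for r in requests if r.get("cache_read_input_tokens", 0) > 0)
    return {"cache_hit_rate": hits / total, "cache_miss_rate": 1 - hits / total}
```

Grouping the same requests by the per-request optimisation flags (D1, D2, ...) before calling these gives the per-optimisation comparison the comment asks for.
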
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 72f828d3-bf19-4813-918a-ac85bbf4ac2e
📒 Files selected for processing (1)
docs/widget/UI_PERF_OPTIMIZATIONS.md
1. `template/builder.py:351` splices the rendered `## Available primitives` section (`ui_prompt.py:render_primitives_section`) into the system prompt. Built every turn, no cache.
2. Tool results are mutated post-hoc in `mcp/__init__.py:_maybe_inject_ui_instructions`: `_ui_instructions` + `_ui_examples` keys spliced into the JSON envelope (JIT pattern).
3. LLM streams text. Inside `<ui_stream>…</ui_stream>` markers it emits one JSON op per line: `{"op":"add","id":"p1","type":"Tile","parent":"root","props":{...}}`.
4. `chat/ui_stream.py:UiStreamExtractor` is a stateful FSM with a 16-char carry buffer. Each complete line → `heal_op_line` (deterministic rule pass) → `parse_op_line` → `validate_props` (full Pydantic) → `ui_op` SSE event to the widget.
5. Widget applies ops to a session-stateful tree and renders.

**Where the seconds actually go (ranked):**

| Source | Cost | Mitigable? |
|---|---|---|
| LLM token throughput on verbose Tile JSON | **~2-4s for a typical carousel of 8 tiles** | Yes: biggest lever |
| `_ui_examples` re-injected per tool call, never cached | 1-10KB tokens per turn | Yes |
| Primitives section rebuilt per turn | Tokens + ~5-20ms CPU | Yes: trivial |
| Per-line full Pydantic validation | 1-5ms × N lines | Yes: short-circuit |
| No prompt-cache markers | Re-tokenised every turn | Yes: provider feature |
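
For the last row (no prompt-cache markers), a minimal sketch of an Anthropic Messages API request body that marks the stable system prefix as cacheable with `cache_control: {"type": "ephemeral"}`. It assumes the base prompt and the primitives section are stable across turns for a given template; `build_request` and the model placeholder are hypothetical, and the provider enforces a minimum cacheable prefix length, so short prompts are not cached.

```python
MODEL = "<model-id>"  # placeholder: substitute the deployed model


def build_request(base_prompt: str, primitives_section: str,
                  messages: list[dict]) -> dict:
    """Messages API body with the stable system prefix marked for prompt caching."""
    return {
        "model": MODEL,
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": base_prompt},
            # Cache breakpoint: the prefix up to and including this block is cached.
            {"type": "text", "text": primitives_section,
             "cache_control": {"type": "ephemeral"}},
        ],
        # Per-turn content (user turns, tool results) stays after the breakpoint.
        "messages": messages,
    }
```

The design choice is ordering: anything that changes per turn has to come after the breakpoint, otherwise every turn invalidates the cached prefix.
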
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that the referenced implementation files and symbols exist
echo "=== Checking referenced files ==="
fd -t f "builder.py|ui_stream.py|ui_healer.py|agent.py|ui_prompt.py|ui_catalog.py" --exec echo "Found: {}"
echo -e "\n=== Checking for referenced symbols ==="
rg -n "def _maybe_inject_ui_instructions" --type py
rg -n "class UiStreamExtractor" --type py
rg -n "def render_primitives_section" --type py
rg -n "def heal_op_line" --type py
Repository: juspay/clairvoyance
Length of output: 1013
🏁 Script executed:
#!/bin/bash
set -euo pipefail
cd "$(git rev-parse --show-toplevel)"
echo "=== Confirming exact line numbers mentioned in docs ==="
# Show definition line numbers and surrounding context
python3 - <<'PY'
import pathlib, re
paths = [
    "app/ai/voice/agents/breeze_buddy/template/builder.py",
    "app/ai/voice/agents/breeze_buddy/mcp/__init__.py",
    "app/ai/voice/agents/breeze_buddy/chat/ui_stream.py",
    "app/ai/voice/agents/breeze_buddy/template/ui_prompt.py",
    "app/ai/voice/agents/breeze_buddy/chat/ui_healer.py",
    "app/ai/voice/agents/breeze_buddy/chat/agent.py",
]
for p in paths:
    path = pathlib.Path(p)
    if not path.exists():
        print("MISSING", p)
        continue
    lines = path.read_text().splitlines()
    def find(patterns):
        for i,l in enumerate(lines, start=1):
            for pat in patterns:
                if re.search(pat, l):
                    return i,l
        return None,None
    if p.endswith("builder.py"):
        i,l = find([r"render_primitives_section", r"Available primitives", r"system prompt", r"splic"])
        if i:
            print(p, "first_match_line", i, l.strip()[:160])
        # also print around line 351 specifically
        target=351
        print("\n--", p, "around line", target, "--")
        lo=max(1,target-20); hi=min(len(lines),target+20)
        for k in range(lo,hi+1):
            print(f"{k:5d}: {lines[k-1]}")
    elif p.endswith("mcp/__init__.py"):
        # find definition
        for i,l in enumerate(lines, start=1):
            if re.search(r"def _maybe_inject_ui_instructions", l):
                print(p, "def_line", i)
                print("\n-- around", i, "--")
                lo=max(1,i-10); hi=min(len(lines),i+10)
                for k in range(lo,hi+1):
                    print(f"{k:5d}: {lines[k-1]}")
                break
    elif p.endswith("ui_stream.py"):
        for i,l in enumerate(lines, start=1):
            if re.search(r"class UiStreamExtractor", l):
                print(p, "class_line", i)
                print("\n-- around", i, "--")
                lo=max(1,i-10); hi=min(len(lines),i+10)
                for k in range(lo,hi+1):
                    print(f"{k:5d}: {lines[k-1]}")
                break
    elif p.endswith("ui_prompt.py"):
        for i,l in enumerate(lines, start=1):
            if re.search(r"def render_primitives_section", l):
                print(p, "def_line", i)
                break
    elif p.endswith("ui_healer.py"):
        for i,l in enumerate(lines, start=1):
            if re.search(r"def heal_op_line", l):
                print(p, "def_line", i)
                break
PY
echo
echo "=== Checking docs table line 21-27 content ==="
nl -ba docs/widget/UI_PERF_OPTIMIZATIONS.md | sed -n '1,120p'
Repository: juspay/clairvoyance
Length of output: 4513
🏁 Script executed:
#!/bin/bash
set -euo pipefail
cd "$(git rev-parse --show-toplevel)"
DOC="docs/widget/UI_PERF_OPTIMIZATIONS.md"
echo "=== Docs excerpt (lines 1-120) ==="
python3 - <<PY
from pathlib import Path
p=Path("$DOC")
lines=p.read_text().splitlines()
for i in range(1, min(120, len(lines))+1):
    print(f"{i:4d}: {lines[i-1]}")
PY
echo
echo "=== Find _splice_ui_primitives usage + implementation ==="
rg -n "_splice_ui_primitives" app/ai/voice/agents/breeze_buddy/template/builder.py
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/template/builder.py"
lines=pathlib.Path(path).read_text().splitlines()
for i,l in enumerate(lines, start=1):
    if re.search(r"def _splice_ui_primitives", l):
        lo=max(1,i-25); hi=min(len(lines), i+80)
        for k in range(lo, hi+1):
            print(f"{k:5d}: {lines[k-1]}")
        break
else:
    print("def _splice_ui_primitives not found in builder.py")
PY
echo
echo "=== Verify render_primitives_section linkage ==="
rg -n "render_primitives_section" app/ai/voice/agents/breeze_buddy/template/builder.py
rg -n "render_primitives_section" app/ai/voice/agents/breeze_buddy/template/ui_prompt.py | head -n 20
echo
echo "=== UiStreamExtractor: carry buffer + JSONL per-line handling ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/chat/ui_stream.py"
lines=pathlib.Path(path).read_text().splitlines()
# print around carry/buffer mentions and parse/validate pipeline
need_patterns=[
    r"carry",
    r"16",
    r"JsonlOpLine",
    r"heal_op_line",
    r"parse_op_line",
    r"validate_props",
    r"Pydantic",
    r"SSE",
    r"yield",
]
hits=[]
for i,l in enumerate(lines, start=1):
    for pat in need_patterns:
        if re.search(pat, l):
            hits.append((i,l.strip()))
            break
for i,l in hits[:40]:
    print(f"{i:5d}: {l}")
PY
echo
echo "=== Print UiStreamExtractor feed/flush and pipeline region ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/chat/ui_stream.py"
lines=pathlib.Path(path).read_text().splitlines()
# locate feed() and flush()
def print_around(pattern, context=60):
    for i,l in enumerate(lines, start=1):
        if re.search(pattern, l):
            lo=max(1,i-context); hi=min(len(lines), i+context)
            for k in range(lo,hi+1):
                print(f"{k:5d}: {lines[k-1]}")
            return True
    return False
ok = print_around(r"def feed\(") or print_around(r"def flush\(")
if not ok:
    print("Could not locate feed()/flush() via regex")
PY
echo
echo "=== ui_healer: heal_op_line / validate_props (Pydantic) ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/chat/ui_healer.py"
lines=pathlib.Path(path).read_text().splitlines()
# locate heal_op_line and parse_op_line/validate_props if present
for target in [r"def heal_op_line", r"def parse_op_line", r"validate_props", r"Pydantic", r"BaseModel", r"SSE", r"jsonschema"]:
    for i,l in enumerate(lines, start=1):
        if re.search(target, l):
            print("\n--- match:", target, "at", i, "---")
            lo=max(1,i-40); hi=min(len(lines), i+120)
            for k in range(lo,hi+1):
                print(f"{k:5d}: {lines[k-1]}")
            break
PY
Repository: juspay/clairvoyance
Length of output: 25253
🏁 Script executed:
#!/bin/bash
set -euo pipefail
cd "$(git rev-parse --show-toplevel)"
echo "=== Inspect UiStreamExtractor carry logic (_CARRY_MAX, _tail_marker_prefix) ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/chat/ui_stream.py"
lines=pathlib.Path(path).read_text().splitlines()
# print region where _CARRY_MAX/_tail_marker_prefix/_process is defined
for i,l in enumerate(lines, start=1):
    if re.search(r"_CARRY_MAX", l):
        lo=max(1,i-20); hi=min(len(lines), i+80)
        for k in range(lo,hi+1):
            print(f"{k:5d}: {lines[k-1]}")
        break
else:
    print("No _CARRY_MAX found")
PY
echo
echo "=== Inspect parse_op_line + process_op_line pipeline (does it call healer then validate) ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/chat/ui_stream.py"
lines=pathlib.Path(path).read_text().splitlines()
targets=["def parse_op_line","def process_op_line","def validate_props","ui_op_event","yield ui_op"]
for t in targets:
    for i,l in enumerate(lines, start=1):
        if re.search(t.replace("(","\\("), l):
            print("\n--- around", t, "at", i, "---")
            lo=max(1,i-40); hi=min(len(lines), i+120)
            for k in range(lo,hi+1):
                print(f"{k:5d}: {lines[k-1]}")
            break
PY
echo
echo "=== Find validate_props definition across repo ==="
rg -n "def validate_props|validate_props\s*=" app/ai/voice/agents/breeze_buddy --type py
echo
echo "=== Inspect mcp/_maybe_inject_ui_instructions keys ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/mcp/__init__.py"
lines=pathlib.Path(path).read_text().splitlines()
# find function block and print relevant key names
start=None
for i,l in enumerate(lines, start=1):
    if re.search(r"def _maybe_inject_ui_instructions", l):
        start=i; break
if not start:
    print("function not found")
    raise SystemExit
lo=max(1,start); hi=min(len(lines), start+160)
for k in range(lo,hi+1):
    print(f"{k:5d}: {lines[k-1]}")
PY
echo
echo "=== Inspect render_primitives_section for caching ==="
python3 - <<'PY'
import pathlib, re
path="app/ai/voice/agents/breeze_buddy/template/ui_prompt.py"
lines=pathlib.Path(path).read_text().splitlines()
# print around render_primitives_section
for i,l in enumerate(lines, start=1):
    if re.search(r"def render_primitives_section", l):
        lo=max(1,i-40); hi=min(len(lines), i+200)
        for k in range(lo,hi+1):
            print(f"{k:5d}: {lines[k-1]}")
        break
PY
echo
echo "=== Search for prompt caching / lru_cache around render_primitives_section ==="
rg -n "lru_cache|cache|memoiz|functools" app/ai/voice/agents/breeze_buddy/template/ui_prompt.py app/ai/voice/agents/breeze_buddy/template/builder.py --type py
Repository: juspay/clairvoyance
Length of output: 37181
🏁 Script executed:
#!/bin/bash
set -euo pipefail
cd "$(git rev-parse --show-toplevel)"
echo "=== ui_catalog.validate_props (confirm Pydantic / validation scope) ==="
python3 - <<'PY'
import pathlib
path="app/ai/voice/agents/breeze_buddy/template/ui_catalog.py"
lines=pathlib.Path(path).read_text().splitlines()
# find validate_props definition line
for i,l in enumerate(lines, start=1):
    if "def validate_props" in l:
        start=max(1,i-40); end=min(len(lines), i+180)
        for k in range(start, end+1):
            print(f"{k:5d}: {lines[k-1]}")
        break
else:
    print("validate_props not found")
PY
echo
echo "=== builder.py around system_prompt splice (caching context) ==="
python3 - <<'PY'
import pathlib
path="app/ai/voice/agents/breeze_buddy/template/builder.py"
lines=pathlib.Path(path).read_text().splitlines()
# print around line ~351 and around where templates L2-cached noted
for target in [241, 330, 351, 500, 520, 530, 540]:
    if target<1 or target>len(lines):
        continue
    lo=max(1,target-20); hi=min(len(lines), target+60)
    print(f"\n--- around {path}:{target} ---")
    for k in range(lo,hi+1):
        print(f"{k:5d}: {lines[k-1]}")
PY
echo
echo "=== Template cache module references ==="
rg -n "template/cache|L2-cache|l2_cached|cache" app/ai/voice/agents/breeze_buddy/template --type py
Repository: juspay/clairvoyance
Length of output: 34127
Fix UI_PERF_OPTIMIZATIONS recap inaccuracies and clarify perf numbers
- The referenced files/symbols currently exist (`template/builder.py:351` calls `_splice_ui_primitives(...)`, and `mcp/__init__.py:_maybe_inject_ui_instructions` injects `_ui_instructions`/`_ui_examples` into JSON dict results).
- `chat/ui_stream.py:UiStreamExtractor` carry isn't a fixed "16-char" buffer; it's derived as `_CARRY_MAX = max(len("<ui_stream>"), len("</ui_stream>")) - 1`, which evaluates to 11.
- The pipeline description overstates validation: `validate_props` (Pydantic) is applied for `add` ops, while `replace` only has weak checks in `parse_op_line` (strong type validation is handled on the widget side).
- The cost breakdown table claims concrete timings (e.g., "~2-4s for … 8 tiles") without citations or measurement methodology; label them as estimates or add profiling details/conditions, and consider adding a "last verified" timestamp for these claims.
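
A toy extractor illustrating the corrected carry rule; it is not the real `UiStreamExtractor`, whose internals are only partly visible here, and `MiniExtractor` / `_split_suffix` are hypothetical names. Outside the markers it holds back at most `CARRY_MAX` characters, since any longer tail cannot be a proper prefix of either marker; inside, it releases only whole op lines.

```python
OPEN, CLOSE = "<ui_stream>", "</ui_stream>"
# Longest marker minus one: the most of a marker a chunk boundary can hide.
CARRY_MAX = max(len(OPEN), len(CLOSE)) - 1  # 12 - 1 = 11


def _split_suffix(buf: str, marker: str) -> int:
    """Length of the longest tail of buf that could be the start of marker."""
    for k in range(min(len(buf), CARRY_MAX), 0, -1):
        if marker.startswith(buf[-k:]):
            return k
    return 0


class MiniExtractor:
    """Toy FSM (hypothetical): separates prose from complete op lines."""

    def __init__(self) -> None:
        self.buf = ""
        self.inside = False
        self.prose: list[str] = []
        self.ops: list[str] = []

    def _emit_ops(self, segment: str) -> None:
        # One op per non-blank line, as in the <ui_stream> convention.
        self.ops.extend(line.strip() for line in segment.split("\n") if line.strip())

    def feed(self, chunk: str) -> None:
        self.buf += chunk
        # Consume every complete marker currently in the buffer.
        while True:
            marker = CLOSE if self.inside else OPEN
            idx = self.buf.find(marker)
            if idx == -1:
                break
            head, self.buf = self.buf[:idx], self.buf[idx + len(marker):]
            if self.inside:
                self._emit_ops(head)
            elif head:
                self.prose.append(head)
            self.inside = not self.inside
        if self.inside:
            # Release whole lines only; the tail may be half an op or half of CLOSE.
            cut = self.buf.rfind("\n") + 1
            self._emit_ops(self.buf[:cut])
            self.buf = self.buf[cut:]
        else:
            # Hold back at most CARRY_MAX chars that might begin OPEN.
            split = len(self.buf) - _split_suffix(self.buf, OPEN)
            if split:
                self.prose.append(self.buf[:split])
                self.buf = self.buf[split:]

    def flush(self) -> None:
        # End of stream: no marker can complete any more, so release the tail.
        if self.inside:
            self._emit_ops(self.buf)
        elif self.buf:
            self.prose.append(self.buf)
        self.buf = ""
```

Feeding `'Here you go <ui_st'`, `'ream>\n{"op":"add","id":"root"}\n{"op":"ad'`, `'d","id":"p1"}</ui_'`, `'stream> done'` yields both ops intact even though each marker and the second op are split across chunks.
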
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 13 - 27, Update the
UI_PERF_OPTIMIZATIONS doc to correct the inaccuracies: mention that
template/builder.py calls _splice_ui_primitives(...) at the noted location and
mcp/__init__.py:_maybe_inject_ui_instructions injects
_ui_instructions/_ui_examples (already present), change the UiStreamExtractor
carry buffer description to reference _CARRY_MAX = max(len("<ui_stream>"),
len("</ui_stream>")) - 1 instead of a fixed "16-char" value, clarify that
validate_props (Pydantic) is applied to "add" ops while "replace" only undergoes
weaker checks in parse_op_line with stronger validation deferred to the widget,
and make the cost table estimates explicit by labeling timings as approximate
(or add profiling methodology/conditions and a "last verified" timestamp) rather
than asserting exact seconds.
After a tool returns, immediately emit a default UI from `ToolUiHint.examples[0]` (or a deterministic mapping) **while the LLM is still generating**. Treat the LLM's eventual `<ui_stream>` as a refinement: same `id`s → `replace` ops, new ids → `add` ops, missing ids → `remove`.

- Effort: 2-3 days
- Latency: TTFR drops from ~1-2s to ~100ms on tool-result-driven turns
- Risk: low if op-merge semantics are correctly defined on the widget; needs explicit "speculative" tagging so analytics can measure refinement deltas
Consider UX implications of speculative emission.
A2 proposes emitting a default UI immediately while the LLM generates a refinement. This optimization could create a jarring user experience if the speculative UI differs significantly from the LLM's final output (flickering, layout shifts, confusing interactions if the user clicks on a speculative element that then changes).
The "speculative tagging" for analytics is mentioned, but consider also:
- Rate-limiting or suppressing speculative emission if the refinement typically arrives within ~200ms
- Visual indicators to the user that content is being refined
- Handling user interactions with speculative elements that are then removed/replaced
- Measuring the "refinement delta" (how often does the LLM override the default?) to assess if this optimization is worth the complexity
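
A minimal sketch of the A2 mapping (same ids → `replace`, new ids → `add`, missing ids → `remove`) plus the "speculative" tag and a refinement-delta metric. It assumes each UI node is a flat dict with `id`, `type`, `parent`, `props`, the shape of the add op in the pipeline recap; `speculative_ops`, `reconcile`, and `refinement_delta` are hypothetical helpers, and the proposal places the merge on the widget rather than wherever this runs.

```python
from typing import Any

Node = dict[str, Any]  # hypothetical node shape: {"id", "type", "parent", "props"}


def speculative_ops(nodes: list[Node]) -> list[dict[str, Any]]:
    """Default UI emitted before the LLM finishes, tagged so analytics can spot it."""
    return [{"op": "add", "speculative": True, **node} for node in nodes]


def reconcile(speculative: list[Node], final: list[Node]) -> list[dict[str, Any]]:
    """Map the LLM's final nodes onto the speculative tree: replace / add / remove."""
    spec_ids = {node["id"] for node in speculative}
    final_ids = {node["id"] for node in final}
    ops: list[dict[str, Any]] = []
    for node in final:
        # Same id -> replace (this also covers a type change under the same id).
        kind = "replace" if node["id"] in spec_ids else "add"
        ops.append({"op": kind, **node})
    # Missing ids are only known once </ui_stream> has closed.
    for node in speculative:
        if node["id"] not in final_ids:
            ops.append({"op": "remove", "id": node["id"]})
    return ops


def refinement_delta(speculative: list[Node], final: list[Node]) -> float:
    """Share of ids the LLM added, removed, or changed relative to the default."""
    spec = {node["id"]: node for node in speculative}
    fin = {node["id"]: node for node in final}
    ids = spec.keys() | fin.keys()
    changed = sum(1 for node_id in ids if spec.get(node_id) != fin.get(node_id))
    return changed / len(ids) if ids else 0.0
```

A consistently high `refinement_delta` would be the signal that speculative emission causes more visible churn than it saves.
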
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 143 - 148, Update the
speculative UI emission proposal to mitigate jarring UX by: add a
rate-limit/suppression threshold (e.g. skip speculative emit if refinement <
~200ms) around the emission logic that currently uses ToolUiHint.examples[0];
surface a clear visual indicator/placeholder state for speculative content so
users know it's a refinement in progress; implement interaction guards that
debounce or disable actions on speculative elements and queue user actions to be
re-applied after the LLM <ui_stream> arrives; ensure the widget op-merge
semantics for ids follow the described replace/add/remove behavior and tag
emitted ops with a "speculative" flag so analytics can compute a refinement
delta metric (how often final <ui_stream> changes ids or values).
#### C1. Tool-call escape hatch for SpecStream

Define a tool called `render_ui` whose JSON-Schema-constrained input *is* the ops list. Anthropic guarantees the tool input validates against the schema. Replaces `<ui_stream>` markers entirely: the streaming SDK emits `content_block_delta(partial_json=…)` events that we accumulate into a tool call. Healer's malformed-JSON and unknown-type drops disappear because the schema enforces them.

Trade-off: tool-call streaming is per-arg, not per-op-line. We'd accumulate the full ops list before processing. Compatible with A1's data-binding pattern; conflicts with B2's partial-JSON streaming unless we use the SDK's `partial_json` events directly.

- Effort: 1 week
- Latency: small TTFT gain (no marker parsing), large healer simplification
- Risk: medium. Locks us into Anthropic-shaped tool-call semantics; Gemini / OpenAI parity needs verification
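A sketch of the accumulation step C1 describes. It assumes that the `render_ui` tool's input schema wraps the list as `{"ops": [...]}`, which is a hypothetical shape. It also assumes that events arrive as raw Anthropic streaming dicts: `content_block_start`, then `content_block_delta` carrying an `input_json_delta`, then `content_block_stop`. The sketch also makes the per-arg trade-off visible, because no op is available until the tool block stops.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def collect_render_ui_ops(
    events: Iterable[dict[str, Any]], tool_name: str = "render_ui"
) -> list[dict[str, Any]]:
    """Accumulate streamed tool-input JSON for `tool_name` and return its ops.

    Assumes a hypothetical tool input schema of the form {"ops": [...]}.
    """
    buffers: dict[int, list[str]] = {}  # content-block index -> JSON fragments
    ops: list[dict[str, Any]] = []
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            block = event["content_block"]
            if block["type"] == "tool_use" and block["name"] == tool_name:
                buffers[event["index"]] = []
        elif kind == "content_block_delta" and event["index"] in buffers:
            delta = event["delta"]
            if delta["type"] == "input_json_delta":
                buffers[event["index"]].append(delta["partial_json"])
        elif kind == "content_block_stop" and event["index"] in buffers:
            # Individual fragments are not valid JSON on their own, so parse
            # only once the block ends (the per-arg, not per-op, trade-off).
            raw = "".join(buffers.pop(event["index"])) or "{}"
            ops.extend(json.loads(raw).get("ops", []))
    return ops
```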
Assess vendor lock-in risk for C1 tool-call approach.
C1 proposes using Anthropic's tool-call mechanism for SpecStream rendering. While the risk section notes this "locks us into Anthropic-shaped tool-call semantics," this deserves deeper consideration:
- Gemini/OpenAI parity: Tool-call streaming behavior differs across providers. Anthropic streams `partial_json` within tool calls; OpenAI and Gemini may have different semantics.
- Fallback complexity: If C1 is implemented, maintaining both the tool-call path (Anthropic) and the `<ui_stream>` marker path (other providers) adds significant maintenance burden.
- Migration cost: If Anthropic's tool-call semantics change in future API versions, this could require significant rework.

Consider explicitly documenting a provider abstraction strategy or accepting that this optimization may be Anthropic-specific only. The "1 week" effort estimate may not account for maintaining dual code paths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 194 - 202, Summarize and
mitigate vendor lock-in for the C1 "render_ui" tool-call approach: assess parity
of tool-call streaming semantics across providers (Anthropic partial_json vs
OpenAI/Gemini), list required fallback behavior for the existing <ui_stream>
marker path, and estimate added maintenance/migration cost if Anthropic changes;
update docs in UI_PERF_OPTIMIZATIONS.md (section C1) to either (a) define a
clear provider abstraction strategy with interfaces/feature-detection to switch
between render_ui tool-call and <ui_stream> marker handling, or (b) explicitly
mark C1 as Anthropic-only with rationale and ongoing maintenance estimate, and
adjust the "Effort: 1 week" estimate accordingly.
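One way to express option (a), sketched under several assumptions. `ProviderCaps` is a hypothetical capability record, filled in per provider only after the parity check. The `<ui_stream>` fallback assumes JSONL ops between `<ui_stream>` and `</ui_stream>` markers, with malformed lines skipped. The real marker grammar and healer behaviour may differ.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Any

# Assumed marker grammar: JSONL ops between <ui_stream> ... </ui_stream>.
UI_STREAM_RE = re.compile(r"<ui_stream>(.*?)</ui_stream>", re.DOTALL)


@dataclass(frozen=True)
class ProviderCaps:
    """Hypothetical per-provider capability flags, set after parity testing."""

    name: str
    streams_tool_input: bool  # emits incremental tool-input JSON we can accumulate


def choose_ui_path(caps: ProviderCaps, c1_enabled: bool) -> str:
    """Feature-detect instead of hard-coding the provider name."""
    return "tool_call" if c1_enabled and caps.streams_tool_input else "ui_stream_markers"


def ops_from_markers(text: str) -> list[dict[str, Any]]:
    """Fallback path: parse JSONL ops between markers, skipping bad lines."""
    ops: list[dict[str, Any]] = []
    for block in UI_STREAM_RE.findall(text):
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                ops.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # where the healer would record a ui_op_dropped
    return ops
```

Keeping the choice behind a single capability check means only one place changes if a provider's tool-call streaming semantics change.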
## 5. Measurement plan

Before phase 0 ships, instrument:

- `ttft_ms`: from `/message` POST to first `assistant_token` / first `ui_op` SSE event (split metric)
- `ttfui_ms`: from `/message` POST to first `ui_op` SSE event
- `ttlui_ms`: from `/message` POST to last `ui_op` SSE event in the turn
- `llm_output_tokens`: break down by `<ui_stream>` vs prose
- `cache_read_input_tokens`: Anthropic cache hit rate after D2 lands
- `ui_op_dropped` count + reason: healer success rate; should drop after C1/C2

Compare p50/p95/p99 across phases. Tag every SSE stream with the active optimisation phase so we can A/B in production.
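A minimal sketch of how the timing metrics above could be derived from one turn's SSE log. It assumes each event is recorded as a `(timestamp_ms, event_name)` pair and that percentiles use the nearest-rank method. `turn_metrics`, `percentile`, and the extra `ttft_token_ms` key are hypothetical names. `ttft_token_ms` represents the token half of the split `ttft_ms` metric.

```python
from __future__ import annotations

import math
from typing import Iterable


def turn_metrics(post_ts_ms: float, events: Iterable[tuple[float, str]]) -> dict[str, float | None]:
    """Derive per-turn timings from (timestamp_ms, sse_event_name) pairs."""
    events = list(events)
    token_ts = [t for t, name in events if name == "assistant_token"]
    ui_ts = [t for t, name in events if name == "ui_op"]

    def since_post(ts: float | None) -> float | None:
        return None if ts is None else ts - post_ts_ms

    return {
        "ttft_ms": since_post(min(token_ts + ui_ts, default=None)),
        "ttft_token_ms": since_post(min(token_ts, default=None)),
        "ttfui_ms": since_post(min(ui_ts, default=None)),
        "ttlui_ms": since_post(max(ui_ts, default=None)),
    }


def percentile(values: Iterable[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=95 for p95. Requires at least one value."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Prose-only turns yield `None` for the UI timings, so they should be excluded from `ttfui_ms`/`ttlui_ms` percentiles rather than counted as zero.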
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add baseline measurements and user-perceived metrics.
The measurement plan is comprehensive for backend metrics, but consider adding:

- Baseline measurements: Before Phase 0 ships, capture current p50/p95/p99 for all six metrics to establish a quantitative baseline. The document references "~2-4s" and "~1-2s", but these should be backed by actual measurements.
- User-perceived quality metrics (especially for Phase 5's speculative emission):
  - `speculative_refinement_delta_ms`: Time between speculative UI render and LLM refinement arrival
  - `speculative_change_rate`: Percentage of speculative UIs that are modified by the LLM
  - `layout_shift_score`: CLS-like metric for UI jank during refinement
- Cache effectiveness (for Phases 0-1):
  - `cache_hit_rate`: Percentage of requests hitting the prompt cache
  - `cache_miss_rate`: Percentage of requests missing the cache

  (Not just `cache_read_input_tokens`, which is an absolute count.)
- Per-optimization attribution: The plan mentions "tag every SSE stream with the active optimisation phase," but consider also tagging which specific optimizations are active (e.g., a request might have D1+D2+D3 enabled but not D4).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 293 - 304, Add explicit
baseline measurements and new user-perceived and cache-effectiveness metrics to
the Measurement plan: record p50/p95/p99 baselines for each existing metric
(ttft_ms, ttfui_ms, ttlui_ms, llm_output_tokens, cache_read_input_tokens,
ui_op_dropped) before Phase 0, and add speculative_refinement_delta_ms,
speculative_change_rate, and layout_shift_score to capture user-perceived impact
of speculative emits; also add cache_hit_rate and cache_miss_rate alongside
cache_read_input_tokens for relative cache effectiveness, and expand the SSE
tagging proposal to include per-request flags for which specific optimizations
(e.g., D1, D2, D3, D4) are active in addition to the overall optimisation phase
tag.
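A sketch of the relative cache metrics and per-optimisation tags suggested above. It assumes per-request `usage` dicts containing Anthropic's `input_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens` fields, with the three together making up the full prompt. The function names and the `opt_phase`/`opt_flags` tag keys are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def cache_effectiveness(usages: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Relative prompt-cache metrics from per-request usage dicts."""
    usages = list(usages)
    if not usages:
        return {"cache_hit_rate": 0.0, "cache_miss_rate": 0.0, "cached_token_share": 0.0}
    hits = sum(1 for u in usages if (u.get("cache_read_input_tokens") or 0) > 0)
    cached = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    total = sum(
        (u.get("input_tokens") or 0)
        + (u.get("cache_creation_input_tokens") or 0)
        + (u.get("cache_read_input_tokens") or 0)
        for u in usages
    )
    hit_rate = hits / len(usages)
    return {
        "cache_hit_rate": hit_rate,  # share of requests that read from the cache
        "cache_miss_rate": 1.0 - hit_rate,
        "cached_token_share": cached / total if total else 0.0,  # token-weighted
    }


def optimisation_tags(phase: int, active: Iterable[str]) -> dict[str, str]:
    """SSE tags carrying both the rollout phase and the exact optimisations enabled."""
    return {"opt_phase": str(phase), "opt_flags": "+".join(sorted(set(active)))}
```

Reporting both the request-level hit rate and the token-weighted share distinguishes "few requests hit the cache" from "hits happen but cover only a small prefix".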
## 6. Generic functionality preserved

Every recommendation is additive:

- **A1's data-binding split** keeps the flat-op form as a fallback. LLM can still emit a hand-crafted tree when needed.
- **A2's speculative emission** is fully overridable: the LLM's final ops win.
- **A3's compact wire form** is a pure encoding alias; canonical ops downstream are unchanged.
- **B1/B2** are skeleton/progressive enhancement; final state matches today's.
- **C1/C2** narrow what's *allowed*, but the existing healer rules already enforce this; we're just moving enforcement to the right layer.
- **D1-D5** are pure caching/short-circuiting; no contract change.

The catalog remains the single source of truth. Templates still declare allowlists and JIT instructions. Merchants still get per-template UI customisation.
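To make the A3 claim concrete, here is a sketch of a compact form used as a pure alias. The decoder accepts either form and always hands canonical JSON Patch-style ops downstream. The positional `["a", path, value]` encoding and the `a`/`r`/`x` op codes are hypothetical; the RFC's real shorthand may differ.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical op codes for a positional compact form: [code, path, value?]
OP_CODES = {"a": "add", "r": "replace", "x": "remove"}
CODES_BY_OP = {op: code for code, op in OP_CODES.items()}


def expand(compact_op: list[Any]) -> dict[str, Any]:
    """Compact positional op -> canonical JSON Patch-style op."""
    code, path, *rest = compact_op
    op: dict[str, Any] = {"op": OP_CODES[code], "path": path}
    if rest:
        op["value"] = rest[0]
    return op


def compact(op: dict[str, Any]) -> list[Any]:
    """Canonical op -> compact form (used here only to prove the round trip)."""
    out: list[Any] = [CODES_BY_OP[op["op"]], op["path"]]
    if "value" in op:
        out.append(op["value"])
    return out


def to_canonical(line: str) -> dict[str, Any]:
    """Decode one wire line; downstream code only ever sees canonical ops."""
    parsed = json.loads(line)
    return expand(parsed) if isinstance(parsed, list) else parsed
```

A round-trip check of the form `expand(compact(op)) == op` over recorded production ops is a cheap way to keep the "pure alias" guarantee honest.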
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Ensure fallback behaviors are tested.
This section makes important claims about preserved contracts and fallback mechanisms. These should be explicitly tested as part of each phase rollout:
- A1: Verify that the LLM can still emit the flat-op form when data-binding is inappropriate (e.g., heterogeneous lists, one-off UI elements)
- A2: Test that the LLM's ops correctly override speculative emission (especially edge cases like the same `id` but a different `type`, or a partial property override)
- C1/C2: Validate that constrained generation schemas permit everything the existing healer allows (no false negatives)
- D4: Confirm that short-circuit validation doesn't introduce correctness regressions (background Pydantic still catches all issues)

Consider adding a "Contract Tests" subsection to the measurement plan (Section 5) that explicitly validates these guarantees don't regress across phases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/widget/UI_PERF_OPTIMIZATIONS.md` around lines 308 - 320, Add explicit
contract tests to the rollout measurement plan (Section 5) that validate the
fallback behaviors described in Section 6: create tests for A1 ensuring the LLM
can still emit flat-op form for heterogeneous lists and one-off UI elements; for
A2 ensure speculative emission is overridden by LLM final ops including edge
cases like identical id/different type and partial property overrides; for C1/C2
validate constrained generation schemas permit every case allowed by the
existing healer (no false negatives); and for D4 add a short-circuit validation
test that confirms background Pydantic validation still catches all issues.
Include these as a "Contract Tests" subsection and reference the specific
guarantees (A1, A2, C1, C2, D4) so they run in CI during each phase rollout.
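A sketch of one A2 contract test. It assumes element-keyed ops and "last writer wins with wholesale props" semantics, which is an assumption the RFC should pin down. The test checks that, for every id the LLM touches, the final state is identical with or without a speculative prefix. This covers both the same-`id`/different-`type` case and props leaking through a partial override.

```python
from __future__ import annotations

from typing import Any


def reduce_ops(ops: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Fold element ops into final state; the last writer wins."""
    state: dict[str, dict[str, Any]] = {}
    for op in ops:
        if op["op"] == "remove":
            state.pop(op["id"], None)
        else:
            # add/replace set type and props wholesale, so props from a
            # speculative op cannot survive a final op for the same id.
            state[op["id"]] = {"type": op["type"], "props": dict(op["props"])}
    return state


def llm_ops_win(speculative_ops: list[dict[str, Any]], llm_ops: list[dict[str, Any]]) -> bool:
    """Contract: every id the LLM touches ends up exactly as the LLM alone would leave it."""
    merged = reduce_ops([*speculative_ops, *llm_ops])
    llm_only = reduce_ops(llm_ops)
    return all(merged.get(op["id"]) == llm_only.get(op["id"]) for op in llm_ops)
```

If the RFC instead chooses shallow prop merging for `replace`, this check fails when a speculative-only prop such as a badge is present. Forcing that decision explicitly is the purpose of the test.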
Summary
Add a comprehensive design document outlining performance optimization strategies for the Breeze Buddy widget's UI generation pipeline. This document surveys current architecture, benchmarks against industry standards (Vercel json-render, Google A2UI, Tambo), and proposes a phased rollout of optimizations targeting token economy, perceived TTFR, and healer simplification.
Key Changes
- `docs/widget/UI_PERF_OPTIMIZATIONS.md` (343 lines)

Notable Implementation Details
- Tier A (Token Economy): Proposes adopting json-render's `state` + `elements` split with `$item` binding and `repeat` directives to achieve 5–10× LLM output reduction on list-rendering turns; server-side speculative emission to drop TTFR to ~100ms on tool-result-driven turns; and a compact wire-form shorthand to save ~60% tokens per op.
- Tier B (Per-Prop Streaming): Tambo-style per-prop status and partial-JSON parsing to drop perceived TTFR to ~200–400ms by rendering skeletons before full props arrive.
- Tier C (Constrained Decoding): Tool-call escape hatch and JSON Schema mode to eliminate malformed-JSON and unknown-op healer drops entirely.
- Tier D (Free Wins): LRU caching of the primitives section, Anthropic prompt cache markers, moving `_ui_examples` to the cached system prompt, short-circuit validation, and an `emits_ui` flag to save 50–150ms TTFT on prose-only turns.
- Rollout sequence: 6 phases from a 2-day "free wins" baseline through a 2-week data/structure split, with per-phase effort, latency win, and goal clearly stated.
- Measurement plan: Defines 6 key metrics (`ttft_ms`, `ttfui_ms`, `ttlui_ms`, `llm_output_tokens`, `cache_read_input_tokens`, `ui_op_dropped`) with p50/p95/p99 tracking and per-phase tagging for A/B analysis.
- Preservation of contract: Explicitly verifies that every recommendation is additive and maintains the catalog as single source of truth, template allowlists, and per-template UI customization.
This document serves as a reference for prioritizing UI performance work and aligns the team on industry best practices before implementation begins.
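The following is a rough illustration of why the Tier A `state` + `elements` split saves output tokens. The LLM emits one templated element plus data, and the server expands it. The element shape, the `{"$item": field}` binding syntax, and the `-<index>` id suffix are assumptions made for illustration; json-render's real `repeat`/`$item` semantics may differ.

```python
from __future__ import annotations

from typing import Any


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Minimal RFC 6901 JSON Pointer lookup (assumes a leading '/')."""
    if not pointer:
        return doc
    node = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def bind(value: Any, item: dict[str, Any]) -> Any:
    """Replace {"$item": field} placeholders with the current item's field."""
    if isinstance(value, dict):
        if set(value) == {"$item"}:
            return item[value["$item"]]
        return {k: bind(v, item) for k, v in value.items()}
    if isinstance(value, list):
        return [bind(v, item) for v in value]
    return value


def expand_repeat(element: dict[str, Any], state: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one templated element into one concrete element per state item."""
    items = resolve_pointer(state, element["repeat"])
    return [
        {"id": f"{element['id']}-{i}", "type": element["type"], "props": bind(element["props"], item)}
        for i, item in enumerate(items)
    ]
```

With N items, the LLM emits one element template plus the data it would have emitted anyway, instead of N full element ops. The saving grows with list length.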
https://claude.ai/code/session_01PXYLybfLwxL3jrR3mPvoY9
Summary by CodeRabbit