Simulation backend implementation by letsgogeeky · Pull Request #73 · letsgogeeky/aichitect

letsgogeeky · 2026-04-11T18:33:07Z

What does this PR do?

Type of change

Data (add/update tool, stack, or relationship)
Bug fix
New feature
Docs / chore

Checklist

make check passes (lint + typecheck)
No unrelated files changed
PR title follows Conventional Commits (feat:, fix:, chore:, docs:)

Notes for reviewers

- Engine now takes tools[] as an argument; the route loads via Supabase so AA-cron writes to latency_p50_ms and cost_model are picked up.
- avgTokensPerRequest replaced with avgInputTokens + avgOutputTokens; TOKEN_DEFAULTS provides per-use-case defaults (RAG-heavy input, chatbot-heavy output).
- CostModel.cost_per_call lets per_call tools project a real rate; unparameterised per_call falls under unprojected_cost.
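A minimal sketch of how a per_token rate, a parameterised per_call rate, and an unparameterised per_call tool could each be projected. The type shapes, the `inputPerMillion` / `outputPerMillion` field names, and the `projectMonthlyCost` helper are hypothetical illustrations, not the engine's actual code; only `cost_per_call` and the "unprojected" idea come from the notes above.

```typescript
// Hypothetical shapes for illustration; the real CostModel type may differ.
type CostModel =
  | { type: "per_token"; inputPerMillion: number; outputPerMillion: number }
  | { type: "per_call"; cost_per_call?: number };

interface Projection {
  monthlyCost: number; // 0 when the cost cannot be projected
  unprojected: boolean; // true when the model carries no usable rate
}

function projectMonthlyCost(
  model: CostModel,
  monthlyRequests: number,
  avgInputTokens: number,
  avgOutputTokens: number,
): Projection {
  if (model.type === "per_token") {
    // Input and output tokens are priced separately, per million tokens.
    const tokenCost =
      avgInputTokens * model.inputPerMillion +
      avgOutputTokens * model.outputPerMillion;
    return {
      monthlyCost: (tokenCost * monthlyRequests) / 1000000,
      unprojected: false,
    };
  }
  if (model.cost_per_call === undefined) {
    // Unparameterised per_call: flag it instead of inventing a rate.
    return { monthlyCost: 0, unprojected: true };
  }
  return {
    monthlyCost: model.cost_per_call * monthlyRequests,
    unprojected: false,
  };
}
```

Splitting input and output tokens matters here because the two are usually priced differently, which a single avgTokensPerRequest number cannot express.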

- /simulate/results renders 5 panels: cost-over-time SVG line chart with first-breaking-point marker, latency breakdown bars, breaking points cards with recommendations, kill conditions + switch-away triggers, and a share-URL button.
- lib/simulateUrl.ts encodes/decodes SimulationInput <-> query params (uc/u/r/in/out/llm/vec/fw) for shareable links; 7 tests cover round trips and invalid input.
- BreakingPoint gains an optional recommendation field, populated for latency / cost / architecture rules.
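The encode/decode round trip can be sketched roughly as follows. The query keys (uc/u/r/in/out/llm/vec/fw) come from the notes above; the `SimulationInput` field names, the `encodeSimulation` / `decodeSimulation` function names, and the "return null on invalid input" behaviour are assumptions made for illustration and may not match lib/simulateUrl.ts.

```typescript
// Assumed field names; only the query keys are taken from the PR notes.
interface SimulationInput {
  useCase: string;
  users: number;
  requestsPerDay: number;
  avgInputTokens: number;
  avgOutputTokens: number;
  llm: string;
  vectorDb?: string;
  framework?: string;
}

function encodeSimulation(input: SimulationInput): string {
  const p = new URLSearchParams();
  p.set("uc", input.useCase);
  p.set("u", String(input.users));
  p.set("r", String(input.requestsPerDay));
  p.set("in", String(input.avgInputTokens));
  p.set("out", String(input.avgOutputTokens));
  p.set("llm", input.llm);
  if (input.vectorDb) p.set("vec", input.vectorDb);
  if (input.framework) p.set("fw", input.framework);
  return p.toString();
}

// Parses a strictly positive, finite number; anything else is invalid.
function positive(raw: string | null): number | null {
  if (raw === null) return null;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : null;
}

function decodeSimulation(query: string): SimulationInput | null {
  const p = new URLSearchParams(query);
  const useCase = p.get("uc");
  const llm = p.get("llm");
  const users = positive(p.get("u"));
  const requestsPerDay = positive(p.get("r"));
  const avgInputTokens = positive(p.get("in"));
  const avgOutputTokens = positive(p.get("out"));
  if (
    !useCase || !llm || users === null || requestsPerDay === null ||
    avgInputTokens === null || avgOutputTokens === null
  ) {
    return null; // reject partial or malformed links instead of guessing
  }
  return {
    useCase, users, requestsPerDay, avgInputTokens, avgOutputTokens, llm,
    vectorDb: p.get("vec") ?? undefined,
    framework: p.get("fw") ?? undefined,
  };
}
```

Returning null for a malformed link keeps the decision about fallbacks (defaults, error state) in the caller rather than in the URL helper.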

- New /simulate page: use-case chip → scale sliders → stack picker, with smart defaults per use case and "Import from Genome" pre-fill from ?s= (slot-based mapping to llm/vectorDb/framework).
- LogSlider primitive for the log-scale sliders (users 1k-10M, tokens 500-50k) and a linear one for requests/day.
- ToolPicker dropdown filters LLMs to per_token cost_model so only modelable providers appear.
- Submits to /simulate/results with avgTokens split into in/out via the per-use-case TOKEN_DEFAULTS ratio. 6 tests cover the split and default-bound invariants.
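For reference, a log-scale slider usually maps a linear track position to a value by exponential interpolation between the bounds. The sketch below shows that mapping and its inverse; the function names and the rounding to whole units are assumptions, not the LogSlider component's actual API.

```typescript
// Maps a track position t in [0, 1] to a value in [min, max] on a log scale.
// Requires 0 < min < max.
function sliderToValue(t: number, min: number, max: number): number {
  const clamped = Math.min(1, Math.max(0, t));
  return Math.round(min * Math.pow(max / min, clamped));
}

// Inverse mapping: where on the track a given value should sit.
function valueToSlider(value: number, min: number, max: number): number {
  const clamped = Math.min(max, Math.max(min, value));
  return Math.log(clamped / min) / Math.log(max / min);
}
```

With the users range from the notes (1k to 10M), the midpoint of the track lands on 100k rather than roughly 5M, which is why a log scale suits inputs that span several orders of magnitude.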

- lib/simulateDelta.ts: pure computeDelta(current, shadow) returns per-scale cost delta, per-layer latency table, crossover users, annualised savings, and a 4-state verdict (switch_now / switch_above_X / latency_only / stick) with a human-readable line.
- Results page parses llm2/vec2/fw2 from URL, runs a second simulate with the shadow stack, computes delta, passes both to the client.
- 5 new components: CostDeltaChart (two-line overlay with shaded delta region), LatencyDeltaTable, BreakingPointDelta, SwitchVerdict card, ShadowStackForm (inline 3-picker that updates the URL via router.push).
- lib/simulateUrl.ts gains appendShadowStack / dropShadowStack / parseShadowStack; 5 new URL-helper tests + 6 delta engine tests cover all four verdict states.
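One way the four verdict states could be derived from two aligned simulation runs is sketched below. The verdict names come from the notes above; the decision rules, the `ScalePoint` shape, and the `pickVerdict` helper are illustrative guesses and should not be read as what computeDelta actually does.

```typescript
type Verdict = "switch_now" | "switch_above_X" | "latency_only" | "stick";

// Hypothetical per-scale snapshot; both arrays are ordered by ascending users.
interface ScalePoint {
  users: number;
  monthlyCost: number;
  latencyMs: number;
}

interface VerdictResult {
  verdict: Verdict;
  crossoverUsers: number | null; // set only for switch_above_X
}

function pickVerdict(current: ScalePoint[], shadow: ScalePoint[]): VerdictResult {
  if (current.length === 0 || current.length !== shadow.length) {
    throw new Error("current and shadow must be non-empty and aligned by scale step");
  }
  // Walk back from the largest scale to find where the shadow stack
  // starts being cheaper and stays cheaper.
  let start = current.length;
  while (start > 0 && shadow[start - 1].monthlyCost < current[start - 1].monthlyCost) {
    start--;
  }
  if (start === 0) {
    return { verdict: "switch_now", crossoverUsers: null };
  }
  if (start < current.length) {
    return { verdict: "switch_above_X", crossoverUsers: current[start].users };
  }
  // Never cheaper at the top of the range: only latency can justify a switch.
  const fasterEverywhere = shadow.every((s, i) => s.latencyMs < current[i].latencyMs);
  return { verdict: fasterEverywhere ? "latency_only" : "stick", crossoverUsers: null };
}
```

Keeping this a pure function of the two result arrays is what makes it cheap to cover every verdict state with unit tests.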

Replaces the 3-step gated flow + separate /simulate/results page with one /simulate route. Inputs sit in a sticky sidebar; results redraw instantly via the pure simulate() engine as sliders move (no API round-trip). URL stays in sync (debounced 300ms) so simulations remain shareable.

Token UX:
- "Avg tokens" single-slider replaced with explicit input + output sliders.
- Row of named presets (Quick chat, Long Q&A, RAG retrieval, Agent loop, Code generation) so users pick something concrete instead of guessing a number.

Chart hover (CostChart + CostDeltaChart):
- Hover bands per scale step snap a vertical guide line to the nearest data point.
- SVG-native tooltip shows user count, total cost, per-tool breakdown, latency; flips left near the right edge.

Wiring:
- /simulate/results route deleted; ShadowStackForm now takes callbacks instead of pushing to URL (parent owns the URL).
- ScaleStep/UseCaseStep/StackStep stripped of headings; the parent panel renders section labels.
- splitTokens helper removed (no longer needed); SCALE_BOUNDS gains per-direction tokens; SCALE_DEFAULTS now references TOKEN_DEFAULTS.
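The chart-hover behaviour described above (snap to the nearest data point, flip the tooltip near the right edge) reduces to two small pure functions. The sketch below uses a nearest-point search rather than the per-step hover bands the charts actually render, and the function names and the 8px gap are assumptions.

```typescript
// Returns the index of the data point whose x coordinate is closest to the
// pointer. Returns 0 for an empty array, so callers should guard that case.
function nearestPointIndex(pointXs: number[], pointerX: number): number {
  let best = 0;
  for (let i = 1; i < pointXs.length; i++) {
    if (Math.abs(pointXs[i] - pointerX) < Math.abs(pointXs[best] - pointerX)) {
      best = i;
    }
  }
  return best;
}

// Places the tooltip to the right of the guide line, or to its left when the
// right-hand placement would overflow the chart.
function tooltipX(
  anchorX: number,
  tooltipWidth: number,
  chartWidth: number,
  gap = 8,
): number {
  const right = anchorX + gap;
  return right + tooltipWidth > chartWidth ? anchorX - gap - tooltipWidth : right;
}
```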

… panels

Engine:
- LLM latency = TTFT + (output_tokens / tokens_per_second). TTFT and throughput are separate fields; the AA cron now syncs both.
- Prompt caching: per-token cost blends cached vs uncached input by cacheHitRate (90% off cached); batch pricing blends real-time and batch endpoints by batchPct (50% off).
- Vector DB cost: new per_vector_query cost model with storage_cost_per_gb_month + query_cost_per_million + min_monthly_cost. Engine projects storage from vectorCount × bytes_per_vector and queries from monthlyRequests.
- Embedding cost: RAG paths add per-query embedding cost (default text-embedding-3-small rate when no embedding tool selected).
- Eval cost: per_event cost model for Langfuse/Helicone/Braintrust.
- Model routing: optional routerCheapLlm + routerCheapPct splits LLM cost across two models.
- Rate-limit modeling: max_tpm + max_rpm + peakToAverageRatio surface a rate-limit breaking point with a tier-upgrade recommendation.
- Per-snapshot output adds costPerRequest, costPerUser, costByLayer, and per-stage latencyByStage (guardrails / embedding / vector / ttft / generation / framework).
- Result gains a bottleneck verdict (cost / latency / rate_limit / balanced / none) with a one-line diagnosis.

UI:
- /simulate gains new sliders for cache-hit %, batch %, stored vectors (RAG only) and new pickers for eval and guardrails.
- New panels: Bottleneck verdict, Unit economics (cost/mo, /req, /user, /year), Cost composition stacked bar, Provider comparison table (re-runs simulate per per-token LLM, click to switch primary).
- LatencyBreakdown renders the six-stage split with share-of-total.
- CostChart hover tooltip now shows latency in addition to per-tool cost breakdown.

Data:
- New columns: ttft_p50_ms, output_tokens_per_second, max_tpm, max_rpm, bytes_per_vector. AA cron writes ttft + throughput alongside cost_model.
- tools.json: 7 AA LLMs backfilled with realistic 2026 TTFT, throughput, cached/batch pricing, and rate limits; Pinecone/Weaviate/Turbopuffer switched from usage_based to per_vector_query; Langfuse/Helicone/Braintrust switched to per_event with published per-observation rates.

URL:
- New compact params: cr (cacheHitRate), bp (batchPct), vc (vectorCount), et (embeddingTokensPerQuery), pk (peakRatio), em (embedding), ev (eval), gd (guardrails), llmC + rcp (router).

Heads-up: requires `make db-push && make seed` to populate the remote Supabase with the new columns and values; without that the /simulate page reads NULLs and falls back to default throughput (50 tok/s) for every LLM; the symptom is uniform ~10s latency across providers.
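The engine formulas above can be sketched as a few pure functions. The latency formula, the 90% / 50% discounts, and the 50 tok/s fallback are taken from these notes; the function names, treating cacheHitRate and batchPct as fractions in [0, 1], the decimal-GB conversion, and treating min_monthly_cost as a floor are assumptions, and the rates in the usage note are made-up numbers.

```typescript
const FALLBACK_TOKENS_PER_SECOND = 50; // fallback throughput named in the heads-up
const CACHED_INPUT_DISCOUNT = 0.9; // cached input is 90% off
const BATCH_DISCOUNT = 0.5; // batch endpoint is 50% off

// LLM latency = TTFT + (output_tokens / tokens_per_second), in milliseconds.
function llmLatencyMs(
  ttftMs: number,
  outputTokens: number,
  tokensPerSecond: number | null,
): number {
  const tps = tokensPerSecond ?? FALLBACK_TOKENS_PER_SECOND;
  return ttftMs + (outputTokens / tps) * 1000;
}

// Blends cached and uncached input pricing by cache hit rate (0..1).
function blendedInputPrice(pricePerMillion: number, cacheHitRate: number): number {
  return pricePerMillion * (1 - cacheHitRate * CACHED_INPUT_DISCOUNT);
}

// Blends real-time and batch pricing by the share of traffic sent to batch (0..1).
function applyBatchBlend(cost: number, batchPct: number): number {
  return cost * (1 - batchPct * BATCH_DISCOUNT);
}

// Hypothetical parameter object mirroring the per_vector_query fields.
interface VectorQueryModel {
  storage_cost_per_gb_month: number;
  query_cost_per_million: number;
  min_monthly_cost: number;
}

function vectorDbMonthlyCost(
  model: VectorQueryModel,
  vectorCount: number,
  bytesPerVector: number,
  monthlyRequests: number,
): number {
  const storageGb = (vectorCount * bytesPerVector) / 1e9; // assumes decimal GB
  const usage =
    storageGb * model.storage_cost_per_gb_month +
    (monthlyRequests / 1e6) * model.query_cost_per_million;
  return Math.max(model.min_monthly_cost, usage); // assumes the minimum is a floor
}
```

As a check on the heads-up symptom: with a NULL throughput, `llmLatencyMs(0, 500, null)` gives 10,000 ms of generation time for an assumed 500 output tokens, regardless of provider, which would match the uniform ~10s latency described.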

…ggers

- LATENCY_CEILING_MS is now use-case-aware: chatbot 8s, RAG 12s, agent 25s, custom 10s. The old 2s ceiling fired for almost every realistic 2026 stack (Sonnet at 46 tok/s producing 600 output tokens takes 13s before anything else).
- New TTFT-slow breaking point fires when ttft_p50_ms > 1500ms. It is decoupled from total response length, since streaming hides generation time.
- Bottleneck verdict text drops the hardcoded "2s" and references the use-case comfort ceiling instead.
- KillConditionsPanel's "Switch away when..." list now only includes rate_limit and latency triggers; cost milestones and LLM-dominance signals are tuning advice (caching, routing, batch) and stay in the Breaking points panel instead.
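A minimal sketch of the two latency rules, using the ceilings and the 1500ms TTFT threshold stated above. The use-case key spellings, the trigger names, and the `latencyTriggers` helper are assumptions for illustration.

```typescript
type UseCase = "chatbot" | "rag" | "agent" | "custom";

// Ceilings from the notes above, in milliseconds.
const LATENCY_CEILING_MS: Record<UseCase, number> = {
  chatbot: 8000,
  rag: 12000,
  agent: 25000,
  custom: 10000,
};

const TTFT_SLOW_MS = 1500;

type LatencyTrigger = "latency" | "ttft_slow";

function latencyTriggers(
  useCase: UseCase,
  ttftMs: number,
  outputTokens: number,
  tokensPerSecond: number,
): LatencyTrigger[] {
  const totalMs = ttftMs + (outputTokens / tokensPerSecond) * 1000;
  const triggers: LatencyTrigger[] = [];
  // Total latency is judged against the use-case comfort ceiling.
  if (totalMs > LATENCY_CEILING_MS[useCase]) triggers.push("latency");
  // TTFT is judged on its own, independent of how long generation takes.
  if (ttftMs > TTFT_SLOW_MS) triggers.push("ttft_slow");
  return triggers;
}
```

For the example in the notes, 600 output tokens at 46 tok/s is 600 / 46 ≈ 13.04s of generation alone: over the chatbot and RAG ceilings, but comfortably under the 25s agent ceiling.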

Clears all high-severity npm audit findings the CI gate was failing on (npm audit --audit-level=high now exits 0):
- next 16.2.4 → 16.2.6: fixes the high-severity SSRF in WebSocket upgrades, middleware/proxy bypasses, cache poisoning, and CSP-nonce XSS issues (GHSA-c4j6-fc7j-m34r and friends).
- @hono/node-server, hono, fast-uri, ip-address, express-rate-limit: picked up via the dependency tree refresh; resolves the high-severity fast-uri path-traversal/host-confusion CVE (GHSA-q3j6-qgpj-74h6, GHSA-v39h-62p7-jpjc).

Five moderate findings remain, all tracing to the postcss copy bundled inside next 16.2.x. The only fixes available today are either next 16.3.x (not yet released as stable) or downgrading @vercel/speed-insights to 1.0.4 (breaking change); leaving these for the next routine bump.

letsgogeeky added 2 commits May 14, 2026 20:30

Simulation backend implementation

4db0dba

letsgogeeky force-pushed the feat/AIC-126 branch from f1ea10d to 34b25bb May 14, 2026 19:21

letsgogeeky added 8 commits May 14, 2026 21:36

feat(simulate): surface in navbar, landing hero, footer, and sitemap

241da62

letsgogeeky merged commit 0e64157 into main May 14, 2026
1 check passed

github-actions Bot deleted the feat/AIC-126 branch May 14, 2026 21:35

letsgogeeky merged 10 commits into main from feat/AIC-126
