
Simulation backend implementation #73

Merged
letsgogeeky merged 10 commits into main from feat/AIC-126 on May 14, 2026

What does this PR do?

Type of change

  • Data (add/update tool, stack, or relationship)
  • Bug fix
  • New feature
  • Docs / chore

Checklist

  • make check passes (lint + typecheck)
  • No unrelated files changed
  • PR title follows Conventional Commits (feat:, fix:, chore:, docs:)

Notes for reviewers

- Engine now takes tools[] as an argument; route loads via Supabase
  so AA-cron writes to latency_p50_ms and cost_model are picked up.
- avgTokensPerRequest replaced with avgInputTokens + avgOutputTokens;
  TOKEN_DEFAULTS provides per-use-case defaults (RAG-heavy input,
  chatbot-heavy output).
- CostModel.cost_per_call lets per_call tools project a real rate;
  unparameterised per_call falls under unprojected_cost.
- /simulate/results renders 5 panels: cost-over-time SVG line chart
  with first-breaking-point marker, latency breakdown bars, breaking
  points cards with recommendations, kill conditions + switch-away
  triggers, and a share-URL button.
- lib/simulateUrl.ts encodes/decodes SimulationInput <-> query params
  (uc/u/r/in/out/llm/vec/fw) for shareable links; 7 tests cover round
  trips and invalid input.
- BreakingPoint gains an optional recommendation field, populated for
  latency / cost / architecture rules.
- New /simulate page: use-case chip → scale sliders → stack picker,
  with smart defaults per use case and "Import from Genome" pre-fill
  from ?s= (slot-based mapping to llm/vectorDb/framework).
- LogSlider primitive for the log-scale sliders (users 1k-10M,
  tokens 500-50k) and a linear one for requests/day.
- ToolPicker dropdown filters LLMs to per_token cost_model so only
  modelable providers appear.
- Submits to /simulate/results with avgTokens split into in/out via
  the per-use-case TOKEN_DEFAULTS ratio. 6 tests cover the split and
  default-bound invariants.
- lib/simulateDelta.ts: pure computeDelta(current, shadow) returns
  per-scale cost delta, per-layer latency table, crossover users,
  annualised savings, and a 4-state verdict (switch_now /
  switch_above_X / latency_only / stick) with a human-readable line.
- Results page parses llm2/vec2/fw2 from URL, runs a second simulate
  with the shadow stack, computes delta, passes both to the client.
- 5 new components: CostDeltaChart (two-line overlay with shaded
  delta region), LatencyDeltaTable, BreakingPointDelta,
  SwitchVerdict card, ShadowStackForm (inline 3-picker that updates
  the URL via router.push).
- lib/simulateUrl.ts gains appendShadowStack / dropShadowStack /
  parseShadowStack; 5 new URL-helper tests + 6 delta engine tests
  cover all four verdict states.
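
A minimal sketch of the query-param round trip and the shadow-stack helpers described above. The short keys (uc/u/r/in/out/llm/vec/fw and llm2/vec2/fw2) come from these notes; the `SimulationInput` field names other than `avgInputTokens` / `avgOutputTokens`, the function signatures, and the validation rule are assumptions for illustration, not the code in lib/simulateUrl.ts.

```typescript
// Hypothetical shapes: only the short URL keys and the avgInputTokens /
// avgOutputTokens names are taken from the PR notes.
interface SimulationInput {
  useCase: string;
  users: number;
  requestsPerDay: number;
  avgInputTokens: number;
  avgOutputTokens: number;
  llm: string;
  vectorDb?: string;
  framework?: string;
}

interface ShadowStack {
  llm?: string;
  vectorDb?: string;
  framework?: string;
}

function encodeSimulation(input: SimulationInput): string {
  const p = new URLSearchParams();
  p.set("uc", input.useCase);
  p.set("u", String(input.users));
  p.set("r", String(input.requestsPerDay));
  p.set("in", String(input.avgInputTokens));
  p.set("out", String(input.avgOutputTokens));
  p.set("llm", input.llm);
  if (input.vectorDb) p.set("vec", input.vectorDb);
  if (input.framework) p.set("fw", input.framework);
  return p.toString();
}

// Assumed rule: accept only plain positive integers, reject everything else.
function positiveInt(raw: string | null): number | null {
  if (raw === null || !/^\d+$/.test(raw)) return null;
  const n = Number(raw);
  return n > 0 ? n : null;
}

// Returns null for missing or malformed params instead of throwing.
function decodeSimulation(query: string): SimulationInput | null {
  const p = new URLSearchParams(query);
  const useCase = p.get("uc");
  const llm = p.get("llm");
  const users = positiveInt(p.get("u"));
  const requestsPerDay = positiveInt(p.get("r"));
  const avgInputTokens = positiveInt(p.get("in"));
  const avgOutputTokens = positiveInt(p.get("out"));
  if (!useCase || !llm || users === null || requestsPerDay === null ||
      avgInputTokens === null || avgOutputTokens === null) {
    return null;
  }
  const input: SimulationInput = {
    useCase, users, requestsPerDay, avgInputTokens, avgOutputTokens, llm,
  };
  const vec = p.get("vec");
  const fw = p.get("fw");
  if (vec) input.vectorDb = vec;
  if (fw) input.framework = fw;
  return input;
}

function appendShadowStack(query: string, shadow: ShadowStack): string {
  const p = new URLSearchParams(query);
  if (shadow.llm) p.set("llm2", shadow.llm);
  if (shadow.vectorDb) p.set("vec2", shadow.vectorDb);
  if (shadow.framework) p.set("fw2", shadow.framework);
  return p.toString();
}

function dropShadowStack(query: string): string {
  const p = new URLSearchParams(query);
  for (const key of ["llm2", "vec2", "fw2"]) p.delete(key);
  return p.toString();
}

function parseShadowStack(query: string): ShadowStack | null {
  const p = new URLSearchParams(query);
  const shadow: ShadowStack = {};
  const llm = p.get("llm2");
  const vec = p.get("vec2");
  const fw = p.get("fw2");
  if (llm) shadow.llm = llm;
  if (vec) shadow.vectorDb = vec;
  if (fw) shadow.framework = fw;
  return Object.keys(shadow).length > 0 ? shadow : null;
}
```

Keeping the shadow stack in separate keys means dropping it restores the original shareable link unchanged.
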

Replaces the 3-step gated flow and the separate /simulate/results page
with a single /simulate route. Inputs sit in a sticky sidebar; results
redraw instantly via the pure simulate() engine as sliders move (no API
round-trip). The URL stays in sync (debounced 300ms), so simulations
remain shareable.

Token UX:
- "Avg tokens" single-slider replaced with explicit input + output
  sliders.
- Row of named presets — Quick chat, Long Q&A, RAG retrieval, Agent
  loop, Code generation — so users pick something concrete instead of
  guessing a number.

Chart hover (CostChart + CostDeltaChart):
- Hover bands per scale step snap a vertical guide line to the
  nearest data point.
- SVG-native tooltip shows user count, total cost, per-tool
  breakdown, latency; flips left near the right edge.
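
The snap-and-flip behaviour can be sketched as two pure helpers. This is an illustration under assumptions, not the CostChart code: `nearestIndex`, `tooltipLeft`, and the 8px gap are hypothetical names and values, and coordinates are assumed to be SVG user-space pixels.

```typescript
// Index of the data point whose x coordinate is closest to the cursor.
// Assumes pointXs is non-empty; ties keep the earlier point.
function nearestIndex(pointXs: number[], cursorX: number): number {
  let best = 0;
  for (let i = 1; i < pointXs.length; i++) {
    if (Math.abs(pointXs[i] - cursorX) < Math.abs(pointXs[best] - cursorX)) {
      best = i;
    }
  }
  return best;
}

// Left edge of the tooltip: to the right of the guide line by default,
// flipped to the left when it would overflow the chart's right edge.
function tooltipLeft(
  anchorX: number,
  tooltipWidth: number,
  chartWidth: number,
  gap: number = 8, // hypothetical spacing between guide line and tooltip
): number {
  const right = anchorX + gap;
  return right + tooltipWidth > chartWidth
    ? anchorX - gap - tooltipWidth
    : right;
}
```
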

Wiring:
- /simulate/results route deleted; ShadowStackForm now takes callbacks
  instead of pushing to URL (parent owns the URL).
- ScaleStep/UseCaseStep/StackStep stripped of headings — the parent
  panel renders section labels.
- splitTokens helper removed (no longer needed); SCALE_BOUNDS gains
  per-direction tokens; SCALE_DEFAULTS now references TOKEN_DEFAULTS.
… panels

Engine
- LLM latency = TTFT + (output_tokens / tokens_per_second). TTFT and
  throughput are separate fields; the AA cron now syncs both.
- Prompt caching: per-token cost blends cached vs uncached input by
  cacheHitRate (cached input is billed at 90% off); batch pricing blends
  real-time and batch endpoints by batchPct (batch is billed at 50% off).
- Vector DB cost: new per_vector_query cost model with
  storage_cost_per_gb_month + query_cost_per_million + min_monthly_cost.
  Engine projects storage from vectorCount × bytes_per_vector and
  queries from monthlyRequests.
- Embedding cost: RAG paths add per-query embedding cost (default
  text-embedding-3-small rate when no embedding tool selected).
- Eval cost: per_event cost model for Langfuse/Helicone/Braintrust.
- Model routing: optional routerCheapLlm + routerCheapPct splits LLM
  cost across two models.
- Rate-limit modeling: max_tpm + max_rpm + peakToAverageRatio surface
  a rate-limit breaking point with a tier-upgrade recommendation.
- Per-snapshot output adds costPerRequest, costPerUser, costByLayer,
  and per-stage latencyByStage (guardrails / embedding / vector / ttft
  / generation / framework).
- Result gains a bottleneck verdict (cost / latency / rate_limit /
  balanced / none) with a one-line diagnosis.
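
A sketch of three of the engine formulas listed above: LLM latency, the cache/batch cost blend, and the per_vector_query projection. The interface and function names are hypothetical. It also assumes the batch discount applies to both input and output tokens, that 1 GB is 1e9 bytes, that min_monthly_cost acts as a floor, and that each request issues one vector query; the real engine may differ on any of these.

```typescript
const CACHED_INPUT_DISCOUNT = 0.9; // "90% off cached" from the notes
const BATCH_DISCOUNT = 0.5;        // "50% off" from the notes

// Hypothetical field names; prices are USD per 1M tokens.
interface LlmRates {
  inputPerMillion: number;
  outputPerMillion: number;
  ttftP50Ms: number;
  outputTokensPerSecond: number;
}

// LLM latency = TTFT + (output_tokens / tokens_per_second), in ms.
function llmLatencyMs(rates: LlmRates, outputTokens: number): number {
  return rates.ttftP50Ms + (outputTokens / rates.outputTokensPerSecond) * 1000;
}

// Blend cached vs uncached input by cacheHitRate, then blend real-time
// vs batch endpoints by batchPct. Both rates are fractions in [0, 1].
function llmCostPerRequest(
  rates: LlmRates,
  inputTokens: number,
  outputTokens: number,
  cacheHitRate: number,
  batchPct: number,
): number {
  const inputRate =
    rates.inputPerMillion * (1 - cacheHitRate * CACHED_INPUT_DISCOUNT);
  const realTime =
    (inputTokens * inputRate + outputTokens * rates.outputPerMillion) / 1_000_000;
  return realTime * (1 - batchPct * BATCH_DISCOUNT);
}

// Mirrors the per_vector_query cost model fields named in the notes.
interface VectorCostModel {
  storageCostPerGbMonth: number;
  queryCostPerMillion: number;
  minMonthlyCost: number;
}

function vectorDbMonthlyCost(
  model: VectorCostModel,
  vectorCount: number,
  bytesPerVector: number,
  monthlyRequests: number,
): number {
  const storageGb = (vectorCount * bytesPerVector) / 1e9; // assumes 1 GB = 1e9 bytes
  const usage =
    storageGb * model.storageCostPerGbMonth +
    (monthlyRequests / 1_000_000) * model.queryCostPerMillion;
  return Math.max(usage, model.minMonthlyCost); // assumed: minimum is a floor
}
```

With a 500ms TTFT and 50 tok/s throughput, 500 output tokens come to 500 + 10,000 = 10,500ms, so generation time dominates unless output is short.
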

UI
- /simulate gains new sliders for cache-hit %, batch %, stored vectors
  (RAG only) and new pickers for eval and guardrails.
- New panels: Bottleneck verdict, Unit economics (cost/mo, /req, /user,
  /year), Cost composition stacked bar, Provider comparison table
  (re-runs simulate per per-token LLM, click to switch primary).
- LatencyBreakdown renders the six-stage split with share-of-total.
- CostChart hover tooltip now shows latency in addition to per-tool
  cost breakdown.

Data
- New columns: ttft_p50_ms, output_tokens_per_second, max_tpm,
  max_rpm, bytes_per_vector. AA cron writes ttft + throughput
  alongside cost_model.
- tools.json: 7 AA LLMs backfilled with realistic 2026 TTFT, throughput,
  cached/batch pricing, and rate limits; Pinecone/Weaviate/Turbopuffer
  switched from usage_based to per_vector_query;
  Langfuse/Helicone/Braintrust switched to per_event with published
  per-observation rates.

URL
- New compact params: cr (cacheHitRate), bp (batchPct), vc (vectorCount),
  et (embeddingTokensPerQuery), pk (peakRatio), em (embedding),
  ev (eval), gd (guardrails), llmC + rcp (router).

Heads-up: requires `make db-push && make seed` to populate the remote
Supabase with the new columns and values; without that the /simulate
page reads NULLs and falls back to default throughput (50 tok/s) for
every LLM — the symptom is uniform ~10s latency across providers.
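
To make the symptom concrete, here is a small sketch of the fallback and a check for un-seeded rows. The column names and the 50 tok/s default are from these notes; `ToolRow`, `generationMs`, `unseededTools`, and the 500-token output used in the usage note are assumptions for illustration.

```typescript
const DEFAULT_TOKENS_PER_SECOND = 50; // fallback throughput named in the heads-up

// Hypothetical row shape using the new column names.
interface ToolRow {
  name: string;
  ttft_p50_ms: number | null;
  output_tokens_per_second: number | null;
}

// Generation time in ms, falling back to the default when the column is NULL.
function generationMs(row: ToolRow, outputTokens: number): number {
  const tps = row.output_tokens_per_second ?? DEFAULT_TOKENS_PER_SECOND;
  return (outputTokens / tps) * 1000;
}

// Names of tools that still have NULLs in the new latency columns.
function unseededTools(rows: ToolRow[]): string[] {
  return rows
    .filter((r) => r.ttft_p50_ms === null || r.output_tokens_per_second === null)
    .map((r) => r.name);
}
```

With NULL throughput, every provider gets the same generation time (500 output tokens at 50 tok/s is 10s), which is why the latency looks uniform until the seed has run.
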
…ggers

- LATENCY_CEILING_MS is now use-case-aware: chatbot 8s, RAG 12s,
  agent 25s, custom 10s. The old 2s ceiling fired for almost every
  realistic 2026 stack (Sonnet at 46 tok/s producing 600 output tokens
  takes ~13s before any other stage is counted).
- New TTFT-slow breaking point fires when ttft_p50_ms > 1500ms —
  decoupled from total length since streaming hides generation time.
- Bottleneck verdict text drops the hardcoded "2s" and references
  the use-case comfort ceiling instead.
- KillConditionsPanel's "Switch away when..." list now only includes
  rate_limit and latency triggers; cost milestones and LLM-dominance
  signals are tuning advice (caching, routing, batch) and stay in
  the Breaking points panel instead.
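
The two latency triggers above can be sketched roughly as follows. The ceiling values and the 1500ms TTFT threshold are from these notes; the use-case keys, the `BreakingPoint` shape, and the message text are hypothetical.

```typescript
type UseCase = "chatbot" | "rag" | "agent" | "custom"; // assumed identifiers

// Per-use-case comfort ceilings, in milliseconds.
const LATENCY_CEILING_MS: Record<UseCase, number> = {
  chatbot: 8_000,
  rag: 12_000,
  agent: 25_000,
  custom: 10_000,
};

const TTFT_SLOW_MS = 1_500;

// Hypothetical shape; the real BreakingPoint type has more fields.
interface BreakingPoint {
  kind: "latency" | "ttft_slow";
  message: string;
}

// Total latency is judged against the use-case ceiling; TTFT is judged
// on its own because streaming hides generation time.
function latencyBreakingPoints(
  useCase: UseCase,
  totalLatencyMs: number,
  ttftP50Ms: number,
): BreakingPoint[] {
  const points: BreakingPoint[] = [];
  const ceiling = LATENCY_CEILING_MS[useCase];
  if (totalLatencyMs > ceiling) {
    points.push({
      kind: "latency",
      message: `Total latency ${totalLatencyMs}ms exceeds the ${useCase} ceiling of ${ceiling}ms`,
    });
  }
  if (ttftP50Ms > TTFT_SLOW_MS) {
    points.push({
      kind: "ttft_slow",
      message: `TTFT ${ttftP50Ms}ms exceeds ${TTFT_SLOW_MS}ms`,
    });
  }
  return points;
}
```

Under these ceilings a 13s response trips the chatbot and RAG limits but not the agent one, which is the behaviour the use-case-aware change is after.
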

Clears all high-severity npm audit findings that were failing the CI
gate (npm audit --audit-level=high now exits 0):
- next 16.2.4 → 16.2.6 — fixes the high-severity SSRF in WebSocket
  upgrades, middleware/proxy bypasses, cache poisoning, and CSP-nonce
  XSS issues (GHSA-c4j6-fc7j-m34r and friends).
- @hono/node-server, hono, fast-uri, ip-address, express-rate-limit —
  picked up via the dependency tree refresh; resolves the high-severity
  fast-uri path-traversal/host-confusion CVE (GHSA-q3j6-qgpj-74h6,
  GHSA-v39h-62p7-jpjc).

Five moderate findings remain, all tracing to the postcss copy
bundled inside next 16.2.x. The only fixes available today are
next 16.3.x (not yet released as stable) or downgrading
@vercel/speed-insights to 1.0.4 (a breaking change), so these are left
for the next routine bump.
@letsgogeeky letsgogeeky merged commit 0e64157 into main May 14, 2026
1 check passed
@github-actions github-actions Bot deleted the feat/AIC-126 branch May 14, 2026 21:35