v0.6.0 — Model abstraction, observability & quantitative eval

@addiescode-sj addiescode-sj released this 01 Jun 11:47
· 7 commits to master since this release

This release closes the "last-mile" gap in measuring, exposing, and proving the pipeline. It adds a provider-agnostic model layer, end-to-end agent telemetry with an admin view, and content-quality eval metrics, plus a single source of truth for model ids and pricing.

✨ Added

  • Model-provider abstraction — a provider-agnostic ModelAdapter (agents/lib/model-adapter.ts) with Gemini and OpenAI implementations. The gap-analysis stage can run on OpenAI via the provider field on POST /api/analyze or the MODEL_PROVIDER env var; planning stays on Gemini to preserve the context-cache path.
  • Single model & pricing registry — model ids are centralized in agents/lib/models.ts (GEMINI_MODEL / OPENAI_MODEL, env-overridable) and per-1M-token prices in config/model-pricing.json, read by both the TS agent layer and the Python eval harness. Setting GEMINI_MODEL once switches in-process agents, app routes, and the spawned MCP skills together.
  • Agent observability — structured per-stage telemetry (latency, tokens, cache hits, retries, success/failure) emitted as JSON logs and aggregated in-process; metrics persist to a new RLS-enabled agent_runs table (written by the API route — agents never touch the DB) and surface at /admin/observability.
  • Quantitative eval — gap-analysis recall/precision vs. labeled gaps, run-to-run variance (--repeat N), regression baseline + diff (--save-baseline / --compare-baseline); a Grounding / Citation Rate with per-case source attribution; a cross-model comparison harness; and a README results-table generator.
  • Korean eval report — each agent-harness run regenerates a human-readable Korean report (documents/agent-eval-report.md); the cost methodology is documented in documents/cost-calculation.ko.md.
  • Reference-grounding tooling — the career-knowledge-base sync labels reference docs by Drive folder and returns a by_doc_type breakdown, plus a retrieval inspector to view ranked hits per query / doc_type.
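
To make the provider-agnostic layer above more concrete, here is a minimal sketch of how such an adapter could be shaped. Only ModelAdapter, MODEL_PROVIDER, GEMINI_MODEL and OPENAI_MODEL are names from these notes. The generate method, the Provider type, resolveProvider, createAdapter, the stub implementation, the placeholder model ids, and the rule that the request field wins over the env var are all assumptions for illustration, not the contents of agents/lib/model-adapter.ts.

```typescript
// Hypothetical sketch; the real interface in agents/lib/model-adapter.ts may differ.
type Provider = "gemini" | "openai";

interface ModelAdapter {
  readonly provider: Provider;
  readonly model: string;
  generate(prompt: string): Promise<string>;
}

// Assumed precedence: request field first, then the env value, then Gemini.
// Unrecognized values also fall back to Gemini.
function resolveProvider(requested?: string, envValue?: string): Provider {
  const candidate = (requested ?? envValue ?? "gemini").toLowerCase();
  return candidate === "openai" ? "openai" : "gemini";
}

// Stub adapter standing in for a real SDK call; it only echoes the prompt.
function makeStubAdapter(provider: Provider, model: string): ModelAdapter {
  return {
    provider,
    model,
    generate: async (prompt: string) => `[${provider}:${model}] ${prompt}`,
  };
}

// The env map is passed in explicitly so the sketch has no runtime dependencies.
// The fallback model ids are placeholders, not real model names.
function createAdapter(
  provider: Provider,
  env: Record<string, string | undefined> = {},
): ModelAdapter {
  const model =
    provider === "openai"
      ? (env.OPENAI_MODEL ?? "openai-model-placeholder")
      : (env.GEMINI_MODEL ?? "gemini-model-placeholder");
  return makeStubAdapter(provider, model);
}
```

A caller would resolve the provider once per request, for example `createAdapter(resolveProvider(body.provider, env.MODEL_PROVIDER), env)`, and then depend only on the ModelAdapter interface.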

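The recall/precision and run-to-run variance metrics listed above reduce to simple arithmetic, sketched here. The eval harness itself is in Python, so this TypeScript version only illustrates the calculation. Matching predicted gaps to labeled gaps by normalized exact string, and using population variance, are assumptions; the harness may match and aggregate differently. All function and field names are hypothetical.

```typescript
interface GapScore {
  truePositives: number;
  precision: number; // share of predicted gaps that are labeled gaps
  recall: number;    // share of labeled gaps that were predicted
}

const normalizeGap = (s: string): string => s.trim().toLowerCase();

// Assumed matching rule: exact match after trimming and lowercasing.
function scoreGaps(predicted: string[], labeled: string[]): GapScore {
  const predictedSet = new Set(predicted.map(normalizeGap));
  const labeledSet = new Set(labeled.map(normalizeGap));
  let truePositives = 0;
  predictedSet.forEach((gap) => {
    if (labeledSet.has(gap)) truePositives += 1;
  });
  return {
    truePositives,
    precision: predictedSet.size === 0 ? 0 : truePositives / predictedSet.size,
    recall: labeledSet.size === 0 ? 0 : truePositives / labeledSet.size,
  };
}

// Population variance of one metric across N repeated runs.
function variance(values: number[]): number {
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.reduce((acc, v) => acc + (v - mean) * (v - mean), 0) / values.length;
}
```

With three predicted gaps of which two appear among four labeled gaps, precision is 2/3 and recall is 2/4. Feeding the per-run recall values into variance gives a single number for how stable the stage is across repeats.
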
🔧 Changed

  • agents/orchestrator.ts routes all LLM calls through the ModelAdapter interface instead of the inline callGemini helper (context-cache logic moved into the Gemini adapter unchanged); runCareerAnalysis gained optional provider and onMetric parameters.
  • Admin access control — added an is_admin flag + SECURITY DEFINER public.is_admin() helper; /admin/observability and its route now require the admin role, and agent_runs reads are admin-only via RLS.
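
As an illustration of how a caller might use the new onMetric parameter, the sketch below aggregates per-stage telemetry in process and emits each metric as a JSON log line. The actual signature of runCareerAnalysis is not shown in these notes, so the StageMetric shape, its field names, and the createAggregator helper are hypothetical, modelled only on the telemetry fields listed above (latency, tokens, cache hits, retries, success/failure).

```typescript
// Hypothetical metric shape; the real payload passed to onMetric may differ.
interface StageMetric {
  stage: string;
  latencyMs: number;
  tokens: number;
  cacheHit: boolean;
  retries: number;
  ok: boolean;
}

interface StageSummary {
  runs: number;
  failures: number;
  totalLatencyMs: number;
  totalTokens: number;
  cacheHits: number;
  retries: number;
}

// Returns an onMetric callback plus a lookup for the running per-stage totals.
// The log sink is injectable; by default each metric is printed as one JSON line.
function createAggregator(log: (line: string) => void = (line) => console.log(line)) {
  const byStage = new Map<string, StageSummary>();

  const onMetric = (m: StageMetric): void => {
    const s: StageSummary = byStage.get(m.stage) ?? {
      runs: 0, failures: 0, totalLatencyMs: 0, totalTokens: 0, cacheHits: 0, retries: 0,
    };
    s.runs += 1;
    if (!m.ok) s.failures += 1;
    s.totalLatencyMs += m.latencyMs;
    s.totalTokens += m.tokens;
    if (m.cacheHit) s.cacheHits += 1;
    s.retries += m.retries;
    byStage.set(m.stage, s);
    log(JSON.stringify(m));
  };

  const summary = (stage: string): StageSummary | undefined => byStage.get(stage);
  return { onMetric, summary };
}
```

Under these assumptions, an API route could pass `onMetric` into the analysis call and persist the summaries afterwards, which keeps database writes in the route rather than in the agents.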

Full changelog: see CHANGELOG.md. Compare: v0.5.2...v0.6.0