v0.6.0 — Model abstraction, observability & quantitative eval
This release closes the "last-mile" gap of measuring, exposing, and proving the pipeline: a provider-agnostic model layer, end-to-end agent telemetry with an admin view, and content-quality eval metrics — plus a single source of truth for model ids and pricing.
✨ Added
- Model-provider abstraction: a provider-agnostic `ModelAdapter` (`agents/lib/model-adapter.ts`) with Gemini and OpenAI implementations. The gap-analysis stage can run on OpenAI via the `provider` field on `POST /api/analyze` or the `MODEL_PROVIDER` env var; planning stays on Gemini to preserve the context-cache path.
- Single model & pricing registry: model ids are centralized in `agents/lib/models.ts` (`GEMINI_MODEL` / `OPENAI_MODEL`, env-overridable) and per-1M-token prices in `config/model-pricing.json`, read by both the TS agent layer and the Python eval harness. Setting `GEMINI_MODEL` once switches in-process agents, app routes, and the spawned MCP skills together.
- Agent observability: structured per-stage telemetry (latency, tokens, cache hits, retries, success/failure) emitted as JSON logs and aggregated in-process; metrics persist to a new RLS-enabled `agent_runs` table (written by the API route; agents never touch the DB) and surface at `/admin/observability`.
- Quantitative eval: gap-analysis recall/precision vs. labeled gaps, run-to-run variance (`--repeat N`), regression baseline + diff (`--save-baseline` / `--compare-baseline`); a Grounding / Citation Rate with per-case source attribution; a cross-model comparison harness; and a README results-table generator.
- Korean eval report: each agent-harness run regenerates a human-readable Korean report (`documents/agent-eval-report.md`); the cost methodology is documented in `documents/cost-calculation.ko.md`.
- Reference-grounding tooling: the `career-knowledge-base` sync labels reference docs by Drive folder and returns a `by_doc_type` breakdown, plus a retrieval inspector to view ranked hits per query / `doc_type`.
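
As a rough illustration of the provider selection described above, here is a minimal sketch. It is not the code in `agents/lib/model-adapter.ts`: the `ModelAdapter` shape, `ModelResult`, `stubAdapter`, `resolveProvider`, and `adapterFor` are hypothetical names, and the precedence (request field first, then `MODEL_PROVIDER`, then Gemini) is an assumption. The adapters are offline stubs rather than real SDK calls.

```typescript
// Hypothetical shapes for illustration; the real interface may differ.
type Provider = "gemini" | "openai";
type Stage = "gap-analysis" | "planning";
type Env = Record<string, string | undefined>;

interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface ModelAdapter {
  readonly provider: Provider;
  generate(prompt: string): Promise<ModelResult>;
}

// Stub adapter standing in for a real provider SDK so the sketch runs offline.
function stubAdapter(provider: Provider): ModelAdapter {
  return {
    provider,
    async generate(prompt: string): Promise<ModelResult> {
      return { text: `[${provider}] ${prompt}`, inputTokens: prompt.length, outputTokens: 0 };
    },
  };
}

// Assumed precedence: explicit request field, then MODEL_PROVIDER, then Gemini.
function resolveProvider(requested: string | undefined, env: Env): Provider {
  const candidate = (requested ?? env.MODEL_PROVIDER ?? "gemini").toLowerCase();
  if (candidate === "gemini") return "gemini";
  if (candidate === "openai") return "openai";
  throw new Error(`Unknown model provider: ${candidate}`);
}

// Gap analysis honours the resolved provider; planning is pinned to Gemini,
// mirroring the note that planning stays on Gemini for the context-cache path.
function adapterFor(stage: Stage, requested: string | undefined, env: Env): ModelAdapter {
  const provider: Provider = stage === "planning" ? "gemini" : resolveProvider(requested, env);
  return stubAdapter(provider);
}
```

Passing the environment in as a plain object, rather than reading `process.env` inside the function, keeps the resolution logic easy to test.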
🔧 Changed
- `agents/orchestrator.ts` routes all LLM calls through the `ModelAdapter` interface instead of the inline `callGemini` helper (context-cache logic moved into the Gemini adapter unchanged); `runCareerAnalysis` gained optional `provider` and `onMetric` parameters.
- Admin access control: added an `is_admin` flag + SECURITY DEFINER `public.is_admin()` helper; `/admin/observability` and its route now require the admin role, and `agent_runs` reads are admin-only via RLS.
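
To make the `onMetric` hook more concrete, the sketch below shows one way a caller might collect per-stage telemetry, emit it as JSON log lines, and aggregate it in-process. The `StageMetric` fields, `RunSummary`, and `createCollector` are hypothetical; the actual callback signature of `runCareerAnalysis` is not shown in these notes and may differ.

```typescript
// Hypothetical metric shape, modelled on the fields the notes list:
// latency, tokens, cache hits, retries, success/failure.
interface StageMetric {
  stage: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  cacheHit: boolean;
  retries: number;
  success: boolean;
}

interface RunSummary {
  stages: number;
  totalLatencyMs: number;
  totalTokens: number;
  cacheHits: number;
  retries: number;
  failures: number;
}

interface Collector {
  onMetric(metric: StageMetric): void;
  summary(): RunSummary;
}

// Builds a callback that logs each metric as one JSON line and keeps
// an in-memory list for aggregation. `log` is injected by the caller.
function createCollector(log: (line: string) => void): Collector {
  const metrics: StageMetric[] = [];
  return {
    onMetric(metric: StageMetric): void {
      metrics.push(metric);
      log(JSON.stringify(metric));
    },
    summary(): RunSummary {
      const totals: RunSummary = {
        stages: 0, totalLatencyMs: 0, totalTokens: 0, cacheHits: 0, retries: 0, failures: 0,
      };
      for (const m of metrics) {
        totals.stages += 1;
        totals.totalLatencyMs += m.latencyMs;
        totals.totalTokens += m.inputTokens + m.outputTokens;
        totals.cacheHits += m.cacheHit ? 1 : 0;
        totals.retries += m.retries;
        totals.failures += m.success ? 0 : 1;
      }
      return totals;
    },
  };
}
```

Under these assumptions, an API route could pass `collector.onMetric` into the analysis call and persist `collector.summary()` afterwards, which would keep database writes out of the agents themselves.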
Full changelog: see CHANGELOG.md. Compare: v0.5.2...v0.6.0