Commit 542bc57

addiescode-sj and claude committed
chore(release): 0.6.0
- model-provider abstraction (Gemini/OpenAI) + single model/pricing registry
- agent observability (telemetry, agent_runs, /admin) with admin-role gating
- quantitative eval (recall/precision, grounding, baseline/variance) + Korean agent-eval report; reference-doc labeling fix and retrieval inspector

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ad9b075 commit 542bc57

1 file changed

Lines changed: 3 additions & 0 deletions

CHANGELOG.md

@@ -14,10 +14,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Model-provider abstraction**: a provider-agnostic `ModelAdapter` ([agents/lib/model-adapter.ts](agents/lib/model-adapter.ts)) with Gemini and OpenAI implementations. The gap-analysis stage can now run on OpenAI via the `provider` field on `POST /api/analyze` or the `MODEL_PROVIDER` env var; planning stays on Gemini to preserve the context-cache path. Documented in the new README "Model Strategy & Trade-offs" section.
- **Agent observability**: structured per-stage telemetry (latency, token usage, cache hits, retries, success/failure) emitted as JSON logs and aggregated in-process at the orchestrator boundary ([agents/lib/observability.ts](agents/lib/observability.ts)). Per-run metrics persist to a new RLS-enabled `agent_runs` table (written by the analyze route — agents never touch the DB) and are surfaced at `/admin/observability` (avg/p95 latency, failure rate, retry rate, cache-hit rate, tokens, estimated cost, cache-token savings).
- **Quantitative eval**: gap-analysis **recall/precision vs. labeled gaps**, run-to-run **variance** (`--repeat N`), and a **regression baseline + diff** (`--save-baseline` / `--compare-baseline`) in [eval/agent_harness.py](eval/agent_harness.py); a **Grounding / Citation Rate** with per-case source attribution in [eval/ragas_eval.py](eval/ragas_eval.py) (written to `eval/grounding_results.csv`); a **cross-model comparison** harness ([eval/model_comparison.py](eval/model_comparison.py)); and a README results-table generator ([eval/render_results.py](eval/render_results.py)) that renders the committed eval CSVs into the new "Quality & Evaluation" section.
+- **Korean eval report**: each agent-harness run regenerates a human-readable Korean report at [documents/agent-eval-report.md](documents/agent-eval-report.md) (aggregate + per-fixture tables + metric glossary); the cost methodology is documented in [documents/cost-calculation.ko.md](documents/cost-calculation.ko.md).
+- **Reference-grounding tooling**: the `career-knowledge-base` sync now labels reference docs by Drive folder and surfaces a `by_doc_type` breakdown, plus a retrieval inspector ([mcp-skills/career-knowledge-base/scripts/inspect-retrieval.mjs](mcp-skills/career-knowledge-base/scripts/inspect-retrieval.mjs)) to view ranked hits per query/`doc_type`.
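
As a rough illustration of the model-provider abstraction listed above, the sketch below shows one way a provider-agnostic adapter and provider resolution could fit together. Only the `ModelAdapter` name, the `provider` request field, and the `MODEL_PROVIDER` env var come from the changelog. The `generate` method, the `resolveProvider` and `adapterFor` helpers, the stub adapter, and the precedence order (request field, then env var, then Gemini) are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical sketch: everything except the ModelAdapter name is an assumption.
type Provider = "gemini" | "openai";

interface ModelAdapter {
  readonly provider: Provider;
  // Assumed method name; the real interface may expose a different surface.
  generate(prompt: string): Promise<string>;
}

// Stub standing in for the real Gemini/OpenAI adapters (no network calls).
class StubAdapter implements ModelAdapter {
  readonly provider: Provider;

  constructor(provider: Provider) {
    this.provider = provider;
  }

  async generate(prompt: string): Promise<string> {
    return `[${this.provider}] ${prompt}`;
  }
}

function isProvider(value: unknown): value is Provider {
  return value === "gemini" || value === "openai";
}

// Assumed precedence: request field first, then env var, then Gemini.
function resolveProvider(requestProvider?: string, envProvider?: string): Provider {
  if (isProvider(requestProvider)) return requestProvider;
  if (isProvider(envProvider)) return envProvider;
  return "gemini";
}

type Stage = "planning" | "gap-analysis";

// Per the changelog, planning stays on Gemini to preserve the context-cache
// path; only the gap-analysis stage follows the selected provider.
function adapterFor(stage: Stage, provider: Provider): ModelAdapter {
  return new StubAdapter(stage === "planning" ? "gemini" : provider);
}
```

In a route handler this might be called as `resolveProvider(body.provider, process.env.MODEL_PROVIDER)`, where `body` is a hypothetical parsed request payload. Passing the env value in as a parameter keeps the helper easy to test.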

### Changed

- [agents/orchestrator.ts](agents/orchestrator.ts) routes all LLM calls through the `ModelAdapter` interface instead of the inline `callGemini` helper; the context-cache logic moved into the Gemini adapter unchanged. `runCareerAnalysis` gained optional `provider` and `onMetric` parameters.
+- **Single model/pricing source**: model ids are centralized in [agents/lib/models.ts](agents/lib/models.ts) (`GEMINI_MODEL` / `OPENAI_MODEL`, env-overridable) and per-1M-token prices in [config/model-pricing.json](config/model-pricing.json), read by both the TS agent layer and the Python eval harness — removing the duplicated model-string/price literals previously scattered across agents, app routes, and MCP skills.
- **Admin access control**: added an `is_admin` flag to `profiles` and a SECURITY DEFINER `public.is_admin()` helper (migration `20260601000003_add_profile_admin_role`). The `/admin/observability` page and `/api/admin/observability` route now require the admin role (non-admins are redirected / get 403) instead of merely being signed in, and `agent_runs` reads are restricted to admins via RLS. Admin accounts are granted manually in Supabase.
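
To make the per-1M-token pricing idea concrete, here is a minimal sketch of how an estimated cost might be derived from token usage and a central price table. The shape of the price entries, the field names, the `estimateCostUsd` helper, and the numbers are all placeholders assumed for illustration. They do not reflect the actual contents of `config/model-pricing.json`.

```typescript
// Hypothetical pricing shape; the real config file may be structured differently.
interface ModelPrice {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Placeholder values for illustration only, not real prices.
const PRICING: Record<string, ModelPrice> = {
  "example-model": { inputPerMillion: 1.0, outputPerMillion: 4.0 },
};

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Scale token counts to millions and multiply by the per-1M-token price.
function estimateCostUsd(model: string, usage: TokenUsage): number {
  const price = PRICING[model];
  if (!price) {
    throw new Error(`No pricing entry for model: ${model}`);
  }
  return (
    (usage.inputTokens / 1_000_000) * price.inputPerMillion +
    (usage.outputTokens / 1_000_000) * price.outputPerMillion
  );
}
```

Keeping the table in one place means a price change touches a single file, which is the motivation the changelog gives for removing the scattered literals.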

---
