feat(hosted): hostedTenantFromEnv — env→HostedTenant for selfImprove (#227)
selfImprove takes a HostedTenant config; products read hosted config from env.
hostedClientFromEnv already had the env→tenant logic but returns a client, not
the config. Extract hostedTenantFromEnv (hostedClientFromEnv now composes it) so
a product passes hostedTenant: hostedTenantFromEnv({ tenantId }) when collapsing
its loop onto selfImprove — instead of hand-rolling the env map. Fails soft
(undefined) when unconfigured.
chore(release): 0.83.0 (lockstep). Also genericize the 0.81.0 changelog entry.
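The extraction described in the commit message can be sketched as follows. This is a minimal illustration, not the package's implementation. The `HostedTenant` fields, the env variable names (`HOSTED_ENDPOINT`, `HOSTED_API_KEY`), the override-beats-env precedence, and the `Sketch`-suffixed function names are all assumptions made for the example. Only the overall shape comes from the commit: build the tenant config from env, return `undefined` when unconfigured, and have the client factory compose the tenant builder.

```typescript
// Hypothetical config shape; the real HostedTenant type lives in the package.
interface HostedTenant {
  tenantId: string;
  endpoint: string;
  apiKey: string;
}

// Hypothetical overrides: tenantId is required, the rest may come from env.
interface HostedTenantOverrides {
  tenantId: string;
  endpoint?: string;
  apiKey?: string;
}

type Env = Record<string, string | undefined>;

// Builds the tenant config from env. Fails soft: returns undefined (no throw)
// when a required value is missing. Env names here are assumptions.
function hostedTenantFromEnvSketch(
  overrides: HostedTenantOverrides,
  env: Env,
): HostedTenant | undefined {
  // Assumed precedence: an explicit override wins over the env value.
  const endpoint = overrides.endpoint ?? env.HOSTED_ENDPOINT;
  const apiKey = overrides.apiKey ?? env.HOSTED_API_KEY;
  if (!endpoint || !apiKey) return undefined;
  return { tenantId: overrides.tenantId, endpoint, apiKey };
}

// Hypothetical client wrapper, standing in for whatever the real client is.
interface HostedClientSketch {
  tenant: HostedTenant;
}

// The client factory composes the tenant builder instead of duplicating
// the env mapping.
function hostedClientFromEnvSketch(
  overrides: HostedTenantOverrides,
  env: Env,
): HostedClientSketch | undefined {
  const tenant = hostedTenantFromEnvSketch(overrides, env);
  return tenant ? { tenant } : undefined;
}
```

Because the builder returns `undefined` rather than throwing, a caller can pass its result straight through as an optional `hostedTenant` field, and hosted ingest stays off until the environment is populated.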
CHANGELOG.md (9 additions & 1 deletion)
@@ -4,6 +4,14 @@ All notable changes to `@tangle-network/agent-eval` and its sibling `agent-eval-
 
 ---
 
+## [0.83.0] — 2026-06-05 — hostedTenantFromEnv
+
+### Added
+
+- **`hostedTenantFromEnv` (`/hosted`).** Builds a `HostedTenant` config from env (the input `selfImprove({ hostedTenant })` and `emitLoopProvenance` take), with the same env precedence + overrides as `hostedClientFromEnv` — which now composes it. Returns `undefined` (not an error) when unconfigured, so a product wires `hostedTenant: hostedTenantFromEnv({ tenantId: 'my-agent' })` unconditionally and hosted ingest stays off until the env is set. Removes the env→tenant mapping every product would otherwise hand-roll when collapsing onto `selfImprove`.
+
+---
+
 ## [0.82.0] — 2026-06-05 — selfImprove forwards the full loop surface
 
 ### Changed
@@ -21,7 +29,7 @@ A deterministic offline test that drives `selfImprove` with a mock agent must no
 
 ### Added
 
-- **`aggregateJudgeVerdicts<D>` (root).** Generic judge-ensemble reducer: fan out N uncorrelated judges, mean each rubric dimension over the SURVIVORS, report the inter-rater disagreement spread, sum cost. Replaces the same reduction hand-rolled in legal (`aggregateEnsemble`), creative (`production-loop/judges.ts`), and tax (`judge-ensemble.ts`). Fail-loud: a failed judge (`perDimension: null`) is recorded in `failedJudges`, never folded into a zero; all-failed throws; a failed judge's cost is still summed. Composite reuses `weightedComposite`.
+- **`aggregateJudgeVerdicts<D>` (root).** Generic judge-ensemble reducer: fan out N uncorrelated judges, mean each rubric dimension over the SURVIVORS, report the inter-rater disagreement spread, sum cost. Replaces the same reduction hand-rolled across multiple product agents. Fail-loud: a failed judge (`perDimension: null`) is recorded in `failedJudges`, never folded into a zero; all-failed throws; a failed judge's cost is still summed. Composite reuses `weightedComposite`.
 - **`createTokenRecallChecker` (root).** The deterministic, no-LLM `CorrectnessChecker` — sibling of `createLlmCorrectnessChecker`. A produced item fulfils a requirement when its content is substantive and recalls ≥ `minRecall` of the requirement title's significant tokens. The default completion gate for apps/tests without an LLM judge.
 - **`ErrorCluster` (root + `/analyst`).** The failure-cluster element type is now a named export, so consumers import it instead of deriving `DatasetOverview['error_clusters'][number]`.
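The fail-loud reduction described in the `aggregateJudgeVerdicts<D>` entry can be illustrated with a small sketch. This is not the package's code. The verdict and result shapes, the `costUsd` field, and the function name are hypothetical, and the spread is assumed here to be max minus min over surviving judges, since the changelog does not define it. The composite step via `weightedComposite` is left out.

```typescript
// Hypothetical verdict shape: perDimension is null when the judge failed.
interface JudgeVerdictSketch<D extends string> {
  judgeId: string;
  perDimension: Record<D, number> | null;
  costUsd: number;
}

// Hypothetical result shape for the sketch.
interface AggregateSketch<D extends string> {
  perDimension: Record<D, number>; // mean over surviving judges
  spread: Record<D, number>; // assumed: max - min over surviving judges
  failedJudges: string[];
  totalCostUsd: number;
}

function aggregateVerdictsSketch<D extends string>(
  dimensions: readonly D[],
  verdicts: readonly JudgeVerdictSketch<D>[],
): AggregateSketch<D> {
  const survivors: Record<D, number>[] = [];
  const failedJudges: string[] = [];
  let totalCostUsd = 0;

  for (const v of verdicts) {
    totalCostUsd += v.costUsd; // a failed judge's cost is still summed
    if (v.perDimension === null) {
      failedJudges.push(v.judgeId); // recorded, never folded into a zero
    } else {
      survivors.push(v.perDimension);
    }
  }

  // Fail loud: with no surviving judge there is nothing to average.
  if (survivors.length === 0) {
    throw new Error('all judges failed');
  }

  const perDimension = {} as Record<D, number>;
  const spread = {} as Record<D, number>;
  for (const d of dimensions) {
    const scores = survivors.map((s) => s[d]);
    perDimension[d] = scores.reduce((a, b) => a + b, 0) / scores.length;
    spread[d] = Math.max(...scores) - Math.min(...scores);
  }

  return { perDimension, spread, failedJudges, totalCostUsd };
}
```

Averaging over survivors only keeps a crashed judge from dragging a dimension toward zero, while `failedJudges` and the summed cost keep the failure visible to the caller.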
clients/python/pyproject.toml (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "agent-eval-rpc"
-version = "0.82.0"
+version = "0.83.0"
 description = "Python RPC client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC. Eval logic runs in the Node runtime; this package is a thin wire client."
package.json (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
 {
   "name": "@tangle-network/agent-eval",
-  "version": "0.82.0",
+  "version": "0.83.0",
   "description": "Evaluate and improve AI agents from runs, traces, judges, and feedback. Compare candidates, cluster failures, measure lift, and gate releases.",