Commit dd2ed73

feat(hosted): hostedTenantFromEnv — env→HostedTenant for selfImprove (#227)
selfImprove takes a HostedTenant config; products read hosted config from env. hostedClientFromEnv already had the env→tenant logic but returns a client, not the config. Extract hostedTenantFromEnv (hostedClientFromEnv now composes it) so a product passes hostedTenant: hostedTenantFromEnv({ tenantId }) when collapsing its loop onto selfImprove, instead of hand-rolling the env map. Fails soft (returns undefined) when unconfigured.

chore(release): 0.83.0 (lockstep). Also genericize the 0.81.0 changelog entry.
1 parent 934bd65 commit dd2ed73
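
The fail-soft wiring described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the package's code: `ProductTenant`, `readTenantFromEnv`, `runLoopSketch`, and `runProductLoop` are hypothetical stand-ins, and the environment variable names are assumed from the test fixture in this commit.

```typescript
// Sketch only: all names below are local stand-ins, not the package API.
interface ProductTenant {
  endpoint: string
  apiKey: string
  tenantId: string
}

type EnvMap = Record<string, string | undefined>

// Simplified stand-in for the env-to-tenant helper: returns undefined
// (never throws) when any required value is missing.
// Variable names are an assumption taken from this commit's test fixture.
function readTenantFromEnv(env: EnvMap, tenantId?: string): ProductTenant | undefined {
  const endpoint = env['TANGLE_INGEST_URL']
  const apiKey = env['TANGLE_API_KEY']
  const id = tenantId ?? env['TANGLE_TENANT_ID']
  return endpoint && apiKey && id ? { endpoint, apiKey, tenantId: id } : undefined
}

// Hypothetical loop entry point: hosted ingest runs only when a tenant is supplied.
function runLoopSketch(opts: { hostedTenant?: ProductTenant | undefined }): { hostedIngest: boolean } {
  return { hostedIngest: opts.hostedTenant !== undefined }
}

// The product passes the helper's result unconditionally; there is no
// "is hosted configured?" branch at the call site.
function runProductLoop(env: EnvMap): { hostedIngest: boolean } {
  return runLoopSketch({ hostedTenant: readTenantFromEnv(env, 'my-agent') })
}
```

Because the helper yields `undefined` rather than throwing, the same call site works in local development (empty env, ingest off) and in a configured deployment (ingest on).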

7 files changed

Lines changed: 75 additions & 7 deletions

CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -4,6 +4,14 @@ All notable changes to `@tangle-network/agent-eval` and its sibling `agent-eval-
 
 ---
 
+## [0.83.0] — 2026-06-05 — hostedTenantFromEnv
+
+### Added
+
+- **`hostedTenantFromEnv` (`/hosted`).** Builds a `HostedTenant` config from env (the input `selfImprove({ hostedTenant })` and `emitLoopProvenance` take), with the same env precedence + overrides as `hostedClientFromEnv` — which now composes it. Returns `undefined` (not an error) when unconfigured, so a product wires `hostedTenant: hostedTenantFromEnv({ tenantId: 'my-agent' })` unconditionally and hosted ingest stays off until the env is set. Removes the env→tenant mapping every product would otherwise hand-roll when collapsing onto `selfImprove`.
+
+---
+
 ## [0.82.0] — 2026-06-05 — selfImprove forwards the full loop surface
 
 ### Changed
@@ -21,7 +29,7 @@ A deterministic offline test that drives `selfImprove` with a mock agent must no
 
 ### Added
 
-- **`aggregateJudgeVerdicts<D>` (root).** Generic judge-ensemble reducer: fan out N uncorrelated judges, mean each rubric dimension over the SURVIVORS, report the inter-rater disagreement spread, sum cost. Replaces the same reduction hand-rolled in legal (`aggregateEnsemble`), creative (`production-loop/judges.ts`), and tax (`judge-ensemble.ts`). Fail-loud: a failed judge (`perDimension: null`) is recorded in `failedJudges`, never folded into a zero; all-failed throws; a failed judge's cost is still summed. Composite reuses `weightedComposite`.
+- **`aggregateJudgeVerdicts<D>` (root).** Generic judge-ensemble reducer: fan out N uncorrelated judges, mean each rubric dimension over the SURVIVORS, report the inter-rater disagreement spread, sum cost. Replaces the same reduction hand-rolled across multiple product agents. Fail-loud: a failed judge (`perDimension: null`) is recorded in `failedJudges`, never folded into a zero; all-failed throws; a failed judge's cost is still summed. Composite reuses `weightedComposite`.
 - **`createTokenRecallChecker` (root).** The deterministic, no-LLM `CorrectnessChecker` — sibling of `createLlmCorrectnessChecker`. A produced item fulfils a requirement when its content is substantive and recalls ≥ `minRecall` of the requirement title's significant tokens. The default completion gate for apps/tests without an LLM judge.
 - **`ErrorCluster` (root + `/analyst`).** The failure-cluster element type is now a named export, so consumers import it instead of deriving `DatasetOverview['error_clusters'][number]`.
 
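
The `aggregateJudgeVerdicts` changelog entry in the hunk above describes a fail-loud reduction over judge verdicts. A rough sketch of that idea follows; it is not the library's implementation. The type and field names (`VerdictSketch`, `judgeId`, `cost`) are hypothetical, the "disagreement spread" is assumed here to be max minus min per dimension, and the weighted composite is left out.

```typescript
// Sketch under stated assumptions; not the package's aggregateJudgeVerdicts.
interface VerdictSketch<D extends string> {
  judgeId: string
  perDimension: Record<D, number> | null // null marks a failed judge
  cost: number
}

interface AggregateSketch<D extends string> {
  mean: Record<D, number>
  spread: Record<D, number> // assumed definition: max - min across survivors
  failedJudges: string[]
  totalCost: number
}

function aggregateVerdictsSketch<D extends string>(
  dimensions: readonly D[],
  verdicts: readonly VerdictSketch<D>[],
): AggregateSketch<D> {
  // Survivors are the judges that produced scores; failures are recorded, not zero-filled.
  const survivors = verdicts.flatMap((v) => (v.perDimension ? [v.perDimension] : []))
  const failedJudges = verdicts.filter((v) => v.perDimension === null).map((v) => v.judgeId)
  if (survivors.length === 0) throw new Error('all judges failed')

  const mean = {} as Record<D, number>
  const spread = {} as Record<D, number>
  for (const d of dimensions) {
    const scores = survivors.map((s) => s[d])
    mean[d] = scores.reduce((a, b) => a + b, 0) / scores.length
    spread[d] = Math.max(...scores) - Math.min(...scores)
  }

  // A failed judge still cost something, so cost is summed over every verdict.
  const totalCost = verdicts.reduce((sum, v) => sum + v.cost, 0)
  return { mean, spread, failedJudges, totalCost }
}
```

Averaging over survivors only keeps one failed judge from dragging every dimension toward zero, while `failedJudges` keeps the failure visible to the caller.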

clients/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "agent-eval-rpc"
-version = "0.82.0"
+version = "0.83.0"
 description = "Python RPC client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC. Eval logic runs in the Node runtime; this package is a thin wire client."
 readme = "README.md"
 requires-python = ">=3.10"

clients/python/src/agent_eval_rpc/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@
 try:
     __version__ = version("agent-eval-rpc")
 except PackageNotFoundError:
-    __version__ = "0.82.0"
+    __version__ = "0.83.0"
 
 __all__ = [
     "Client",

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@tangle-network/agent-eval",
-  "version": "0.82.0",
+  "version": "0.83.0",
   "description": "Evaluate and improve AI agents from runs, traces, judges, and feedback. Compare candidates, cluster failures, measure lift, and gate releases.",
   "homepage": "https://github.com/tangle-network/agent-eval#readme",
   "repository": {

src/hosted/client.ts

Lines changed: 18 additions & 3 deletions
@@ -161,9 +161,17 @@ export function createHostedClient(tenant: HostedTenant): HostedClient {
  * A trailing slash on the endpoint is stripped. Pass `overrides` to supply any
  * field directly (e.g. a fixed `tenantId` per product) — overrides win over env.
  */
-export function hostedClientFromEnv(
+/**
+ * Build a {@link HostedTenant} config from env — the input `selfImprove`'s
+ * `hostedTenant` and `emitLoopProvenance` take. Same env precedence + overrides
+ * as {@link hostedClientFromEnv}; returns `undefined` (not an error) when any of
+ * endpoint / apiKey / tenantId is missing, so a product wires
+ * `hostedTenant: hostedTenantFromEnv({ tenantId: 'my-agent' })` unconditionally
+ * and it stays off until the env is set.
+ */
+export function hostedTenantFromEnv(
   overrides: Partial<HostedTenant> & { env?: Record<string, string | undefined> } = {},
-): HostedClient | undefined {
+): HostedTenant | undefined {
   const env = overrides.env ?? process.env
   const endpoint = (
     overrides.endpoint ??
@@ -177,5 +185,12 @@ export function hostedClientFromEnv(
   if (overrides.fetchImpl) tenant.fetchImpl = overrides.fetchImpl
   if (overrides.timeoutMs !== undefined) tenant.timeoutMs = overrides.timeoutMs
   if (overrides.retries !== undefined) tenant.retries = overrides.retries
-  return createHostedClient(tenant)
+  return tenant
+}
+
+export function hostedClientFromEnv(
+  overrides: Partial<HostedTenant> & { env?: Record<string, string | undefined> } = {},
+): HostedClient | undefined {
+  const tenant = hostedTenantFromEnv(overrides)
+  return tenant ? createHostedClient(tenant) : undefined
 }
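
The diff elides the middle of the function, so the shape of the refactor is easier to see in a self-contained sketch. The version below is an approximation under assumptions: it reads only the three variables that appear in this commit's test fixture (the real function may consult more), it takes `env` explicitly instead of defaulting to `process.env`, and `TenantSketch`, `ClientSketch`, and `createClientSketch` are hypothetical stand-ins for the package's types.

```typescript
// Sketch only: local stand-ins, not the contents of src/hosted/client.ts.
interface TenantSketch {
  endpoint: string
  apiKey: string
  tenantId: string
  timeoutMs?: number
}

type Env = Record<string, string | undefined>
type Overrides = Partial<TenantSketch> & { env?: Env }

// Config builder: overrides win over env, one trailing slash is stripped,
// and a missing required field yields undefined instead of an error.
function tenantFromEnvSketch(overrides: Overrides = {}): TenantSketch | undefined {
  const env = overrides.env ?? {} // the real function falls back to process.env
  // Assumed variable names, taken from the test fixture in this commit.
  const endpoint = (overrides.endpoint ?? env['TANGLE_INGEST_URL'])?.replace(/\/$/, '')
  const apiKey = overrides.apiKey ?? env['TANGLE_API_KEY']
  const tenantId = overrides.tenantId ?? env['TANGLE_TENANT_ID']
  if (!endpoint || !apiKey || !tenantId) return undefined

  const tenant: TenantSketch = { endpoint, apiKey, tenantId }
  if (overrides.timeoutMs !== undefined) tenant.timeoutMs = overrides.timeoutMs
  return tenant
}

// Hypothetical client with just enough surface to show the composition.
interface ClientSketch {
  describe(): string
}

function createClientSketch(tenant: TenantSketch): ClientSketch {
  return { describe: () => `${tenant.tenantId}@${tenant.endpoint}` }
}

// Client factory: composes the config builder, so both share one env mapping.
function clientFromEnvSketch(overrides: Overrides = {}): ClientSketch | undefined {
  const tenant = tenantFromEnvSketch(overrides)
  return tenant ? createClientSketch(tenant) : undefined
}
```

Splitting the builder from the factory means callers that need only the config (such as a loop that constructs its own client) and callers that need a ready client both go through the same precedence rules.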

src/hosted/from-env.test.ts

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+/**
+ * `hostedTenantFromEnv` is the env→HostedTenant map every product passes to
+ * `selfImprove({ hostedTenant })`. It must fail SOFT (undefined, not throw) when
+ * unconfigured, so a product wires it unconditionally and it stays off until the
+ * env is set.
+ */
+
+import { describe, expect, it } from 'vitest'
+
+import { hostedClientFromEnv, hostedTenantFromEnv } from './client'
+
+describe('hostedTenantFromEnv', () => {
+  const fullEnv = {
+    TANGLE_INGEST_URL: 'https://orchestrator.example/v1/',
+    TANGLE_API_KEY: 'k-123',
+    TANGLE_TENANT_ID: 'acme',
+  }
+
+  it('builds a tenant from env and strips the trailing slash', () => {
+    const t = hostedTenantFromEnv({ env: fullEnv })
+    expect(t).toEqual({
+      endpoint: 'https://orchestrator.example/v1',
+      apiKey: 'k-123',
+      tenantId: 'acme',
+    })
+  })
+
+  it('returns undefined (not an error) when any required field is missing', () => {
+    expect(hostedTenantFromEnv({ env: {} })).toBeUndefined()
+    expect(
+      hostedTenantFromEnv({ env: { TANGLE_INGEST_URL: 'x', TANGLE_API_KEY: 'y' } }),
+    ).toBeUndefined()
+  })
+
+  it('overrides win over env (e.g. a fixed per-product tenantId)', () => {
+    const t = hostedTenantFromEnv({ env: fullEnv, tenantId: 'my-agent' })
+    expect(t?.tenantId).toBe('my-agent')
+  })
+
+  it('hostedClientFromEnv composes it — undefined env ⇒ no client', () => {
+    expect(hostedClientFromEnv({ env: {} })).toBeUndefined()
+    expect(hostedClientFromEnv({ env: fullEnv })).toBeDefined()
+  })
+})

src/hosted/index.ts

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ export {
   type HostedClient,
   type HostedTenant,
   hostedClientFromEnv,
+  hostedTenantFromEnv,
 } from './client'
 export {
   type EvalRunCellScore,
