A single JudgeScore declaration now lives in campaign/types.ts; multishot
re-exports it, and the scale is producer-defined via detectScale.
CompletionVerdict and VerificationReport extend DefaultVerdict with derived
valid/score fields, and a completionVerdict() helper is added.
VerifiableReward carries required components. parseJudgeResponse now throws
a typed JudgeParseError instead of fabricating zero rows; the executor and
the traced ensemble record failed judges. multishot runJudge marks failures
with failed:true, and the matrix excludes failed scores from cell means.
analyzeRuns failure clustering now passes the real AnalystRunInputs shape
(previously it was silently empty). New: an ensembleJudge panel
(cross-family gate, retry, collision-suffixed votes) and a
makeEvalTools/toOpenAiTool agent toolset. Contract exports the stable
analyst entry. Version trio bumped to 0.87.0.
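
The failure-handling change in the commit message (throw a typed error on an unparseable judge response, mark the judge as failed, and leave failed scores out of the cell mean) can be illustrated with a minimal sketch. This is not the package's code: every name ending in `Sketch`, the `cellMean` helper, the JSON response shape, and the field layout are assumptions made for illustration, and the real `JudgeScore`, `JudgeParseError`, `parseJudgeResponse`, and `runJudge` in @tangle-network/agent-eval may look different.

```typescript
// Hypothetical shapes for illustration only; the real types may differ.
interface JudgeScoreSketch {
  judge: string;
  score: number;
  failed?: boolean;
}

// Typed error carrying the raw judge output, so callers can tell a parse
// failure apart from any other exception.
class JudgeParseErrorSketch extends Error {
  raw: string;
  constructor(message: string, raw: string) {
    super(message);
    // Keeps instanceof working even when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "JudgeParseErrorSketch";
    this.raw = raw;
  }
}

// Assumes the judge replies with JSON like {"score": 0.8}. Throws instead
// of returning a fabricated zero score when the reply cannot be parsed.
function parseJudgeResponseSketch(judge: string, raw: string): JudgeScoreSketch {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new JudgeParseErrorSketch(`judge ${judge} returned non-JSON output`, raw);
  }
  const score = (parsed as { score?: unknown } | null)?.score;
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new JudgeParseErrorSketch(`judge ${judge} returned no numeric score`, raw);
  }
  return { judge, score };
}

// Converts a parse failure into a row flagged failed: true. The score on a
// failed row is a placeholder and must never be averaged.
function runJudgeSketch(judge: string, raw: string): JudgeScoreSketch {
  try {
    return parseJudgeResponseSketch(judge, raw);
  } catch (err) {
    if (err instanceof JudgeParseErrorSketch) {
      return { judge, score: 0, failed: true };
    }
    throw err;
  }
}

// Mean over non-failed scores only; undefined when every judge failed.
function cellMean(scores: JudgeScoreSketch[]): number | undefined {
  const ok = scores.filter((s) => !s.failed);
  if (ok.length === 0) return undefined;
  return ok.reduce((sum, s) => sum + s.score, 0) / ok.length;
}

const cell = [
  runJudgeSketch("judge-a", '{"score": 0.8}'),
  runJudgeSketch("judge-b", "not json at all"),
  runJudgeSketch("judge-c", '{"score": 0.4}'),
];
console.log(cell.map((s) => s.failed === true), cellMean(cell));
```

In this sketch the middle judge fails to parse, so the mean is taken over the other two scores only. If the failed row were counted as a zero, the cell mean would be pulled down by a judge that never actually scored anything, which is the distortion the commit message describes removing.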
clients/python/pyproject.toml (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "agent-eval-rpc"
-version = "0.86.0"
+version = "0.87.0"
 description = "Python RPC client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC. Eval logic runs in the Node runtime; this package is a thin wire client."
package.json (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 {
 "name": "@tangle-network/agent-eval",
-"version": "0.86.0",
+"version": "0.87.0",
 "description": "Evaluate and improve AI agents from runs, traces, judges, and feedback. Compare candidates, cluster failures, measure lift, and gate releases.",