chore(deps): bump agent-runtime ^0.50 + agent-eval ^0.91 #170

Merged
drewstone merged 1 commit into main from chore/bump-substrate-0.50
Jun 14, 2026

@drewstone

Contributor

Pure dependency bump of the Tangle substrate pins to the canonical fleet versions.

  • @tangle-network/agent-runtime: ^0.36.0 → ^0.50.0
  • @tangle-network/agent-eval: ^0.70.0 → ^0.91.0

Widest skew in the fleet (14 minors on runtime). The repo's evals import surface is the canonical pair {createOpenAICompatibleBackend, runAgentTaskStream} + AgentTaskSpec — non-breaking across this jump.
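
Why the pins need an explicit bump rather than a lockfile refresh: under npm's caret rule, a `0.x` range only floats within its own minor, so `^0.36.0` can never resolve to `0.50.0`. The sketch below is a minimal illustration of that rule for plain `major.minor.patch` versions. It is not the `semver` package, and the helper names (`parseVersion`, `caretUpperBound`, `satisfiesCaret`) are hypothetical, not taken from this repo.

```typescript
// Minimal caret-range check for plain "major.minor.patch" versions.
// Illustrative only: no prerelease or build-metadata handling.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const m = /^\^?(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!m) throw new Error(`unsupported version: ${text}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
function caretUpperBound(base: Version): Version {
  const [major, minor, patch] = base;
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  const base = parseVersion(range);
  const v = parseVersion(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

console.log(satisfiesCaret("0.50.0", "^0.36.0")); // false: old pin stops below 0.37.0
console.log(satisfiesCaret("0.50.0", "^0.50.0")); // true: new pin admits 0.50.x
```

The same reasoning applies to agent-eval: `0.91.0` falls outside `^0.70.0`, so both pins have to move in package.json.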

Verification: npm run typecheck:evals passes clean (exit 0) under the bumped pins. The one-shot judge divergence in evals/src/profiles/types.ts (EvalProfile) is intentional and left untouched. Merges cleanly into main.

@tangletools tangletools left a comment

Contributor


✅ Auto-approved PR — d41e69ad

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-14T00:53:09Z

@tangletools tangletools left a comment

Contributor


🟠 Value Audit — better-approach-exists

Verdict: better-approach-exists
Concerns: 1 (1 strong-concern)
Heuristic: 0.0s
Duplication: 0.0s
Interrogation: 143.6s (2 bridge agents)
Total: 143.6s

💰 Value — sound

Bumps the root evals package to canonical Tangle substrate versions (^0.91 agent-eval, ^0.50 agent-runtime) and keeps the existing single-point runtime abstraction intact; typecheck:evals passes.

  • What it does: Updates package.json:38-41 and package-lock.json to pin @tangle-network/agent-eval to ^0.91.0 (from ^0.70.0) and @tangle-network/agent-runtime to ^0.50.0 (from ^0.36.0). The lockfile now resolves agent-eval 0.91.0 and agent-runtime 0.50.0, matching the newer peer-dependency ranges agent-runtime declares (@tangle-network/agent-eval >=0.83.0 <1.0.0, `@tangle-network/sandbox >=0.1.2 <0.
  • Goals it achieves: Closes the 14-minor fleet skew on agent-runtime and brings the eval harness onto the current substrate line so it receives upstream runtime/eval fixes, features, and compatible peer ranges. The repo's eval import surface (createOpenAICompatibleBackend, runAgentTaskStream, AgentTaskSpec from agent-runtime; campaign/trace APIs from agent-eval) is preserved as non-breaking across this jump.
  • Assessment: Good change. It is scoped exactly to the package that consumes these dependencies (only ./package.json references them; arena/package.json, sdk-ts/package.json, and the CJS tool package do not). The evals already centralize agent-runtime usage behind evals/src/sim/llm-call.ts:33-34, so the bump touches a single abstraction boundary rather than scattered call sites. Verification held: after `npm
  • Better / existing approach: none — this is the right approach. Searched the workspace for package.json files and agent-runtime/agent-eval imports: only the root package depends on these libraries, and only evals/src imports them. There is no duplicate dependency set to consolidate and no alternative abstraction already present that should absorb this bump. A workspace-wide bump would be wrong because the other packages do no

🎯 Usefulness — better-approach-exists

Bumps eval substrate to current runtime/eval and typechecks, but leaves the provisioned trading-agent sandbox pinned to the old versions in activate.rs, so the fleet skew the PR aims to close remains for deployed agents.

  • Integration: The bumped runtime/eval is reachable from the eval substrate: evals/src/sim/llm-call.ts:33-34 imports createOpenAICompatibleBackend, runAgentTaskStream, and AgentTaskSpec from @tangle-network/agent-runtime, and many eval modules import @tangle-network/agent-eval (e.g. evals/src/product/chat-sandbox-runner.ts:5, evals/src/trading/lifecycle-runner.ts:3, `evals/src/analysis/rlm-analys
  • Fit with existing patterns: It fits the established eval pattern: all judge/user-sim LLM calls route through the single llm-call.ts helper that wraps runtime primitives (evals/src/sim/llm-call.ts:1-34). It does not compete with any existing pattern. The local EvalProfile divergence in evals/src/profiles/types.ts:36-53 is intentional and untouched, which is consistent with the documented design.
  • Real-world viability: The runtime 0.50 surface used by this repo (createOpenAICompatibleBackend, runAgentTaskStream, AgentTaskSpec) is stable and the new optional peer deps (playwright, @tangle-network/sandbox) are optional, so install-time breakage is unlikely. Error-path handling in evals/src/sim/llm-call.ts:139-159 (backend_error events, AbortController timeout, catch-and-return) is unchanged and will co

🎯 Usefulness Audit

🔴 Root deps bumped but provisioned trading-agent sandbox stays on old runtime/eval [integration]

package.json:39,41 now pins ^0.91.0 / ^0.50.0, but trading-blueprint-lib/src/jobs/activate.rs:36-38 still emits ^0.70.0 / ^0.36.0 for the generated trading-agent package (activate.rs:72-94). The test trading_agent_substrate_versions_match_root_package at activate.rs:1619-1635 explicitly enforces that the generated sandbox matches root package.json; because the constants were not updated, that invariant is broken. Update TRADING_AGENT_AGENT_EVAL_VERSION to ^0.91.0 and `T
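
The invariant described in this finding can be sketched as a small drift check. This is an illustration only: the real guard is the Rust test in activate.rs, and the names `Pins`, `Drift`, and `findPinDrift` are hypothetical. The version strings are the ones quoted in the finding.

```typescript
// Maps a package name to its declared version range.
type Pins = Record<string, string>;

interface Drift {
  name: string;
  root: string;
  generated: string | undefined;
}

// Report every tracked package whose generated pin differs from the root pin.
function findPinDrift(root: Pins, generated: Pins, names: string[]): Drift[] {
  const drift: Drift[] = [];
  for (const name of names) {
    const rootRange = root[name];
    if (rootRange === undefined) continue; // root does not pin this package
    if (generated[name] !== rootRange) {
      drift.push({ name, root: rootRange, generated: generated[name] });
    }
  }
  return drift;
}

const tracked = ["@tangle-network/agent-eval", "@tangle-network/agent-runtime"];

// Root package.json after this PR.
const rootPins: Pins = {
  "@tangle-network/agent-eval": "^0.91.0",
  "@tangle-network/agent-runtime": "^0.50.0",
};

// Pins the finding says are still emitted for the generated trading-agent package.
const generatedPins: Pins = {
  "@tangle-network/agent-eval": "^0.70.0",
  "@tangle-network/agent-runtime": "^0.36.0",
};

for (const d of findPinDrift(rootPins, generatedPins, tracked)) {
  console.log(`${d.name}: root ${d.root}, generated ${d.generated}`);
}
```

With the values above, both packages are reported as drifted. Updating the generated pins to match the root would make the list empty.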


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260614T005715Z

@drewstone drewstone merged commit 2fc0ed7 into main Jun 14, 2026
13 checks passed
drewstone added a commit that referenced this pull request Jun 14, 2026
…x) + CI lane

Follow-up to #169 (model-driven trading) / #170 (deps). ONE entry point —
runTradingPersonaEval (evals/src/trading/persona-agent-eval.ts) — that degrades
by infra, instead of separate modules:

- No operator URL -> DETERMINISTIC mode: the Rust walk-forward backtest ->
  RunRecords + trace + scorecard (offline; what full-eval/CI run). Unchanged.
- operatorUrl present -> OPERATOR-MATRIX mode: runProfileMatrix sweeps the
  PROFILE axis (operator model variants: kimi-k2/glm-4.7/glm-5.1, pinned into the
  REAL operator via agentEnv) x (persona x market). Each cell runs the FULL
  operator simulation (runMultishotUserSim -> real bot_artifacts +
  tick_side_effects), judged on real artifacts (60%) + objective backtest ground
  truth (40%) — not prose. Scorecard + assertRealBackend + byProfile/byPersona
  read straight from the matrix. Multi-round honestly degenerates to 1 (the
  provision->chat->capture cycle is single-pass; turns live inside each cell).

Consolidation: folds the operator-matrix capability INTO the existing bridge file
and DELETES the standalone module + the dual --matrix bin flag + the redundant
npm script. One surface, one entry point, shared scorecard/profile/ground-truth
helpers. The bin auto-degrades by --operator-url; full-eval routes through the
same function.

CI: new 'Evals typecheck' lane (node 22 + npm ci + tsc -p evals/tsconfig.json),
classified on evals/ + package*.json + tsconfig, required in the gate.

Deps: agent-runtime ^0.52, agent-knowledge ^1.7 (over #170's ^0.50/^1.5);
agent-eval ^0.91. Validated: npm ci clean, tsc 0 errors.
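
A rough sketch of the shape this commit message describes: one entry point that picks its mode from the presence of an operator URL, a profile x persona x market sweep, and a 60/40 blend of artifact score and backtest ground truth. All names here (`selectMode`, `matrixCells`, `blendScore`) and the persona and market labels are hypothetical. Only the three model variants and the weights come from the message above; the actual runTradingPersonaEval signature is not shown in this thread.

```typescript
type EvalMode = "deterministic" | "operator-matrix";

// Degrade by infra: without an operator URL, fall back to the offline backtest.
function selectMode(operatorUrl?: string): EvalMode {
  const url = operatorUrl?.trim();
  return url ? "operator-matrix" : "deterministic";
}

interface Cell {
  profile: string;
  persona: string;
  market: string;
}

// Cartesian product of the profile axis with (persona x market).
function matrixCells(profiles: string[], personas: string[], markets: string[]): Cell[] {
  const cells: Cell[] = [];
  for (const profile of profiles) {
    for (const persona of personas) {
      for (const market of markets) {
        cells.push({ profile, persona, market });
      }
    }
  }
  return cells;
}

// Weights from the commit message: real artifacts 60%, backtest ground truth 40%.
// Both inputs are assumed to be normalized to the range [0, 1].
function blendScore(artifactScore: number, groundTruthScore: number): number {
  return 0.6 * artifactScore + 0.4 * groundTruthScore;
}

const profiles = ["kimi-k2", "glm-4.7", "glm-5.1"]; // model variants named above
const personas = ["cautious", "aggressive"]; // hypothetical persona labels
const markets = ["bull", "bear"]; // hypothetical market labels

console.log(selectMode(undefined)); // deterministic
console.log(matrixCells(profiles, personas, markets).length); // 12
```

Keeping the mode decision in one function is what lets the bin and full-eval share a single code path, as the consolidation note describes.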
drewstone added a commit that referenced this pull request Jun 14, 2026
…x) + CI lane (#171)