fix(core): restore evaluator messageToUser precedence, opt-in canonical tool text
The Server Tests upstream regression (planner-loop-user-facing-text →
"does not regress evaluator's explicit messageToUser path") fails on
develop because preferredFinalMessageFromToolOrModel preferred a single
successful tool's userFacingText OVER the evaluator's explicit
messageToUser. Shaw's regression test asserts the opposite: when the
evaluator emits an explicit messageToUser, it wins.
To reconcile both intents rather than pick one over the other, add an
opt-in flag verifiedUserFacing on ActionResult / PlannerToolResult.
Tools that emit structured outputs where evaluator paraphrase risks
hallucinating values (paths, ids, counts, numeric metrics) set
verifiedUserFacing: true to mark their userFacingText canonical. The
planner-loop then echoes the tool verbatim instead of letting the
evaluator paraphrase it. Without the flag, the evaluator's explicit
messageToUser wins (Shaw's invariant).
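A minimal sketch of what opting in might look like from a tool's side. The
ActionResultSketch shape, the success field, and both tool functions are
hypothetical stand-ins for illustration; only userFacingText and
verifiedUserFacing are named in this message.

```typescript
// Hypothetical, trimmed-down stand-in for ActionResult (not the real type).
interface ActionResultSketch {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

// Hypothetical tool whose output carries exact values (a path and a count).
// A paraphrase could get these wrong, so it marks its text as canonical.
function reportFileWritten(path: string, bytes: number): ActionResultSketch {
  return {
    success: true,
    userFacingText: `Wrote ${bytes} bytes to ${path}`,
    verifiedUserFacing: true,
  };
}

// Hypothetical tool with free-form output. It leaves the flag unset, so the
// evaluator's explicit messageToUser keeps precedence.
function summarizeNotes(summary: string): ActionResultSketch {
  return { success: true, userFacingText: summary };
}
```

The flag is a per-result opt-in, so one tool could set it for structured
outputs and omit it for conversational ones.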
Precedence in preferredFinalMessageFromToolOrModel is now:
1. Single successful tool with verifiedUserFacing === true
2. Evaluator/model messageToUser
3. Most recent tool userFacingText (fallback)
4. Caller-provided fallback
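The precedence above can be sketched roughly as follows. This is not the
code in planner-loop.ts: pickFinalMessage, ToolResultSketch, and the
success field are hypothetical, and treating "single successful tool" as
"exactly one successful result in the list" is an assumption.

```typescript
// Hypothetical, trimmed-down stand-in for PlannerToolResult.
interface ToolResultSketch {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

function pickFinalMessage(
  toolResults: ToolResultSketch[],
  messageToUser: string | undefined,
  fallback: string,
): string {
  // 1. Exactly one successful tool (assumed meaning of "single") that
  //    opted in with verifiedUserFacing === true: echo it verbatim.
  const successful = toolResults.filter((r) => r.success);
  const only = successful.length === 1 ? successful[0] : undefined;
  if (only && only.verifiedUserFacing === true && only.userFacingText?.trim()) {
    return only.userFacingText;
  }

  // 2. The evaluator/model's explicit messageToUser.
  if (messageToUser?.trim()) {
    return messageToUser;
  }

  // 3. Most recent tool userFacingText, scanning from the end.
  for (let i = toolResults.length - 1; i >= 0; i--) {
    const text = toolResults[i]?.userFacingText;
    if (text?.trim()) {
      return text;
    }
  }

  // 4. Caller-provided fallback.
  return fallback;
}
```

The strict === true check means an unset or falsy flag always falls
through to the evaluator's message.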
Changes:
- packages/core/src/types/components.ts: add verifiedUserFacing to
  ActionResult with JSDoc explaining when to opt in.
- packages/core/src/runtime/planner-types.ts: add verifiedUserFacing to
  PlannerToolResult with matching contract.
- packages/core/src/runtime/execute-planned-tool-call.ts and
  packages/core/src/runtime/planner-loop.ts
  (actionResultToPlannerToolResult): propagate the field through both
  ActionResult → PlannerToolResult conversion paths.
- packages/core/src/runtime/planner-loop.ts:
  - Rename singleSuccessfulUserFacingToolResultText →
    singleVerifiedUserFacingToolResultText and require
    verifiedUserFacing === true.
  - Reorder preferredFinalMessageFromToolOrModel to put verified-tool
    first, then evaluator, then fallback chain.
- packages/core/src/__tests__/planner-happy-path.test.ts: the
  "prefers a single tool's verified user-facing text over evaluator
  paraphrase" test now sets verifiedUserFacing: true, matching its
  semantic intent ("this is canonical structured data the evaluator
  could hallucinate"), so the canonical-output guarantee still holds.
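As a rough illustration of the propagation step listed above, a conversion
that carries the flag across might look like this. Both interfaces and
toPlannerToolResult are hypothetical simplifications, not the actual
actionResultToPlannerToolResult implementation.

```typescript
// Hypothetical, trimmed-down stand-ins for the two result types.
interface ActionResultSketch {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

interface PlannerToolResultSketch {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

function toPlannerToolResult(result: ActionResultSketch): PlannerToolResultSketch {
  const out: PlannerToolResultSketch = { success: result.success };
  if (result.userFacingText !== undefined) {
    out.userFacingText = result.userFacingText;
  }
  // Carry the opt-in across only when it is strictly true, so the planner
  // side never sees a verified marker the tool did not set.
  if (result.verifiedUserFacing === true) {
    out.verifiedUserFacing = true;
  }
  return out;
}
```

If either conversion path dropped the field, the planner loop would see
an unverified result and fall back to the evaluator's message.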
Verified:
- 1362 tests pass, 11 skipped (full packages/core suite, 165 files)
- bun run lint:check: 12 warnings before and after (no new warnings)
- bun run typecheck: clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>