fix(core): restore evaluator messageToUser precedence, opt-in canonical tool text by NubsCarson · Pull Request #7897 · elizaOS/eliza

NubsCarson · 2026-05-23T10:55:36Z

What

The planner-loop-user-facing-text → "does not regress evaluator's explicit messageToUser path" test fails on develop because preferredFinalMessageFromToolOrModel prefers a single successful tool's userFacingText over the evaluator's explicit messageToUser. Shaw's original test asserts the opposite: when the evaluator emits an explicit messageToUser, it wins.

A later commit (4ba5130529 fix(core): preserve single-tool user-facing output) inverted the precedence to prevent the evaluator from paraphrasing a tool's structured output and hallucinating values (paths/ids/numeric metrics). That intent is real and worth preserving — its test (planner-happy-path.test.ts → "prefers a single tool's verified user-facing text over evaluator paraphrase") catches a genuine class of bug. So the two tests are direct semantic opposites, both with valid intent.

How

This PR reconciles both by adding a single opt-in field, verifiedUserFacing?: boolean, on ActionResult / PlannerToolResult. Tools whose output is canonical (do-not-paraphrase), typically structured data the evaluator could hallucinate, set the flag. Without it, the evaluator wins (Shaw's invariant).
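A minimal sketch of the opt-in field, assuming simplified shapes for ActionResult and PlannerToolResult. The real interfaces in components.ts and planner-types.ts carry more fields; only success, userFacingText, and verifiedUserFacing come from this PR. The helper isCanonicalToolText and the sample value reportSummary are hypothetical and only restate the gate.

```typescript
// Simplified stand-ins; the real ActionResult / PlannerToolResult have more fields.
interface ActionResult {
  success: boolean;
  /** Text the planner loop may show to the user. */
  userFacingText?: string;
  /**
   * Opt-in: mark userFacingText as canonical (do not paraphrase).
   * Intended for structured output (paths, ids, counts, metrics) that the
   * evaluator could hallucinate when rewording it.
   */
  verifiedUserFacing?: boolean;
}

interface PlannerToolResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

// Hypothetical helper: true only when a successful result explicitly opted in
// and actually carries non-empty text.
function isCanonicalToolText(result: PlannerToolResult | undefined): boolean {
  return (
    result?.success === true &&
    result.verifiedUserFacing === true &&
    (result.userFacingText?.trim() ?? "") !== ""
  );
}

// Hypothetical tool output: the exact path and count must reach the user verbatim.
const reportSummary: ActionResult = {
  success: true,
  userFacingText: "Wrote 3 rows to /tmp/report.csv",
  verifiedUserFacing: true,
};
```

Leaving the field optional keeps every existing tool on the default (evaluator wins) without code changes.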

Precedence in preferredFinalMessageFromToolOrModel is now:

1. Single successful tool with verifiedUserFacing === true
2. Evaluator / model messageToUser
3. Most recent tool userFacingText (fallback)
4. Caller-provided fallback
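The precedence above can be sketched as a pure function. This is not the code in planner-loop.ts: the ToolStep shape, the parameter names, how the evaluator text is passed in, and the choice not to check success in tier 3 are assumptions. Only the ordering and the verifiedUserFacing === true gate (mirroring the reviewed diff) come from this PR.

```typescript
interface PlannerToolResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

// Assumed step shape: a trajectory step may or may not carry a tool call/result.
interface ToolStep {
  toolCall?: { name: string };
  result?: PlannerToolResult;
}

// Tier 1: exactly one step with toolCall + result, successful and opted in.
function singleVerifiedUserFacingToolResultText(steps: ToolStep[]): string | undefined {
  const toolResultSteps = steps.filter((s) => s.toolCall && s.result);
  if (toolResultSteps.length !== 1) return undefined;
  const result = toolResultSteps[0]?.result;
  if (result?.success !== true) return undefined;
  if (result.verifiedUserFacing !== true) return undefined;
  const text = result.userFacingText?.trim();
  return text || undefined;
}

// Tier 3: most recent step with non-empty userFacingText.
function latestToolUserFacingText(steps: ToolStep[]): string | undefined {
  for (let i = steps.length - 1; i >= 0; i--) {
    const text = steps[i]?.result?.userFacingText?.trim();
    if (text) return text;
  }
  return undefined;
}

// Hypothetical stand-in for preferredFinalMessageFromToolOrModel.
function preferredFinalMessage(
  steps: ToolStep[],
  evaluatorMessageToUser: string | undefined,
  callerFallback: string,
): string {
  return (
    singleVerifiedUserFacingToolResultText(steps) ?? // 1. verified single tool
    (evaluatorMessageToUser?.trim() || undefined) ?? // 2. evaluator / model
    latestToolUserFacingText(steps) ?? // 3. latest tool text
    callerFallback // 4. caller fallback
  );
}
```

Chaining with `??` makes each tier a fall-through: a tier returns `undefined` when it does not apply, and the next one is consulted.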

Changes

- packages/core/src/types/components.ts: add verifiedUserFacing to ActionResult with JSDoc explaining when to opt in.
- packages/core/src/runtime/planner-types.ts: add matching verifiedUserFacing to PlannerToolResult.
- packages/core/src/runtime/execute-planned-tool-call.ts and packages/core/src/runtime/planner-loop.ts (actionResultToPlannerToolResult): propagate the field through both ActionResult → PlannerToolResult conversion paths.
- packages/core/src/runtime/planner-loop.ts:
  - Rename singleSuccessfulUserFacingToolResultText → singleVerifiedUserFacingToolResultText and require verifiedUserFacing === true.
  - Reorder preferredFinalMessageFromToolOrModel to put verified-tool first.
- packages/core/src/__tests__/planner-happy-path.test.ts: the conflicting test now sets verifiedUserFacing: true (matching its stated semantic intent), so the canonical-output guarantee still holds.
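Propagation matters because a converter that copies fields one by one silently drops anything it does not list. The sketch below is a hypothetical simplification (toPlannerToolResultSketch is not the real actionResultToPlannerToolResult, and the shapes are trimmed); it only illustrates that verifiedUserFacing must be copied explicitly or tier 1 can never fire.

```typescript
// Trimmed shapes; the real types carry more fields.
interface ActionResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

interface PlannerToolResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

// Field-by-field copy: omitting verifiedUserFacing here would strip the
// opt-in before the precedence check ever sees it.
function toPlannerToolResultSketch(result: ActionResult): PlannerToolResult {
  return {
    success: result.success,
    userFacingText: result.userFacingText,
    verifiedUserFacing: result.verifiedUserFacing,
  };
}
```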

Verified

bun run test (full packages/core): 1362 passed, 11 skipped across 165 test files.
bun run lint:check: 12 warnings before == 12 after (zero new flags).
bun run typecheck: clean.

Impact

Any tool currently relying on the post-4ba5130529 "tool always wins" precedence without setting verifiedUserFacing: true will now see the evaluator's explicit messageToUser take precedence. To restore the old behavior, set verifiedUserFacing: true on the action handler's ActionResult. The default change matches the long-standing contract in userFacingText's JSDoc ("the planner-loop's terminal-FINISH fallback may use this") and Shaw's pre-existing regression coverage.
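For a tool author, restoring the old behavior means returning the flag from the handler. The handler below is a hypothetical simplification: real elizaOS action handlers receive the runtime, message, state, and more, and the report paths are stand-in data. The only part taken from this PR is setting verifiedUserFacing: true next to userFacingText.

```typescript
interface ActionResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
  data?: Record<string, unknown>;
}

// Hypothetical handler: its exact values (paths, counts) must reach the user
// verbatim, so it opts in to canonical output.
async function listReportsHandler(paths: string[]): Promise<ActionResult> {
  const lines = paths.map((p) => `- ${p}`).join("\n");
  return {
    success: true,
    data: { count: paths.length, paths },
    userFacingText: `Found ${paths.length} report(s):\n${lines}`,
    verifiedUserFacing: true, // canonical: the evaluator must not paraphrase this
  };
}
```

Tools that return conversational text rather than exact values should leave the flag unset so the evaluator's messageToUser keeps precedence.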

🤖 Generated with Claude Code

Greptile Summary

This PR resolves a test conflict between two valid but opposing invariants: Shaw's rule that the evaluator's explicit messageToUser wins, and the anti-hallucination rule that a tool's structured output should not be paraphrased. The fix introduces verifiedUserFacing?: boolean as an opt-in field on ActionResult / PlannerToolResult that tool authors set when their output is canonical.

preferredFinalMessageFromToolOrModel now uses a three-tier precedence: (1) single verified tool, (2) evaluator messageToUser, (3) latest tool fallback — restoring the evaluator-wins default while preserving the anti-paraphrase guarantee for opted-in tools.
The conflicting planner-happy-path test is fixed by adding verifiedUserFacing: true to the CHECK_RUNTIME mock, making its intent match its mechanism.
The new field is propagated through both actionResultToPlannerToolResult and actionResultToStreamingResult conversion paths.

Confidence Score: 4/5

Safe to merge; the behavioral change is intentional and well-documented, and both regression tests pass.

The change correctly restores evaluator precedence as the default and the opt-in verifiedUserFacing flag works as designed. The single-step constraint in singleVerifiedUserFacingToolResultText means a tool that sets verifiedUserFacing: true in a multi-call trajectory will have its flag silently ignored — a confusing footgun for tool authors, though it is documented in the JSDoc.

The precedence logic in planner-loop.ts around singleVerifiedUserFacingToolResultText is the most sensitive area, specifically the interaction between the "exactly one result step" constraint and multi-step plans where verifiedUserFacing is set.

Important Files Changed

| Filename | Overview |
|---|---|
| `packages/core/src/runtime/planner-loop.ts` | Renames `singleSuccessfulUserFacingToolResultText` to `singleVerifiedUserFacingToolResultText`, adds `verifiedUserFacing` gate, reorders `preferredFinalMessageFromToolOrModel` precedence so evaluator wins by default unless the tool opts-in; both semantics are covered by existing tests |
| `packages/core/src/runtime/planner-types.ts` | Adds `verifiedUserFacing?: boolean` to `PlannerToolResult` with clear JSDoc; no logic changes |
| `packages/core/src/types/components.ts` | Adds `verifiedUserFacing?: boolean` to `ActionResult` with JSDoc; mirrors the PlannerToolResult addition cleanly |
| `packages/core/src/runtime/execute-planned-tool-call.ts` | Propagates `verifiedUserFacing` through `actionResultToStreamingResult` to the streaming hook; straightforward one-line addition |
| `packages/core/src/__tests__/planner-happy-path.test.ts` | Adds `verifiedUserFacing: true` to the CHECK_RUNTIME mock action to restore the "tool wins" semantic; the test's stated intent now matches its mechanism |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[preferredFinalMessageFromToolOrModel] --> B{Single tool step\nwith result?}
    B -- No --> D
    B -- Yes --> C{verifiedUserFacing\n=== true?}
    C -- No --> D{evaluator\nmessageToUser set?}
    C -- Yes --> G{success === true\nAND userFacingText set?}
    G -- Yes --> H[Return tool userFacingText\ncanonical / do-not-paraphrase]
    G -- No --> D
    D -- Yes --> E[Return evaluator messageToUser\nevaluator wins by default]
    D -- No --> F{Any tool step has\nuserFacingText?}
    F -- Yes --> I[Return latest tool\nuserFacingText fallback]
    F -- No --> J[Return caller fallback]
```


Greptile also left 1 inline comment on this PR.

…al tool text

The Server Tests upstream regression (planner-loop-user-facing-text → "does not regress evaluator's explicit messageToUser path") fails on develop because preferredFinalMessageFromToolOrModel preferred a single successful tool's userFacingText OVER the evaluator's explicit messageToUser. Shaw's regression test asserts the opposite: when the evaluator emits an explicit messageToUser, it wins.

Reconciling both intents without picking one over the other: add an opt-in flag verifiedUserFacing on ActionResult / PlannerToolResult. Tools that emit structured outputs where evaluator paraphrase risks hallucinating values (paths, ids, counts, numeric metrics) set verifiedUserFacing: true to mark their userFacingText canonical. The planner-loop then echoes the tool verbatim instead of letting the evaluator paraphrase it. Without the flag, the evaluator's explicit messageToUser wins (Shaw's invariant).

Precedence in preferredFinalMessageFromToolOrModel is now:
1. Single successful tool with verifiedUserFacing === true
2. Evaluator/model messageToUser
3. Most recent tool userFacingText (fallback)
4. Caller-provided fallback

Changes:
- packages/core/src/types/components.ts: add verifiedUserFacing to ActionResult with JSDoc explaining when to opt in.
- packages/core/src/runtime/planner-types.ts: add verifiedUserFacing to PlannerToolResult with matching contract.
- packages/core/src/runtime/execute-planned-tool-call.ts and packages/core/src/runtime/planner-loop.ts (actionResultToPlannerToolResult): propagate the field through both ActionResult → PlannerToolResult conversion paths.
- packages/core/src/runtime/planner-loop.ts:
  - Rename singleSuccessfulUserFacingToolResultText → singleVerifiedUserFacingToolResultText and require verifiedUserFacing === true.
  - Reorder preferredFinalMessageFromToolOrModel to put verified-tool first, then evaluator, then fallback chain.
- packages/core/src/__tests__/planner-happy-path.test.ts: the "prefers a single tool's verified user-facing text over evaluator paraphrase" test now sets verifiedUserFacing: true (its semantic intent: "this is canonical structured data the evaluator could hallucinate") so the canonical-output guarantee still holds.

Verified:
- 1362 tests pass, 11 skipped (full packages/core suite, 165 files)
- bun run lint:check: 12 warnings before == 12 after (no new flags)
- bun run typecheck: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-23T10:55:43Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0818d598-0240-46ae-8483-9b12e1652ddd


claude · 2026-05-23T10:56:26Z

Claude encountered an error.

I'll analyze this and get back to you.

github-actions · 2026-05-23T10:58:27Z

LifeOps Benchmark — `eliza`

Run ID: lifeops-eliza-26330862686


Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.000
pass@k: 0.000
Total cost: $0.0000

Full artifacts: see the lifeops-run-eliza-26330862686 upload on this run.

greptile-apps · 2026-05-23T11:00:06Z

@@ -2187,6 +2196,7 @@ function singleSuccessfulUserFacingToolResultText(
 	if (toolResultSteps.length !== 1) return undefined;
 	const result = toolResultSteps[0]?.result;
 	if (result?.success !== true) return undefined;
+	if (result.verifiedUserFacing !== true) return undefined;
 	const text = result.userFacingText?.trim();
 	return text || undefined;


Silent no-op when verifiedUserFacing meets multi-tool trajectories

singleVerifiedUserFacingToolResultText gates on toolResultSteps.length !== 1, where toolResultSteps includes every step that has both toolCall and result — failed steps included. A plan that calls two tools (even if the first failed and only the second succeeded with verifiedUserFacing: true) will silently fall through to the evaluator's messageToUser. Tool authors who set verifiedUserFacing: true expecting their canonical output to survive multi-step plans will see the flag silently ignored with no diagnostic path to discover why. The JSDoc does say "exactly one successful tool", but "exactly one result step" and "exactly one successful result step" differ and the comment doesn't surface that distinction.
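To make the distinction concrete: the gate counts every step that has both toolCall and result, so a failed earlier call disqualifies a later verified one. The sketch contrasts that count with a hypothetical alternative that counts only successful results. The alternative is a possible follow-up illustrated here, not what this PR merges; the step shape and the FETCH / CHECK_RUNTIME trajectory are assumed for illustration.

```typescript
interface PlannerToolResult {
  success: boolean;
  userFacingText?: string;
  verifiedUserFacing?: boolean;
}

interface ToolStep {
  toolCall?: { name: string };
  result?: PlannerToolResult;
}

// Count as described in the review: any step with toolCall + result, failed ones included.
function countResultSteps(steps: ToolStep[]): number {
  return steps.filter((s) => s.toolCall && s.result).length;
}

// Hypothetical alternative: only successful results count.
function countSuccessfulResultSteps(steps: ToolStep[]): number {
  return steps.filter((s) => s.toolCall && s.result?.success === true).length;
}

// Hypothetical trajectory: a failed first call, then a verified second call.
const trajectory: ToolStep[] = [
  { toolCall: { name: "FETCH" }, result: { success: false } },
  {
    toolCall: { name: "CHECK_RUNTIME" },
    result: { success: true, userFacingText: "uptime: 42s", verifiedUserFacing: true },
  },
];

// 2 result steps: the "exactly one" gate fails, so the verified flag is ignored.
console.log(countResultSteps(trajectory)); // 2
// 1 successful step: a success-only count would let the flag take effect.
console.log(countSuccessfulResultSteps(trajectory)); // 1
```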

github-actions · 2026-05-23T11:00:33Z

LifeOps Benchmark — `hermes`

Run ID: lifeops-hermes-26330862686


Model: gpt-oss-120b
Judge: claude-opus-4-7
Scenarios: 25
pass@1: 0.240
pass@k: 0.240
Total cost: $0.9444

Full artifacts: see the lifeops-run-hermes-26330862686 upload on this run.

github-actions added the Tests label on May 23, 2026.

NubsCarson wants to merge 1 commit into develop from fix/planner-loop-evaluator-precedence.