fix: use last step output instead of longest in MCP response extraction by buger · Pull Request #560 · probelabs/visor

buger · 2026-03-21T19:02:19Z

Summary

extractResponseText() in McpServerRunner was picking the longest text output across all workflow steps
For assistant workflows (route-intent → build-config → generate-response), the routing step's classification output could be longer than the final AI response
This caused MCP clients to receive intent classifications instead of actual answers
Changed to pick the last step's text output, matching the natural execution order

Root cause

The comment said "keeping the last (deepest) one" but the code did text.length > bestText.length — picking longest, not last. Fixed both the reviewSummary.history and executionStatistics.groupedResults extraction paths.

Test plan

Updated existing test descriptions to reflect "last" instead of "longest" behavior
Added test: routing output longer than final response → picks last step
Added test: grouped results with longer routing output → picks last step
All 30 mcp-server-runner tests pass
Full test suite passes (4047 tests)

🤖 Generated with Claude Code

extractResponseText() was picking the longest text across all workflow steps, which could return routing/intent classification output instead of the final generate-response output. Changed to pick the last step's text, matching the natural execution order. This fixes a bug where MCP clients would receive intent classifications (e.g. "engineering-task, skills: api-gateway") instead of the actual AI response when the routing step produced longer output than the final response. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

probelabs · 2026-03-21T19:07:54Z

PR Overview: Fix MCP Response Extraction Logic

Summary

This PR fixes a bug in McpServerRunner.extractResponseText() where the method was incorrectly selecting the longest text output from workflow steps instead of the last step's output. This caused MCP clients to receive routing/classification outputs instead of the actual AI responses in assistant workflows.

Root Cause

The code comment stated "keeping the last (deepest) one" but the implementation used text.length > bestText.length comparison, which selected the longest text rather than the last step's output. For assistant workflows with steps like route-intent → build-config → generate-response, the routing step's classification text could be longer than the final response, causing incorrect output selection.

Changes Made

`src/runners/mcp-server-runner.ts:574-627`

History extraction path: Changed from text.length > bestText.length to text.trim().length > 0, now capturing the last non-empty text output
Grouped results path: Applied same fix - now picks last text output instead of longest
Updated comments to clarify the "last step" behavior
Variable renamed from bestText to lastText for clarity

`tests/unit/runners/mcp-server-runner.test.ts`

Updated test descriptions to reflect "last" instead of "longest" behavior
Added new test: picks last step even when routing output is longer than final response - verifies the fix with a realistic scenario where route-intent produces verbose classification text
Added new test: picks last grouped result, not longest - covers the grouped results fallback path

Architecture & Impact

Component Flow

graph TD
    A[Assistant Workflow] --> B[route-intent step]
    B --> C[build-config step]
    C --> D[generate-response step]
    D --> E[reviewSummary.history]
    E --> F[McpServerRunner.extractResponseText]
    F --> G[MCP Client Response]
    
    style F fill:#f99,padding:10px
    style G fill:#9f9,padding:10px

Data Flow

sequenceDiagram
    participant W as Assistant Workflow
    participant H as reviewSummary.history
    participant E as extractResponseText()
    participant M as MCP Client
    
    W->>H: route-intent output "verbose classification"
    W->>H: build-config output
    W->>H: generate-response output "concise answer"
    H->>E: Object.entries(history) - insertion order preserved
    E->>E: Iterate steps, capture last non-empty text
    E->>M: Return generate-response text "NOT route-intent"

Affected Components

McpServerRunner: Core runner for MCP server workflows - response extraction logic fixed
Assistant workflow: All consumers using the assistant workflow (route-intent → build-config → generate-response) via MCP
reviewSummary.history: Output history structure that preserves step execution order
executionStatistics.groupedResults: Fallback data path also fixed

Scope Discovery

Direct Impact

src/runners/mcp-server-runner.ts - Core fix location
tests/unit/runners/mcp-server-runner.test.ts - Test coverage updated

Related Components (verified via search)

defaults/assistant.yaml - Defines the 3-step assistant workflow structure
src/state-machine/dispatch/history-snapshot.ts - Builds reviewSummary.history from execution journal (preserves insertion order)
src/reviewer.ts - Defines GroupedCheckResults type for grouped results
src/state-machine/runner.ts - Builds ExecutionResult with groupedResults

Entry Points

src/runners/runner-factory.ts:138-160 - Creates McpServerRunner instances
MCP server integration points that consume workflow results

Testing

All 30 mcp-server-runner tests pass
Full test suite passes (4047 tests)
New tests specifically cover the bug scenario with longer routing output

References

src/runners/mcp-server-runner.ts:574-627 - extractResponseText method (fixed)
tests/unit/runners/mcp-server-runner.test.ts:213-377 - Test coverage
defaults/assistant.yaml:539-607 - Assistant workflow step definitions
src/state-machine/dispatch/history-snapshot.ts:16-65 - History building (preserves order)
src/reviewer.ts:53-55 - GroupedCheckResults type definition

Metadata

Review Effort: 2 / 5
Primary Label: bug

Powered by Visor from Probelabs

Last updated: 2026-03-21T19:22:01.708Z | Triggered by: pr_opened | Commit: 3ec29bd

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs · 2026-03-21T19:09:17Z

Security Issues (4)

Severity	Location	Issue
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The change from 'longest' to 'last' text selection could expose intermediate routing/classification outputs if the workflow step execution order is not deterministic. If a workflow step executes out of order or if steps are added dynamically, the 'last' step might not be the intended final response, potentially exposing internal routing decisions or intermediate processing results to MCP clients. 💡 Suggestion Add validation to ensure the selected text comes from a known 'final' step (e.g., 'generate-response', 'respond') rather than blindly trusting execution order. Consider maintaining a whitelist of step suffixes that indicate final output steps.
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The trim().length > 0 check only validates that text is non-empty after trimming whitespace, but does not validate that the text is actually a user-facing response. Malicious or misconfigured workflows could inject system messages, debug output, or sensitive internal data into the text field of any step, which would then be returned to the MCP client. 💡 Suggestion Add content validation to ensure the extracted text appears to be a legitimate user-facing response (e.g., minimum length check, exclude system/debug patterns, validate against expected response format). Consider adding a sanitize step to remove potential sensitive system information.
🟡 Warning	`src/runners/mcp-server-runner.ts:605`	The grouped results extraction path also changed from 'longest' to 'last' without additional safeguards. This could expose intermediate check results or internal processing data if grouped results are not properly ordered. The same issue exists in both extraction paths (history and groupedResults). 💡 Suggestion Apply the same validation and filtering logic to both extraction paths. Consider adding a check to ensure the selected result is from a check/group that is intended for user output rather than internal processing.
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The code comment says 'iteration order preserves insertion order' but JavaScript object property iteration order is only guaranteed for string keys in ES2015+, and the behavior can be subtle with mixed key types. If workflow step IDs are not consistently ordered, the 'last' step may not be deterministic. 💡 Suggestion Use Object.entries() with explicit sorting by step name or execution timestamp to ensure deterministic selection of the final step. Alternatively, maintain an explicit step execution order in the workflow metadata.

Security Issues (4)

Severity	Location	Issue
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The change from 'longest' to 'last' text selection could expose intermediate routing/classification outputs if the workflow step execution order is not deterministic. If a workflow step executes out of order or if steps are added dynamically, the 'last' step might not be the intended final response, potentially exposing internal routing decisions or intermediate processing results to MCP clients. 💡 Suggestion Add validation to ensure the selected text comes from a known 'final' step (e.g., 'generate-response', 'respond') rather than blindly trusting execution order. Consider maintaining a whitelist of step suffixes that indicate final output steps.
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The trim().length > 0 check only validates that text is non-empty after trimming whitespace, but does not validate that the text is actually a user-facing response. Malicious or misconfigured workflows could inject system messages, debug output, or sensitive internal data into the text field of any step, which would then be returned to the MCP client. 💡 Suggestion Add content validation to ensure the extracted text appears to be a legitimate user-facing response (e.g., minimum length check, exclude system/debug patterns, validate against expected response format). Consider adding a sanitize step to remove potential sensitive system information.
🟡 Warning	`src/runners/mcp-server-runner.ts:605`	The grouped results extraction path also changed from 'longest' to 'last' without additional safeguards. This could expose intermediate check results or internal processing data if grouped results are not properly ordered. The same issue exists in both extraction paths (history and groupedResults). 💡 Suggestion Apply the same validation and filtering logic to both extraction paths. Consider adding a check to ensure the selected result is from a check/group that is intended for user output rather than internal processing.
🟡 Warning	`src/runners/mcp-server-runner.ts:589`	The code comment says 'iteration order preserves insertion order' but JavaScript object property iteration order is only guaranteed for string keys in ES2015+, and the behavior can be subtle with mixed key types. If workflow step IDs are not consistently ordered, the 'last' step may not be deterministic. 💡 Suggestion Use Object.entries() with explicit sorting by step name or execution timestamp to ensure deterministic selection of the final step. Alternatively, maintain an explicit step execution order in the workflow metadata.

\n\n ### Architecture Issues (5)

Severity	Location	Issue
🟡 Warning	`src/runners/mcp-server-runner.ts:586-593`	The fix relies on Object.entries() preserving insertion order to pick the last step's output. While this is guaranteed in modern JavaScript, it's an implicit dependency that makes the code fragile. The journal entries are already ordered by commitId (monotonic counter), but the code iterates over Object.entries(history) which depends on property insertion order. If the history object construction changes, this could break. 💡 Suggestion Consider explicitly tracking step execution order. The journal already has commitId ordering - could leverage that directly. Alternatively, could add a stepIndex or executionOrder field to history entries to make ordering explicit rather than relying on Object.entries iteration order.
🟡 Warning	`src/runners/mcp-server-runner.ts:586-593`	The trackExecution function in agent-protocol/track-execution.ts (lines 232-265) uses a different approach: it iterates Object.values(history) in REVERSE order to find the last text output. This is more explicit about the intent but inconsistent with the forward iteration used here. Both locations serve the same purpose (extracting final AI response) but use different patterns. 💡 Suggestion Standardize the text extraction logic across both locations. Either: 1) Create a shared utility function extractLastResponseText(history) that both use, or 2) Use the same iteration pattern. The reverse iteration in trackExecution is more explicit about finding the 'last' output.
🟡 Warning	`src/runners/mcp-server-runner.ts:586-593`	The comment and logic assume that 'last step = final AI response' for assistant workflows. This is a special-case assumption about workflow structure (route-intent -> build-config -> generate-response). While true for the standard assistant.yaml workflow, this may not hold for custom workflows with different structures or parallel execution. 💡 Suggestion Consider making the extraction strategy more flexible or configurable. Could add a parameter to specify which step ID to extract from (e.g., extractFromStep: 'generate-response'), or use a heuristic that looks for specific step patterns. The current 'last step wins' approach is simple but may not generalize well.
🟡 Warning	`src/runners/mcp-server-runner.ts:601-608`	The grouped results fallback (lines 601-608) also changed from 'longest' to 'last' but uses Object.values(grouped) iteration. However, groupedResults structure differs from history - it's grouped by check name, not step ID. The iteration order here may not match execution order in the same way history does. 💡 Suggestion Verify that Object.values(grouped) iteration order matches the intended execution order for grouped results. If not, consider a different approach. The groupedResults structure may need different handling than history.
🟡 Warning	`src/runners/mcp-server-runner.ts:589-591`	The condition text.trim().length > 0 filters out empty/whitespace-only strings. However, this means if the last step legitimately produces an empty response (e.g., a step that only produces side effects), the function will skip it and continue to earlier steps. This could lead to unexpected behavior. 💡 Suggestion Consider whether empty responses from the last step should be treated differently. If the last step has no text, should we return empty string or fall back to earlier steps? The current behavior (keep looking) may be correct, but it's worth documenting this edge case explicitly.

Performance Issues (1)

Severity	Location	Issue
🟠 Error	`contract:0`	Output schema validation failed: must have required property 'issues'

Powered by Visor from Probelabs

Last updated: 2026-03-21T19:15:01.145Z | Triggered by: pr_opened | Commit: 3ec29bd

💡 TIP: You can chat with Visor using /visor ask <your question>

buger merged commit 9bf6851 into main Mar 21, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use last step output instead of longest in MCP response extraction#560

fix: use last step output instead of longest in MCP response extraction#560
buger merged 1 commit intomainfrom
fix/mcp-response-extraction

buger commented Mar 21, 2026

Uh oh!

Uh oh!

probelabs Bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

probelabs Bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

buger commented Mar 21, 2026

Summary

Root cause

Test plan

Uh oh!

Uh oh!

probelabs Bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Overview: Fix MCP Response Extraction Logic

Summary

Root Cause

Changes Made

src/runners/mcp-server-runner.ts:574-627

tests/unit/runners/mcp-server-runner.test.ts

Architecture & Impact

Component Flow

Data Flow

Affected Components

Scope Discovery

Direct Impact

Related Components (verified via search)

Entry Points

Testing

References

Uh oh!

probelabs Bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Security Issues (4)

Security Issues (4)

Performance Issues (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

probelabs Bot commented Mar 21, 2026 •

edited

Loading

`src/runners/mcp-server-runner.ts:574-627`

`tests/unit/runners/mcp-server-runner.test.ts`

probelabs Bot commented Mar 21, 2026 •

edited

Loading