Skip to content

fix: use last step output instead of longest in MCP response extraction#560

Merged
buger merged 1 commit intomainfrom
fix/mcp-response-extraction
Mar 21, 2026
Merged

fix: use last step output instead of longest in MCP response extraction#560
buger merged 1 commit intomainfrom
fix/mcp-response-extraction

Conversation

@buger
Copy link
Copy Markdown
Contributor

@buger buger commented Mar 21, 2026

Summary

  • extractResponseText() in McpServerRunner was picking the longest text output across all workflow steps
  • For assistant workflows (route-intent → build-config → generate-response), the routing step's classification output could be longer than the final AI response
  • This caused MCP clients to receive intent classifications instead of actual answers
  • Changed to pick the last step's text output, matching the natural execution order

Root cause

The comment said "keeping the last (deepest) one" but the code did text.length > bestText.length — picking longest, not last. Fixed both the reviewSummary.history and executionStatistics.groupedResults extraction paths.

Test plan

  • Updated existing test descriptions to reflect "last" instead of "longest" behavior
  • Added test: routing output longer than final response → picks last step
  • Added test: grouped results with longer routing output → picks last step
  • All 30 mcp-server-runner tests pass
  • Full test suite passes (4047 tests)

🤖 Generated with Claude Code

extractResponseText() was picking the longest text across all workflow
steps, which could return routing/intent classification output instead
of the final generate-response output. Changed to pick the last step's
text, matching the natural execution order.

This fixes a bug where MCP clients would receive intent classifications
(e.g. "engineering-task, skills: api-gateway") instead of the actual
AI response when the routing step produced longer output than the
final response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger buger merged commit 9bf6851 into main Mar 21, 2026
9 checks passed
@probelabs
Copy link
Copy Markdown
Contributor

probelabs Bot commented Mar 21, 2026

PR Overview: Fix MCP Response Extraction Logic

Summary

This PR fixes a bug in McpServerRunner.extractResponseText() where the method was incorrectly selecting the longest text output from workflow steps instead of the last step's output. This caused MCP clients to receive routing/classification outputs instead of the actual AI responses in assistant workflows.

Root Cause

The code comment stated "keeping the last (deepest) one" but the implementation used text.length > bestText.length comparison, which selected the longest text rather than the last step's output. For assistant workflows with steps like route-intentbuild-configgenerate-response, the routing step's classification text could be longer than the final response, causing incorrect output selection.

Changes Made

src/runners/mcp-server-runner.ts:574-627

  • History extraction path: Changed from text.length > bestText.length to text.trim().length > 0, now capturing the last non-empty text output
  • Grouped results path: Applied same fix - now picks last text output instead of longest
  • Updated comments to clarify the "last step" behavior
  • Variable renamed from bestText to lastText for clarity

tests/unit/runners/mcp-server-runner.test.ts

  • Updated test descriptions to reflect "last" instead of "longest" behavior
  • Added new test: picks last step even when routing output is longer than final response - verifies the fix with a realistic scenario where route-intent produces verbose classification text
  • Added new test: picks last grouped result, not longest - covers the grouped results fallback path

Architecture & Impact

Component Flow

graph TD
    A[Assistant Workflow] --> B[route-intent step]
    B --> C[build-config step]
    C --> D[generate-response step]
    D --> E[reviewSummary.history]
    E --> F[McpServerRunner.extractResponseText]
    F --> G[MCP Client Response]
    
    style F fill:#f99,padding:10px
    style G fill:#9f9,padding:10px
Loading

Data Flow

sequenceDiagram
    participant W as Assistant Workflow
    participant H as reviewSummary.history
    participant E as extractResponseText()
    participant M as MCP Client
    
    W->>H: route-intent output "verbose classification"
    W->>H: build-config output
    W->>H: generate-response output "concise answer"
    H->>E: Object.entries(history) - insertion order preserved
    E->>E: Iterate steps, capture last non-empty text
    E->>M: Return generate-response text "NOT route-intent"
Loading

Affected Components

  • McpServerRunner: Core runner for MCP server workflows - response extraction logic fixed
  • Assistant workflow: All consumers using the assistant workflow (route-intentbuild-configgenerate-response) via MCP
  • reviewSummary.history: Output history structure that preserves step execution order
  • executionStatistics.groupedResults: Fallback data path also fixed

Scope Discovery

Direct Impact

  • src/runners/mcp-server-runner.ts - Core fix location
  • tests/unit/runners/mcp-server-runner.test.ts - Test coverage updated

Related Components (verified via search)

  • defaults/assistant.yaml - Defines the 3-step assistant workflow structure
  • src/state-machine/dispatch/history-snapshot.ts - Builds reviewSummary.history from execution journal (preserves insertion order)
  • src/reviewer.ts - Defines GroupedCheckResults type for grouped results
  • src/state-machine/runner.ts - Builds ExecutionResult with groupedResults

Entry Points

  • src/runners/runner-factory.ts:138-160 - Creates McpServerRunner instances
  • MCP server integration points that consume workflow results

Testing

  • All 30 mcp-server-runner tests pass
  • Full test suite passes (4047 tests)
  • New tests specifically cover the bug scenario with longer routing output

References

  • src/runners/mcp-server-runner.ts:574-627 - extractResponseText method (fixed)
  • tests/unit/runners/mcp-server-runner.test.ts:213-377 - Test coverage
  • defaults/assistant.yaml:539-607 - Assistant workflow step definitions
  • src/state-machine/dispatch/history-snapshot.ts:16-65 - History building (preserves order)
  • src/reviewer.ts:53-55 - GroupedCheckResults type definition
Metadata
  • Review Effort: 2 / 5
  • Primary Label: bug

Powered by Visor from Probelabs

Last updated: 2026-03-21T19:22:01.708Z | Triggered by: pr_opened | Commit: 3ec29bd

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Copy link
Copy Markdown
Contributor

probelabs Bot commented Mar 21, 2026

Security Issues (4)

Severity Location Issue
🟡 Warning src/runners/mcp-server-runner.ts:589
The change from 'longest' to 'last' text selection could expose intermediate routing/classification outputs if the workflow step execution order is not deterministic. If a workflow step executes out of order or if steps are added dynamically, the 'last' step might not be the intended final response, potentially exposing internal routing decisions or intermediate processing results to MCP clients.
💡 SuggestionAdd validation to ensure the selected text comes from a known 'final' step (e.g., 'generate-response', 'respond') rather than blindly trusting execution order. Consider maintaining a whitelist of step suffixes that indicate final output steps.
🟡 Warning src/runners/mcp-server-runner.ts:589
The trim().length > 0 check only validates that text is non-empty after trimming whitespace, but does not validate that the text is actually a user-facing response. Malicious or misconfigured workflows could inject system messages, debug output, or sensitive internal data into the text field of any step, which would then be returned to the MCP client.
💡 SuggestionAdd content validation to ensure the extracted text appears to be a legitimate user-facing response (e.g., minimum length check, exclude system/debug patterns, validate against expected response format). Consider adding a sanitize step to remove potential sensitive system information.
🟡 Warning src/runners/mcp-server-runner.ts:605
The grouped results extraction path also changed from 'longest' to 'last' without additional safeguards. This could expose intermediate check results or internal processing data if grouped results are not properly ordered. The same issue exists in both extraction paths (history and groupedResults).
💡 SuggestionApply the same validation and filtering logic to both extraction paths. Consider adding a check to ensure the selected result is from a check/group that is intended for user output rather than internal processing.
🟡 Warning src/runners/mcp-server-runner.ts:589
The code comment says 'iteration order preserves insertion order' but JavaScript object property iteration order is only guaranteed for string keys in ES2015+, and the behavior can be subtle with mixed key types. If workflow step IDs are not consistently ordered, the 'last' step may not be deterministic.
💡 SuggestionUse Object.entries() with explicit sorting by step name or execution timestamp to ensure deterministic selection of the final step. Alternatively, maintain an explicit step execution order in the workflow metadata.

Security Issues (4)

Severity Location Issue
🟡 Warning src/runners/mcp-server-runner.ts:589
The change from 'longest' to 'last' text selection could expose intermediate routing/classification outputs if the workflow step execution order is not deterministic. If a workflow step executes out of order or if steps are added dynamically, the 'last' step might not be the intended final response, potentially exposing internal routing decisions or intermediate processing results to MCP clients.
💡 SuggestionAdd validation to ensure the selected text comes from a known 'final' step (e.g., 'generate-response', 'respond') rather than blindly trusting execution order. Consider maintaining a whitelist of step suffixes that indicate final output steps.
🟡 Warning src/runners/mcp-server-runner.ts:589
The trim().length > 0 check only validates that text is non-empty after trimming whitespace, but does not validate that the text is actually a user-facing response. Malicious or misconfigured workflows could inject system messages, debug output, or sensitive internal data into the text field of any step, which would then be returned to the MCP client.
💡 SuggestionAdd content validation to ensure the extracted text appears to be a legitimate user-facing response (e.g., minimum length check, exclude system/debug patterns, validate against expected response format). Consider adding a sanitize step to remove potential sensitive system information.
🟡 Warning src/runners/mcp-server-runner.ts:605
The grouped results extraction path also changed from 'longest' to 'last' without additional safeguards. This could expose intermediate check results or internal processing data if grouped results are not properly ordered. The same issue exists in both extraction paths (history and groupedResults).
💡 SuggestionApply the same validation and filtering logic to both extraction paths. Consider adding a check to ensure the selected result is from a check/group that is intended for user output rather than internal processing.
🟡 Warning src/runners/mcp-server-runner.ts:589
The code comment says 'iteration order preserves insertion order' but JavaScript object property iteration order is only guaranteed for string keys in ES2015+, and the behavior can be subtle with mixed key types. If workflow step IDs are not consistently ordered, the 'last' step may not be deterministic.
💡 SuggestionUse Object.entries() with explicit sorting by step name or execution timestamp to ensure deterministic selection of the final step. Alternatively, maintain an explicit step execution order in the workflow metadata.
\n\n ### Architecture Issues (5)
Severity Location Issue
🟡 Warning src/runners/mcp-server-runner.ts:586-593
The fix relies on Object.entries() preserving insertion order to pick the last step's output. While this is guaranteed in modern JavaScript, it's an implicit dependency that makes the code fragile. The journal entries are already ordered by commitId (monotonic counter), but the code iterates over Object.entries(history) which depends on property insertion order. If the history object construction changes, this could break.
💡 SuggestionConsider explicitly tracking step execution order. The journal already has commitId ordering - could leverage that directly. Alternatively, could add a stepIndex or executionOrder field to history entries to make ordering explicit rather than relying on Object.entries iteration order.
🟡 Warning src/runners/mcp-server-runner.ts:586-593
The trackExecution function in agent-protocol/track-execution.ts (lines 232-265) uses a different approach: it iterates Object.values(history) in REVERSE order to find the last text output. This is more explicit about the intent but inconsistent with the forward iteration used here. Both locations serve the same purpose (extracting final AI response) but use different patterns.
💡 SuggestionStandardize the text extraction logic across both locations. Either: 1) Create a shared utility function extractLastResponseText(history) that both use, or 2) Use the same iteration pattern. The reverse iteration in trackExecution is more explicit about finding the 'last' output.
🟡 Warning src/runners/mcp-server-runner.ts:586-593
The comment and logic assume that 'last step = final AI response' for assistant workflows. This is a special-case assumption about workflow structure (route-intent -> build-config -> generate-response). While true for the standard assistant.yaml workflow, this may not hold for custom workflows with different structures or parallel execution.
💡 SuggestionConsider making the extraction strategy more flexible or configurable. Could add a parameter to specify which step ID to extract from (e.g., extractFromStep: 'generate-response'), or use a heuristic that looks for specific step patterns. The current 'last step wins' approach is simple but may not generalize well.
🟡 Warning src/runners/mcp-server-runner.ts:601-608
The grouped results fallback (lines 601-608) also changed from 'longest' to 'last' but uses Object.values(grouped) iteration. However, groupedResults structure differs from history - it's grouped by check name, not step ID. The iteration order here may not match execution order in the same way history does.
💡 SuggestionVerify that Object.values(grouped) iteration order matches the intended execution order for grouped results. If not, consider a different approach. The groupedResults structure may need different handling than history.
🟡 Warning src/runners/mcp-server-runner.ts:589-591
The condition text.trim().length > 0 filters out empty/whitespace-only strings. However, this means if the last step legitimately produces an empty response (e.g., a step that only produces side effects), the function will skip it and continue to earlier steps. This could lead to unexpected behavior.
💡 SuggestionConsider whether empty responses from the last step should be treated differently. If the last step has no text, should we return empty string or fall back to earlier steps? The current behavior (keep looking) may be correct, but it's worth documenting this edge case explicitly.

Performance Issues (1)

Severity Location Issue
🟠 Error contract:0
Output schema validation failed: must have required property 'issues'

Powered by Visor from Probelabs

Last updated: 2026-03-21T19:15:01.145Z | Triggered by: pr_opened | Commit: 3ec29bd

💡 TIP: You can chat with Visor using /visor ask <your question>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant