
fix(mcp) : file search transform and include filter #1449

Open
pramod-rudrawadi wants to merge 7 commits into lightseekorg:main from pramod-rudrawadi:fix/mcp-file-search-transform-and-include-filter

Conversation


@pramod-rudrawadi pramod-rudrawadi commented May 5, 2026

Description

Problem

This PR closes several gaps in MCP-hosted file_search behavior across routing, output shaping, and payload handling:

  • Builtin routing gap: file_search was not consistently included in builtin MCP routing/type extraction.
  • Include semantics gap: file_search_call.results needed explicit include-gating to match OpenAI include[] semantics.
  • Transformer robustness gap: file-search result parsing was too narrow for common alias/shape variants.
  • Structured payload gap: MCP tool result serialization could prefer legacy text blocks over structured payloads.

Solution

  • Add builtin file_search detection/mapping in MCP routing/type extraction.
  • Gate file_search_call.results behind IncludeField::FileSearchCallResults in streaming and non-streaming tool-loop outputs.
  • Normalize file-search parsing for alias fields and richer result content extraction.
  • Prefer CallToolResult.structured_content when converting MCP tool outputs to JSON.
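
The include gating described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `FileSearchCall` is a simplified stand-in, the `Other` variant is hypothetical, and the function signatures are assumed from the test names listed under Tests.

```rust
/// Stand-in for the request's include[] values. Only
/// `FileSearchCallResults` is named in this PR; `Other` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IncludeField {
    FileSearchCallResults,
    Other,
}

/// Simplified file_search_call output item (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct FileSearchCall {
    id: String,
    results: Option<Vec<String>>,
}

/// True only when include[] explicitly lists file_search_call.results.
fn should_include_file_search_results(include: Option<&[IncludeField]>) -> bool {
    include.map_or(false, |fields| {
        fields.contains(&IncludeField::FileSearchCallResults)
    })
}

/// Drops `results` unless the caller opted in; other fields are untouched.
fn apply_file_search_include_filter(
    item: &mut FileSearchCall,
    include: Option<&[IncludeField]>,
) {
    if !should_include_file_search_results(include) {
        item.results = None;
    }
}
```

With this shape, a missing include[] and an include[] that lists only other fields behave the same way: the results are dropped.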

Changes

  • Added file_search to builtin MCP routing/type extraction.
  • Gated file_search_call.results behind explicit include[]=file_search_call.results.
  • Applied this include filtering in both streaming and non-streaming tool-loop paths.
  • Threaded request include[] into streaming tool interception flow.
  • Improved file-search parsing to support field aliases (fileId, file_name, fileName, id, name).
  • Added a filename fallback for results with no filename: attributes aliases first, then file_id.
  • Added text extraction from content[] formats (text, text.value, typed value) and ignored non-text typed nodes.
  • Preserved extra metadata (attributes, additional_properties, vector_store_id) in parsed results.
  • Updated MCP result serialization to prefer structured_content over legacy text blocks.
  • Added focused unit tests for include gating, parsing variants, and structured-content fallback.
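
As a rough sketch of the alias handling and filename fallback listed above: the example below models a result as a flat string map rather than JSON, and the helper names (`first_alias`, `resolve_file_identity`) and the key precedence are assumptions for illustration, not the transformer's actual code.

```rust
use std::collections::HashMap;

/// Returns the first non-empty value found under any of the given keys.
fn first_alias<'a>(obj: &'a HashMap<String, String>, keys: &[&str]) -> Option<&'a str> {
    keys.iter()
        .filter_map(|k| obj.get(*k))
        .map(|s| s.as_str())
        .find(|s| !s.is_empty())
}

/// Resolves (file_id, filename) from a result object and its attributes.
/// Assumed precedence: direct filename aliases, then attributes aliases,
/// then the file_id itself.
fn resolve_file_identity(
    result: &HashMap<String, String>,
    attributes: &HashMap<String, String>,
) -> Option<(String, String)> {
    let name_keys = ["filename", "file_name", "fileName", "name"];
    let file_id = first_alias(result, &["file_id", "fileId", "id"])?;
    let filename = first_alias(result, &name_keys)
        .or_else(|| first_alias(attributes, &name_keys))
        .unwrap_or(file_id);
    Some((file_id.to_string(), filename.to_string()))
}
```

A result with neither a filename alias nor a matching attribute ends up with its file_id as the filename, which keeps the output field populated.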

Tests

Added/updated focused coverage for:

  • include gating behavior:
    • test_should_include_file_search_results_only_when_explicitly_requested
    • test_apply_file_search_include_filter_removes_results_when_not_included
    • test_apply_file_search_include_filter_keeps_results_when_included
  • file-search transformer behavior:
    • alias handling (id/name, fileId/file_name/fileName)
    • filename fallback behavior
    • content[] text extraction and non-text filtering
    • top-level array/result handling
  • orchestrator structured payload preference:
    • test_call_result_to_json_prefers_structured_content
    • test_call_result_to_json_falls_back_to_content_when_unstructured
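
The behavior the two orchestrator tests exercise can be pictured roughly like this. The struct is a simplified stand-in for the MCP tool result in which payloads are already-serialized JSON strings; the real `call_result_to_json` works on JSON values and includes error handling not shown here.

```rust
/// Simplified stand-in for an MCP tool result (assumed shape).
/// Each string is treated as an already-serialized JSON value.
struct CallToolResult {
    structured_content: Option<String>,
    content: Vec<String>,
}

/// Prefers the structured payload; otherwise falls back to the legacy
/// content blocks, emitted as a JSON array.
fn call_result_to_json(result: &CallToolResult) -> String {
    match &result.structured_content {
        Some(structured) => structured.clone(),
        None => format!("[{}]", result.content.join(",")),
    }
}
```

When both fields are present, the legacy content blocks are ignored entirely in this sketch.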

Test Plan

cargo +nightly fmt --all -- --check
cargo check -p smg-mcp -p smg
cargo test -p smg-mcp transform::transformer::tests::test_file_search -- --nocapture
cargo test -p smg test_apply_file_search_include_filter_removes_results_when_not_included -- --nocapture
cargo test -p smg test_apply_file_search_include_filter_keeps_results_when_included -- --nocapture
cargo test -p smg test_should_include_file_search_results_only_when_explicitly_requested -- --nocapture

Verified behavior

  • file_search is now discovered as a builtin MCP-routable tool and mapped to BuiltinToolType::FileSearch.
  • file_search_call.results is omitted unless include explicitly contains file_search_call.results.
  • file_search_call.results is preserved when explicitly requested.
  • File-search parsing handles alias fields, filename fallback, attributes/additional properties passthrough, and content[] text extraction.
  • MCP tool-result conversion prefers structured payloads when available.
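
For the content[] text extraction mentioned above, a minimal sketch follows. The `ContentNode` enum is a hypothetical model of the three node shapes named in Changes, and joining the parts with newlines is an assumption; the transformer itself parses JSON.

```rust
/// Hypothetical model of the content[] node shapes.
enum ContentNode {
    /// { "text": "..." }
    Text(String),
    /// { "text": { "value": "..." } }
    TextValue(String),
    /// { "type": "...", "value": "..." }
    Typed { kind: String, value: String },
}

/// Collects text from text-bearing nodes and skips typed nodes whose
/// type is not "text". Returns None when no text was found.
fn extract_text(content: &[ContentNode]) -> Option<String> {
    let parts: Vec<&str> = content
        .iter()
        .filter_map(|node| match node {
            ContentNode::Text(t) | ContentNode::TextValue(t) => Some(t.as_str()),
            ContentNode::Typed { kind, value } if kind.as_str() == "text" => {
                Some(value.as_str())
            }
            ContentNode::Typed { .. } => None,
        })
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n"))
    }
}
```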

Checklist

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • File search results now support conditional filtering based on user request preferences
    • Enhanced file search response handling with improved compatibility for varied result formats
    • Added optional metadata fields to file search responses
  • Bug Fixes

    • Improved structured content handling in tool responses with fallback behavior for legacy formats

@github-actions github-actions Bot added the mcp (MCP related changes), protocols (Protocols crate changes), model-gateway (Model gateway crate changes), and openai (OpenAI router changes) labels May 5, 2026

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@pramod-rudrawadi has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 48 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 965d952d-512f-409f-8858-2e7005a50ec9

📥 Commits

Reviewing files that changed from the base of the PR and between ea8a1c6 and 856afc0.

📒 Files selected for processing (6)
  • crates/mcp/src/core/orchestrator.rs
  • crates/protocols/src/responses.rs
  • model_gateway/src/routers/common/mcp_utils.rs
  • model_gateway/src/routers/common/openai_bridge/transformer.rs
  • model_gateway/src/routers/openai/mcp/tool_loop.rs
  • model_gateway/src/routers/openai/responses/streaming.rs
📝 Walkthrough

This PR extends MCP file-search handling across the protocol, transformation, and request/response pipeline. It adds optional metadata fields to file-search result types, refactors parsing to support field aliases and multiple input shapes, enables file-search routing as a built-in tool type, implements conditional filtering of results based on request include flags, and prefers structured content in JSON conversion.

Changes

File Search with Structured Content & Conditional Filtering

| Layer | File(s) | Summary |
|---|---|---|
| Data Shape | crates/protocols/src/responses.rs | FileSearchResult gains optional fields: vector_store_id: Option<String> and additional_properties: Option<Value>, alongside existing attributes. |
| Core Parsing & Transformation | crates/mcp/src/transform/transformer.rs | extract_file_results now accepts both top-level arrays and results field arrays; parse_file_result derives file_id/filename from alias keys (including fileId, id, attributes.name), extracts text from direct field or parsed content array, and optionally includes vector_store_id, attributes, and additional_properties. |
| Result Processing | crates/mcp/src/core/orchestrator.rs | call_result_to_json prefers structured_content when available, falling back to serialized content with error handling. |
| Routing & Built-in Support | model_gateway/src/routers/common/mcp_utils.rs | ResponseTool::FileSearch is mapped to BuiltinToolType::FileSearch in collect_builtin_routing and extract_builtin_types, enabling MCP server routing for file-search requests. |
| Streaming Execution & Filtering | model_gateway/src/routers/openai/mcp/tool_loop.rs | execute_streaming_tool_calls accepts request_include: Option<&[IncludeField]> parameter and conditionally removes results from file_search_call items when IncludeField::FileSearchCallResults is absent. |
| Streaming Integration | model_gateway/src/routers/openai/responses/streaming.rs | Streaming tool invocation passes original_request.include.as_deref() to execute_streaming_tool_calls. |
| Tests | crates/mcp/src/core/orchestrator.rs, crates/mcp/src/transform/transformer.rs, model_gateway/src/routers/common/mcp_utils.rs, model_gateway/src/routers/openai/mcp/tool_loop.rs | Tests cover structured-content preference/fallback, alias handling, content text extraction, filtering behavior, and include-flag decision logic. |
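
The routing row above maps ResponseTool::FileSearch to BuiltinToolType::FileSearch. A minimal sketch of that kind of mapping is shown below; both enums are trimmed stand-ins, the `Function` and `WebSearchPreview` variants and the `builtin_type_for` helper are assumptions for illustration, and de-duplication in `extract_builtin_types` is likewise assumed.

```rust
/// Trimmed stand-in for the builtin tool kinds routable to MCP servers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuiltinToolType {
    WebSearchPreview,
    FileSearch,
}

/// Trimmed stand-in for the tools declared on a request.
enum ResponseTool {
    Function,
    WebSearchPreview,
    FileSearch,
}

/// Maps a request tool to its builtin type, if it has one.
fn builtin_type_for(tool: &ResponseTool) -> Option<BuiltinToolType> {
    match tool {
        ResponseTool::WebSearchPreview => Some(BuiltinToolType::WebSearchPreview),
        ResponseTool::FileSearch => Some(BuiltinToolType::FileSearch),
        ResponseTool::Function => None,
    }
}

/// Collects the distinct builtin types in first-seen order.
fn extract_builtin_types(tools: &[ResponseTool]) -> Vec<BuiltinToolType> {
    let mut seen = Vec::new();
    for kind in tools.iter().filter_map(builtin_type_for) {
        if !seen.contains(&kind) {
            seen.push(kind);
        }
    }
    seen
}
```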

Sequence Diagram

sequenceDiagram
    participant Client
    participant Streaming as Streaming Handler
    participant Router as MCP Router
    participant Executor as Tool Executor
    participant Transform as Transformer
    participant Orch as Orchestrator

    Client->>Streaming: Send file-search request<br/>(with include flags)
    Streaming->>Router: Recognize FileSearch<br/>as built-in tool type
    Router->>Executor: Route to MCP executor<br/>(include flags passed)
    Executor->>Transform: Parse file-search results
    Transform->>Transform: Extract fields with aliases<br/>(file_id, filename,<br/>vector_store_id, etc.)
    Transform->>Executor: Return parsed FileSearchResult
    Executor->>Executor: Conditionally filter<br/>results field based<br/>on include flags
    Executor->>Orch: Convert to JSON
    Orch->>Orch: Prefer structured_content<br/>else use content
    Orch->>Streaming: Return filtered output
    Streaming->>Client: Send response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightseekorg/smg#730: Refactored MCP tool-loop functions that are directly modified in this PR for the request_include parameter and filtering logic.
  • lightseekorg/smg#989: Updates file-result parsing in crates/mcp/src/transform/transformer.rs, overlapping with this PR's transformer refactoring.

Suggested labels

mcp, model-gateway, protocols, openai, tests

Suggested reviewers

  • CatherineSue
  • key4ng
  • slin1237
  • zhaowenzi

🐰 Structured content now flows with grace,
Aliases bloom in parsing's place,
File search finds home in routing's fold,
Filtering flags control the load,
MCP tools march in once more bold!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: enhancing file search transformation and adding include-field filtering for file_search_call results in MCP handling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



mergify Bot commented May 5, 2026

Hi @pramod-rudrawadi, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase (PR has merge conflicts that need to be resolved) label May 5, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the Model Context Protocol (MCP) implementation by improving the handling of structured tool results and file search outputs. Key updates include prioritizing structured_content in the orchestrator, expanding the FileSearchResult struct to include vector_store_id and additional_properties, and refining the response transformer to support various JSON schemas and aliases for file search results. Additionally, a filtering mechanism was implemented to honor OpenAI-style include parameters, ensuring file search results are only returned when explicitly requested. Comprehensive tests were added across the modified modules to verify these changes. I have no feedback to provide as there were no review comments.

Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
…esult parsing

Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
…vior

Signed-off-by: Pramod Rudrawadi <pramod.rudrawadi@oracle.com>
@pramod-rudrawadi pramod-rudrawadi force-pushed the fix/mcp-file-search-transform-and-include-filter branch from ea8a1c6 to 856afc0 May 5, 2026 18:23
@mergify mergify Bot removed the needs-rebase (PR has merge conflicts that need to be resolved) label May 5, 2026
@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 14 days. It will be automatically closed if no further activity occurs within 16 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions Bot added the stale (PR has been inactive for 14+ days) label May 20, 2026

mergify Bot commented May 20, 2026

Hi @pramod-rudrawadi, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase (PR has merge conflicts that need to be resolved) label May 20, 2026
@github-actions github-actions Bot removed the stale (PR has been inactive for 14+ days) label May 21, 2026

Labels

  • mcp: MCP related changes
  • model-gateway: Model gateway crate changes
  • needs-rebase: PR has merge conflicts that need to be resolved
  • openai: OpenAI router changes
  • protocols: Protocols crate changes

Projects

None yet

1 participant