@aaronsteers aaronsteers commented Aug 1, 2025

feat: Add get_stream_previews MCP tool

Summary

Adds a new get_stream_previews MCP tool to PyAirbyte that wraps the recently added get_samples() functionality from the Source class. This tool provides stream preview capabilities through the MCP interface, allowing users to retrieve sample records from source connectors.

Key features:

  • Forces on_error="ignore" for clean MCP responses as requested
  • Passes the limit and streams parameters through to get_samples()
  • Returns descriptive error messages for failed streams instead of None
  • Follows existing MCP tool patterns with proper type annotations
  • Converts InMemoryDataset objects to serializable lists of dictionaries
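
To illustrate the return shape described above, here is a minimal sketch. It assumes that get_samples() with on_error="ignore" yields None for a stream that failed, which is inferred from the "instead of None" wording rather than confirmed. The helper name build_previews and the error message text are hypothetical and are not taken from the PR diff.

```python
from typing import Any, Iterable, Optional, Union

Records = list[dict[str, Any]]


def build_previews(
    samples: dict[str, Optional[Iterable[dict[str, Any]]]],
) -> dict[str, Union[Records, str]]:
    """Map each stream name to its sample records, or to an error message.

    `samples` stands in for the result of get_samples(); a value of None is
    assumed to mean that sampling failed for that stream.
    """
    previews: dict[str, Union[Records, str]] = {}
    for stream_name, dataset in samples.items():
        if dataset is None:
            # Hypothetical message text; the real tool may word this differently.
            previews[stream_name] = (
                f"Could not retrieve sample records for stream '{stream_name}'."
            )
        else:
            # Materialize the iterable dataset into a plain list.
            previews[stream_name] = list(dataset)
    return previews
```

A caller can then tell a failed stream from a successful one by checking whether the value is a string or a list.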

Review & Testing Checklist for Human

  • Test with actual connectors: Verify the tool works end-to-end with real source connectors and configurations
  • Validate error handling: Test both stream-level errors (should return descriptive error messages) and function-level errors (should return structured error info)
  • Check MCP integration: Confirm the tool is properly registered and accessible through the MCP interface
  • Verify return format: Ensure the returned dictionary structure is suitable for MCP clients and matches expectations
  • Test parameter passthrough: Verify that streams (including "*", lists, and single strings) and limit parameters work correctly

Recommended test plan: Test with a simple connector like source-faker, try different streams parameter values ("*", specific stream names, lists), and verify both success and error scenarios.


Diagram

%%{ init : { "theme" : "default" }}%%
flowchart TD
    subgraph "airbyte/mcp/"
        LocalOps["_local_ops.py"]:::major-edit
    end
    
    subgraph "airbyte/sources/"
        SourceBase["base.py (Source class)"]:::context
    end
    
    subgraph "MCP Framework"
        FastMCP["FastMCP"]:::context
    end
    
    LocalOps -->|"wraps get_samples()"| SourceBase
    LocalOps -->|"registers with"| FastMCP
    
    LocalOps -.- NewTool["get_stream_previews():<br/>- Forces on_error='ignore'<br/>- Returns descriptive errors<br/>- Converts InMemoryDataset to lists"]:::major-edit
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • Type checking: Required several iterations to resolve mypy type compatibility issues with the streams parameter handling
  • Environment limitations: Unable to fully test implementation locally due to dependency issues, so thorough testing with actual connectors is especially important
  • User requirements: Implemented specific guidance from AJ Steers to force on_error="ignore" and return descriptive error messages instead of None values

Session details: Requested by AJ Steers (@aaronsteers) - Devin session

Summary by CodeRabbit

  • New Features
    • Added the ability to preview sample records from streams in a source connector, with options to select specific streams and set the number of records per stream.

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

- Wraps the recently added get_samples() functionality from Source class
- Follows existing MCP tool patterns with proper type annotations
- Forces on_error='ignore' as recommended for clean MCP responses
- Supports streams and limit parameters with passthrough
- Returns serializable dict mapping stream names to sample records
- Returns descriptive error messages for failed streams instead of None

Co-Authored-By: AJ Steers <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings August 1, 2025 16:57

Original prompt from AJ Steers
@Devin - Add "get_stream_previews" MCP tool in PyAirbyte. Wrapping the new functionality recently added.


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

github-actions bot commented Aug 1, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754065680-get-stream-previews-mcp-tool' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754065680-get-stream-previews-mcp-tool'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new get_stream_previews MCP tool that provides sample record retrieval functionality through the MCP interface by wrapping the get_samples() method from the Source class. The tool enables users to preview stream data from source connectors with configurable limits and stream selection.

  • Adds get_stream_previews function with comprehensive parameter handling for streams and limits
  • Implements error handling that returns descriptive messages instead of None values
  • Registers the new tool with the MCP framework alongside existing tools


coderabbitai bot commented Aug 1, 2025

📝 Walkthrough

Walkthrough

A new function, get_stream_previews, was introduced to retrieve sample records from streams in a specified source connector. The function handles connector instantiation, configuration, stream selection (with wildcard support), sample retrieval, and error reporting. It is also registered as a tool in the local operations toolset.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Sample Records Retrieval Function: airbyte/mcp/_local_ops.py | Added get_stream_previews function to fetch sample records from source connector streams, with error handling and wildcard stream support. Registered the function as a tool. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LocalOps
    participant SourceConnector

    User->>LocalOps: get_stream_previews(params)
    LocalOps->>SourceConnector: Instantiate and configure
    loop For each stream
        LocalOps->>SourceConnector: get_samples(stream, limit, on_error="ignore")
        SourceConnector-->>LocalOps: Sample records or error
    end
    LocalOps-->>User: Return dict of stream previews or errors

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Would you like to see suggestions for additional test coverage or documentation for this new function, wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b49d11 and 29091f1.

📒 Files selected for processing (1)
  • airbyte/mcp/_local_ops.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte/mcp/_local_ops.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte/mcp/_local_ops.py (3)

237-268: Function signature looks solid!

The type annotations and parameter descriptions are well-crafted. One small suggestion: the streams parameter description mentions "or None for selected streams" - might be clearer to say "or None for default stream selection", wdyt?

The return type annotation correctly captures that stream values can be either record lists or error strings.


281-289: Stream parameter normalization looks good!

The logic correctly handles all the expected input formats. One small consideration: should we handle empty strings differently? Currently streams="" would become [""] which might not be intended. Perhaps add a check like elif isinstance(streams, str) and streams != "*" and streams.strip(): to filter out empty/whitespace strings, wdyt?

The type annotation with Literal["*"] is a nice touch for clarity.
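
A rough sketch of the normalization with the suggested empty-string guard could look like the following. The function name normalize_streams is hypothetical, and falling back to None (default stream selection) for blank input is an assumption, since the review comment only says empty strings should be filtered out.

```python
from typing import Optional, Union


def normalize_streams(
    streams: Union[list[str], str, None],
) -> Union[list[str], str, None]:
    """Normalize the `streams` argument before passing it on.

    Returns "*" or None unchanged, wraps a single stream name in a list,
    and drops empty or whitespace-only names.
    """
    if streams is None or streams == "*":
        return streams
    if isinstance(streams, str):
        name = streams.strip()
        # Assumption: a blank string falls back to default selection (None)
        # instead of becoming [""].
        return [name] if name else None
    # For a list input, keep only non-blank names.
    return [name.strip() for name in streams if name.strip()]
```

With this guard, streams="" no longer turns into [""], while "*", a single name, and a list of names keep their existing behavior.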


269-312: Solid implementation that follows established patterns!

The core logic looks great - proper source instantiation, config resolution, and the required on_error="ignore" parameter. The error handling is comprehensive and consistent with other functions in the file.

One question: the list(dataset) conversion on line 304 assumes the dataset is iterable. Based on the PR description, this converts InMemoryDataset objects to lists - should we add a comment explaining this conversion for future maintainers, wdyt?

The descriptive error messages for failed streams (lines 300-302) are a nice touch per the PR requirements.
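
As a possible form for such an explanatory comment, the sketch below shows why the conversion matters for MCP responses. FakeDataset is a hypothetical stand-in for an iterable dataset of records and is not the real InMemoryDataset class. The point is only that an iterable of mappings becomes JSON-serializable once it is materialized into a list of plain dictionaries.

```python
import json
from collections.abc import Iterator
from typing import Any


class FakeDataset:
    """Hypothetical stand-in for an iterable dataset of records."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self._records = records

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(self._records)


def to_serializable(dataset: Any) -> list[dict[str, Any]]:
    # Iterating consumes the dataset; dict() copies each record into a
    # plain dictionary so the result can be serialized to JSON.
    return [dict(record) for record in dataset]


dataset = FakeDataset([{"id": 1, "name": "a"}])
payload = json.dumps(to_serializable(dataset))
```

Passing the FakeDataset object itself to json.dumps would raise a TypeError, which is the case the conversion avoids.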

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6189e9c and 86419b7.

📒 Files selected for processing (1)
  • airbyte/mcp/_local_ops.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/mcp/_local_ops.py (2)

8-8: LGTM on the import addition!

The Literal import is necessary for the type annotation on line 281. Good practice to keep imports organized.


499-499: Perfect function registration!

Great job adding the function to the tools list in alphabetical order - maintains consistency with the existing pattern.

github-actions bot commented Aug 1, 2025

PyTest Results (Fast Tests Only, No Creds)

301 tests  ±0   301 ✅ ±0   4m 12s ⏱️ -2s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 29091f1. ± Comparison against base commit 6189e9c.

♻️ This comment has been updated with latest results.

github-actions bot commented Aug 1, 2025

PyTest Results (Full)

364 tests  ±0   350 ✅ ±0   21m 4s ⏱️ -9s
  1 suites ±0    14 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 29091f1. ± Comparison against base commit 6189e9c.

♻️ This comment has been updated with latest results.

…reviews

- Address feedback from aaronsteers to follow existing naming convention
- Update parameter name, function call, and error message consistently
- Maintains backward compatibility as this is a new function

Co-Authored-By: AJ Steers <[email protected]>
- Address feedback from aaronsteers to increase default sample size
- Allows LLMs to work with more data for better analysis and summaries
- Users can still specify smaller limits if needed

Co-Authored-By: AJ Steers <[email protected]>
- Address feedback from aaronsteers to reduce try block scope
- Only wrap get_samples() call in exception handling
- Let source creation and config resolution raise exceptions naturally
- Maintains same error handling for the critical get_samples operation

Co-Authored-By: AJ Steers <[email protected]>
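
The narrowed try block described in this commit might look roughly like the sketch below. The names preview_with_narrow_try and make_source are hypothetical, and the returned error string is an assumption about the format. Only the get_samples() call with streams, limit, and on_error="ignore" comes from the PR description.

```python
from typing import Any, Callable, Union


def preview_with_narrow_try(
    make_source: Callable[[], Any],
    limit: int,
) -> Union[dict[str, Any], str]:
    """Wrap only the sampling call in exception handling."""
    # Source creation and config resolution happen outside the try block,
    # so any exception raised here propagates to the caller unchanged.
    source = make_source()

    try:
        samples = source.get_samples(streams="*", limit=limit, on_error="ignore")
    except Exception as ex:
        # Hypothetical error format for a function-level failure.
        return f"Error getting stream previews: {ex}"

    return samples
```

Keeping setup outside the try block means a misconfigured connector surfaces as its original exception rather than being folded into a generic preview error.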
@aaronsteers aaronsteers enabled auto-merge (squash) August 1, 2025 17:59
@aaronsteers aaronsteers merged commit 9c3245f into main Aug 1, 2025
22 checks passed
@aaronsteers aaronsteers deleted the devin/1754065680-get-stream-previews-mcp-tool branch August 1, 2025 18:24