feat: Add get_stream_previews MCP tool #738
- Wraps the recently added get_samples() functionality from Source class
- Follows existing MCP tool patterns with proper type annotations
- Forces on_error='ignore' as recommended for clean MCP responses
- Supports streams and limit parameters with passthrough
- Returns serializable dict mapping stream names to sample records
- Returns descriptive error messages for failed streams instead of None
Co-Authored-By: AJ Steers
Testing This PyAirbyte Version
You can test this version of PyAirbyte using the following:
# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754065680-get-stream-previews-mcp-tool' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754065680-get-stream-previews-mcp-tool'
Pull Request Overview
This PR adds a new get_stream_previews MCP tool that provides sample record retrieval through the MCP interface by wrapping the get_samples() method from the Source class. The tool enables users to preview stream data from source connectors with configurable limits and stream selection.
- Adds get_stream_previews function with comprehensive parameter handling for streams and limits
- Implements error handling that returns descriptive messages instead of None values
- Registers the new tool with the MCP framework alongside existing tools
Walkthrough
Sequence Diagram(s)
sequenceDiagram
participant User
participant LocalOps
participant SourceConnector
User->>LocalOps: get_stream_previews(params)
LocalOps->>SourceConnector: Instantiate and configure
loop For each stream
LocalOps->>SourceConnector: get_samples(stream, limit, on_error="ignore")
SourceConnector-->>LocalOps: Sample records or error
end
LocalOps-->>User: Return dict of stream previews or errors
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte/mcp/_local_ops.py (3)
237-268: Function signature looks solid!
The type annotations and parameter descriptions are well-crafted. One small suggestion: the streams parameter description mentions "or None for selected streams" - might be clearer to say "or None for default stream selection", wdyt?
The return type annotation correctly captures that stream values can be either record lists or error strings.
281-289: Stream parameter normalization looks good!
The logic correctly handles all the expected input formats. One small consideration: should we handle empty strings differently? Currently streams="" would become [""] which might not be intended. Perhaps add a check like elif isinstance(streams, str) and streams != "*" and streams.strip(): to filter out empty/whitespace strings, wdyt?
The type annotation with Literal["*"] is a nice touch for clarity.
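The empty-string concern raised in this comment could be handled roughly as follows. This is a minimal sketch and not the code in airbyte/mcp/_local_ops.py: the helper name normalize_streams and the choice to map blank input to None are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Literal


def normalize_streams(
    streams: list[str] | str | None,
) -> list[str] | Literal["*"] | None:
    """Normalize a user-supplied `streams` argument (hypothetical helper)."""
    if streams is None or streams == "*":
        # None and the "*" wildcard pass through unchanged.
        return streams
    if isinstance(streams, str):
        # Assumption: blank or whitespace-only input falls back to None
        # instead of becoming [""].
        stripped = streams.strip()
        return [stripped] if stripped else None
    # For lists, drop blank entries but keep everything else as given.
    return [name for name in streams if name.strip()]
```

Whether blank input should fall back to None or raise an error is a design choice the review leaves open; the sketch only shows that [""] is easy to avoid.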
269-312: Solid implementation that follows established patterns!
The core logic looks great - proper source instantiation, config resolution, and the required on_error="ignore" parameter. The error handling is comprehensive and consistent with other functions in the file.
One question: the list(dataset) conversion on line 304 assumes the dataset is iterable. Based on the PR description, this converts InMemoryDataset objects to lists - should we add a comment explaining this conversion for future maintainers, wdyt?
The descriptive error messages for failed streams (lines 300-302) are a nice touch per the PR requirements.
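The conversion discussed in this comment can be sketched as below. It assumes, as the review does, that each dataset is an iterable of record mappings and that a failed stream comes back as None. The function name to_serializable_previews and the error wording are hypothetical, not taken from the PR.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from typing import Any


def to_serializable_previews(
    samples: Mapping[str, Iterable[Mapping[str, Any]] | None],
) -> dict[str, list[Mapping[str, Any]] | str]:
    """Turn per-stream sample datasets into plain lists or error strings."""
    previews: dict[str, list[Mapping[str, Any]] | str] = {}
    for stream_name, dataset in samples.items():
        if dataset is None:
            # Hypothetical message: describe the failure instead of returning None.
            previews[stream_name] = (
                f"Could not retrieve sample records for stream '{stream_name}'."
            )
        else:
            # list(...) materializes the iterable dataset so the result
            # no longer depends on the dataset object itself.
            previews[stream_name] = list(dataset)
    return previews
```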
🔇 Additional comments (2)
airbyte/mcp/_local_ops.py (2)
8-8: LGTM on the import addition!
The Literal import is necessary for the type annotation on line 281. Good practice to keep imports organized.
499-499: Perfect function registration!
Great job adding the function to the tools list in alphabetical order - maintains consistency with the existing pattern.
…reviews
- Address feedback from aaronsteers to follow existing naming convention
- Update parameter name, function call, and error message consistently
- Maintains backward compatibility as this is a new function
Co-Authored-By: AJ Steers

- Address feedback from aaronsteers to increase default sample size
- Allows LLMs to work with more data for better analysis and summaries
- Users can still specify smaller limits if needed
Co-Authored-By: AJ Steers

- Address feedback from aaronsteers to reduce try block scope
- Only wrap get_samples() call in exception handling
- Let source creation and config resolution raise exceptions naturally
- Maintains same error handling for the critical get_samples operation
Co-Authored-By: AJ Steers
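The narrowed try block described in the last commit might look roughly like this. It is a sketch under stated assumptions: StubSource stands in for a real source connector, and get_previews, make_source, and the "ERROR" key are hypothetical names chosen for illustration, not the PR's actual code.

```python
from __future__ import annotations

from typing import Any, Callable


class StubSource:
    """Stand-in for a configured source connector (not the PyAirbyte class)."""

    def __init__(self, *, fail: bool = False) -> None:
        self.fail = fail

    def get_samples(self, streams: Any, *, limit: int, on_error: str) -> dict[str, Any]:
        if self.fail:
            raise RuntimeError("connector crashed")
        return {"users": [{"id": i} for i in range(limit)]}


def get_previews(
    make_source: Callable[[], StubSource], streams: Any, limit: int
) -> dict[str, Any]:
    # Source creation sits outside the try block, so failures here
    # (for example, a bad config) raise naturally.
    source = make_source()
    try:
        # Only the sampling call is wrapped in exception handling.
        return source.get_samples(streams=streams, limit=limit, on_error="ignore")
    except Exception as ex:
        # Hypothetical error shape for the MCP response.
        return {"ERROR": f"Failed to retrieve stream previews: {ex}"}
```

Keeping setup outside the try block means a configuration mistake surfaces as an ordinary exception, while a failure during sampling is still reported as a message.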
feat: Add get_stream_previews MCP tool
Summary
Adds a new get_stream_previews MCP tool to PyAirbyte that wraps the recently added get_samples() functionality from the Source class. This tool provides stream preview capabilities through the MCP interface, allowing users to retrieve sample records from source connectors.
Key features:
- Forces on_error="ignore" for clean MCP responses as requested
- Supports the limit and streams parameters from get_samples()
- Converts InMemoryDataset objects to serializable lists of dictionaries
Review & Testing Checklist for Human
- streams (including "*", lists, and single strings) and limit parameters work correctly
Recommended test plan: Test with a simple connector like source-faker, try different streams parameter values ("*", specific stream names, lists), and verify both success and error scenarios.
Notes
- streams parameter handling
- on_error="ignore" and return descriptive error messages instead of None values
Session details: Requested by AJ Steers (@aaronsteers) - Devin session