feat: add new mcp tools for working with cached data #734
Conversation
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version
You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@aj/feat/add-cache-ops-to-mcp' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@aj/feat/add-cache-ops-to-mcp'

Helpful Resources
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:

Community Support
Questions? Join the #pyairbyte channel in our Slack workspace.
Pull Request Overview
This PR adds new MCP (Model Context Protocol) tools for working with cached data in the Airbyte system. The changes introduce functionality to inspect and describe cached datasets stored in the default DuckDB cache.
- Adds two new functions to query cached data: list_cached_datasets() and describe_default_cache()
- Introduces a CachedDatasetInfo data model to structure cached dataset information
- Registers the new tools with the FastMCP application and reorders the tool registration list
📝 Walkthrough
This change introduces new cache inspection and SQL query execution tools to the local operations module, including safe SQL validation and detailed error reporting. It augments the cache base class with a method for executing SQL queries and updates cache management practices to ensure proper closure. Minor test cleanup is also performed.
Sequence Diagram(s)
sequenceDiagram
participant User
participant FastMCP App
participant LocalOps
participant CacheBase
User->>FastMCP App: Call run_sql_query(sql_query, max_records)
FastMCP App->>LocalOps: run_sql_query(sql_query, max_records)
LocalOps->>CacheBase: run_sql_query(sql_query, max_records)
CacheBase-->>LocalOps: Query results or error
LocalOps-->>FastMCP App: Results or error details
FastMCP App-->>User: Results or error details
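The walkthrough mentions "safe SQL validation" before execution. The PR's actual rules are not visible in this excerpt, so the following is one plausible shape of such a check, with a hypothetical allowlist and a single-statement rule as assumptions:

```python
def validate_sql_query(sql_query: str) -> None:
    """Reject obviously unsafe input before execution.

    Hypothetical rules -- the PR's actual validation logic is not shown here.
    """
    stripped = sql_query.strip().rstrip(";").strip()
    if not stripped:
        raise ValueError("Empty SQL query.")
    if ";" in stripped:
        # A semicolon remaining after trimming the trailing one means the
        # input contains multiple statements.
        raise ValueError("Multiple SQL statements are not allowed.")
    first_keyword = stripped.split(None, 1)[0].upper()
    allowed = {"SELECT", "INSERT", "UPDATE", "DELETE", "WITH", "DESCRIBE", "SHOW"}
    if first_keyword not in allowed:
        raise ValueError(f"Statement type {first_keyword!r} is not permitted.")
```

A keyword allowlist like this is simple but coarse; a production implementation might parse the statement properly rather than inspect the first token.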
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
Would you like me to elaborate with some examples of the SQL validation rules or error handling scenarios, or does this overview provide enough clarity for your review, wdyt?
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/mcp/_local_ops.py (2)
281-290: Clean Pydantic model design!
Love the simple, focused approach with room for future expansion. The TODO comment needs an issue link to satisfy the linter though - could you add one? Something like # TODO(#123): add later: wdyt?
301-309: Great cache metadata function!
Really useful for debugging and inspection. The implementation is solid. One tiny thought - would it be worth making the return type a bit more specific with a TypedDict, or is the flexibility of dict[str, Any] preferred here? Either way works, just curious about your preference!
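For reference, the TypedDict alternative floated in this nitpick might look like the sketch below. The field names are illustrative (the real return shape of `describe_default_cache()` isn't shown in this review), but the pattern is standard: a TypedDict gives static checkers the key names while the value remains a plain dict at runtime.

```python
from typing import TypedDict


class CacheInfo(TypedDict):
    """Hypothetical typed return shape for `describe_default_cache()`.

    Field names here are illustrative, not the PR's actual keys.
    """

    cache_type: str
    db_path: str
    stream_names: list[str]


def describe_cache(cache_type: str, db_path: str, stream_names: list[str]) -> CacheInfo:
    # At runtime this is just a dict; the TypedDict only constrains type checkers.
    return CacheInfo(cache_type=cache_type, db_path=db_path, stream_names=stream_names)
```

The trade-off is exactly the one the comment raises: `dict[str, Any]` stays flexible as fields evolve, while a TypedDict catches key typos and missing fields in mypy/pyright at the cost of updating the type whenever the shape changes.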
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/_local_ops.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run Linters
airbyte/mcp/_local_ops.py
[warning] 15-15: Ruff TC001: Move application import airbyte.caches.duckdb.DuckDBCache into a type-checking block.
[warning] 286-286: Ruff TD003: Missing issue link on the line following this TODO.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (4)
airbyte/mcp/_local_ops.py (4)
12-12: LGTM on the BaseModel import!
This import is necessary for the new CachedDatasetInfo model. Nice and clean!
15-15: Import looks correct, but let's double-check the linter warning?
The pipeline shows a warning about moving this import to a TYPE_CHECKING block, but you're actually using DuckDBCache at runtime in list_cached_datasets(). The linter might be confused here - wdyt? Should we keep it as is since it's used at runtime?
292-298: Excellent implementation of list_cached_datasets!
Clean list comprehension and proper use of the cache interface. The function design aligns perfectly with the PR objectives for cached data tools. Well done!
316-322: Perfect tool registration updates!
Both new cache tools are properly registered and the reordering looks intentional. The registration follows the existing pattern nicely - everything looks great here!
Co-authored-by: Copilot <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte/caches/base.py (1)
156-202: Solid SQL query execution with good error handling!
I really like the implementation here - proper connection context management, specific SQLAlchemy exception handling, and the distinction between row-returning vs non-row-returning queries. The use of strict=True in the zip is a nice safety touch too.
One small question: would it be worth adding a docstring note about which types of SQL statements this is intended for vs when to use the processor directly, wdyt? The current docstring mentions "single DML statements" but some users might not be familiar with that terminology.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte/caches/base.py (3 hunks)
airbyte/mcp/_local_ops.py (2 hunks)
airbyte/shared/sql_processor.py (1 hunks)
tests/integration_tests/test_duckdb_cache.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- airbyte/shared/sql_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/_local_ops.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: in `examples/run_perf_test_reads.py`, the code for setting up snowflake configuration in `get_cache`...
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#415
File: examples/run_perf_test_reads.py:117-127
Timestamp: 2024-10-09T19:21:45.994Z
Learning: In `examples/run_perf_test_reads.py`, the code for setting up Snowflake configuration in `get_cache` and `get_destination` cannot be refactored into a shared helper function because there are differences between them.
Applied to files:
tests/integration_tests/test_duckdb_cache.py
📚 Learning: test fixtures in the pyairbyte project do not need to align with real docker repositories....
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#347
File: tests/integration_tests/fixtures/registry.json:48-48
Timestamp: 2024-08-31T01:20:08.405Z
Learning: Test fixtures in the PyAirbyte project do not need to align with real Docker repositories.
Applied to files:
tests/integration_tests/test_duckdb_cache.py
📚 Learning: the `bigquerycache.get_arrow_dataset` method should have a docstring that correctly states the reaso...
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#281
File: airbyte/caches/bigquery.py:40-43
Timestamp: 2024-10-08T15:34:31.026Z
Learning: The `BigQueryCache.get_arrow_dataset` method should have a docstring that correctly states the reason for the `NotImplementedError` as BigQuery not supporting `to_arrow`, instead of incorrectly mentioning `pd.read_sql_table`.
Applied to files:
airbyte/caches/base.py
🧬 Code Graph Analysis (1)
tests/integration_tests/test_duckdb_cache.py (5)
airbyte/_util/venv_util.py (1)
  get_bin_dir (15-20)
airbyte/caches/util.py (1)
  new_local_cache (39-77)
tests/integration_tests/test_source_faker_integration.py (1)
  duckdb_cache (87-100)
airbyte/caches/base.py (2)
  close (148-154)
  processor (144-146)
airbyte/shared/sql_processor.py (1)
  close (778-789)
🪛 GitHub Actions: Run Linters
airbyte/caches/base.py
[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
tests/integration_tests/test_duckdb_cache.py (2)
17-18: LGTM on the import reordering!
Nice organizational improvement moving the airbyte import below pytest.
86-95: Excellent test coverage for the new close functionality!
I like how this test verifies both the cache-level close() method and the underlying processor's close() method, plus checks that multiple calls are idempotent. This gives good confidence that the resource cleanup works as expected. The test is simple but thorough - well done!
airbyte/caches/base.py (1)
148-154: Clean and simple delegation pattern!
The close() method provides a nice convenience interface at the cache level while properly delegating to the underlying processor. This aligns well with the test coverage I saw in the other file.
/fix-pr
/fix-pr
Summary by CodeRabbit
New Features
Bug Fixes
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.