Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Sep 29, 2025

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

Summary by CodeRabbit

  • New Features

    • Expanded MCP public API; additional submodules are now available for direct use.
  • Documentation

    • Enhanced MCP package docs with an experimental notice and a clearer getting-started guide (dotenv secrets, server registration, connection testing).
    • Updated examples to use clearer configuration paths.
    • Added a comprehensive “Contributing to the PyAirbyte MCP Server” section with setup, testing from source, config examples, and common tasks.
  • Style

    • Improved inline documentation and annotations across MCP components.

@aaronsteers aaronsteers requested a review from Copilot September 29, 2025 22:15
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@docs/mcp-getting-started' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@docs/mcp-getting-started'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the documentation for the PyAirbyte MCP (Model Context Protocol) server, including auto-generated pdocs support. The changes reorganize and enhance the documentation structure to provide clearer setup instructions and better developer guidance.

Key changes:

  • Restructured MCP documentation with improved setup instructions and user guide
  • Added comprehensive contributing guide section for MCP server development
  • Reorganized module imports and added pdoc visibility annotations

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
docs/CONTRIBUTING.md Added new section for MCP server development with testing instructions and configuration examples
airbyte/mcp/server.py Updated imports and added docstring annotations for better documentation generation
airbyte/mcp/local_ops.py Added pdoc visibility annotations and improved function documentation
airbyte/mcp/connector_registry.py Added pdoc visibility annotations for internal functions
airbyte/mcp/cloud_ops.py Added pdoc visibility annotations for internal registration function
airbyte/mcp/init.py Major restructuring of module documentation with detailed setup guide and improved organization

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@aaronsteers aaronsteers enabled auto-merge (squash) September 29, 2025 22:15
Copy link
Contributor

coderabbitai bot commented Sep 29, 2025

Warning

Rate limit exceeded

@aaronsteers has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between f4171bf and 1031cb0.

📒 Files selected for processing (2)
  • airbyte/mcp/__init__.py (3 hunks)
  • docs/CONTRIBUTING.md (2 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

The MCP package’s public API was expanded by exporting additional submodules and updating imports. Several docstrings were revised to mark internals, and CONTRIBUTING docs gained an MCP server contribution/testing section. local_ops introduced new internal (deferred) tools and renamed a public constant to a private one.

Changes

Cohort / File(s) Summary
Public API surface and imports
airbyte/mcp/__init__.py, airbyte/mcp/server.py
Expanded exports to include cloud_ops, connector_registry, and local_ops; updated __all__. Server imports switched from private module paths to public submodules. Added/expanded docstrings.
Local ops enhancements
airbyte/mcp/local_ops.py
Renamed CONFIG_HELP to _CONFIG_HELP and updated usages. Added internal/deferred tools: list_cached_streams, describe_default_cache, run_sql_query. Marked registration as internal in docs. Would you like to add a deprecation shim or alias for CONFIG_HELP to ease transition, wdyt?
Cloud ops docs tweak
airbyte/mcp/cloud_ops.py
Marked register_cloud_ops_tools as private/internal via docstring; no logic changes.
Connector registry docs tweak
airbyte/mcp/connector_registry.py
Marked ConnectorInfo and registration function as private/internal in docs; inserted deferred marker for a tool. No signature changes.
Documentation
docs/CONTRIBUTING.md
Added section for contributing to/testing the PyAirbyte MCP Server, including config examples and Poe tasks. Would you like to reference the new public submodules here for clarity, wdyt?

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and succinctly conveys that the PR focuses on enhancing MCP documentation and includes setup for auto-generated pdocs, which matches the docstring and CONTRIBUTING.md updates in the changeset. It adheres to conventional commit style and avoids extraneous details.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/CONTRIBUTING.md (1)

68-133: Well-structured documentation for MCP contributors!

The new section provides clear, actionable guidance for testing and contributing to the MCP server. The examples are comprehensive and cover both local and cloud operations. The JSON config example is syntactically correct and the poetry-based workflow is well-explained.

A few thoughts to consider:

  1. The JSON config example uses absolute paths (/path/to/repos/PyAirbyte, /path/to/.mcp/airbyte_mcp.env) – might be worth adding a note that users should replace these with their actual paths?
  2. The hot-reload note (line 105) is valuable – it might be even more prominent if it appeared earlier in the section, perhaps right after the config example?

wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20bb0a8 and f4171bf.

📒 Files selected for processing (6)
  • airbyte/mcp/__init__.py (4 hunks)
  • airbyte/mcp/cloud_ops.py (1 hunks)
  • airbyte/mcp/connector_registry.py (3 hunks)
  • airbyte/mcp/local_ops.py (6 hunks)
  • airbyte/mcp/server.py (1 hunks)
  • docs/CONTRIBUTING.md (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/server.py (5)
airbyte/mcp/cloud_ops.py (1)
  • register_cloud_ops_tools (504-519)
airbyte/mcp/connector_registry.py (1)
  • register_connector_registry_tools (154-160)
airbyte/mcp/local_ops.py (1)
  • register_local_ops_tools (763-785)
airbyte/_util/meta.py (1)
  • set_mcp_mode (40-51)
airbyte/mcp/_util.py (1)
  • initialize_secrets (37-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (16)
airbyte/mcp/connector_registry.py (3)

104-104: LGTM! Appropriate @Private marking for internal model.

The ConnectorInfo class is correctly marked as private since it's an internal data structure used by the MCP tools. This will help keep the public API documentation focused on user-facing components.


113-113: Good documentation of deferred registration pattern.

The comment clearly indicates that tool registration happens later via the register_connector_registry_tools function, which helps maintainers understand why the decorator isn't applied directly. This is consistent with the pattern used throughout the MCP modules.


154-158: Consistent @Private marking for internal registration function.

The docstring update correctly marks this as an internal function. This is consistent with the pattern used in cloud_ops.py and local_ops.py, helping maintain a clean public API surface by hiding implementation details of the tool registration lifecycle.

airbyte/mcp/cloud_ops.py (1)

504-508: Consistent @Private marking for internal registration function.

The docstring update maintains consistency with the registration functions in connector_registry.py and local_ops.py. This helps ensure that implementation details of the tool registration lifecycle remain internal while keeping the public API clean and focused on user-facing functionality.

airbyte/mcp/server.py (3)

11-13: LGTM! Import paths updated to reflect new public API structure.

The import paths have been updated from the underscore-prefixed internal modules to the new public submodule structure. This aligns with the expanded MCP package API mentioned in the PR objectives and maintains backward compatibility since these imports are used internally for server initialization.


19-21: Good addition of app object documentation.

Adding a docstring for the module-level app variable improves code documentation and helps users understand the purpose of this FastMCP instance. Clear and concise!


27-34: Enhanced main() documentation with appropriate @Private marking.

The expanded docstring provides clear guidance about the function's purpose and usage, correctly marking it as private since it's invoked via CLI entry point rather than direct user calls. The note about consulting MCP client documentation is particularly helpful for users trying to understand how to connect to the server.

airbyte/mcp/local_ops.py (7)

28-28: Good practice: marking internal constant as private.

Renaming CONFIG_HELP to _CONFIG_HELP follows Python convention for private module-level variables. This correctly signals that the constant is for internal use within the module (for tool descriptions) and shouldn't be relied upon by external code.


646-659: Nice addition: cache introspection tool.

The list_cached_streams() function provides a clean way to inspect what's in the default cache. The implementation correctly:

  • Uses the CachedDatasetInfo model for structured returns
  • Properly closes the cache with del cache
  • Accounts for table prefixes and schema names

This will be helpful for users exploring their cached data. Have you tested this with various cache configurations (with/without table prefix, different schema names)? wdyt?


662-671: Useful cache metadata tool.

The describe_default_cache() function provides valuable metadata about cache configuration. The information returned (cache type, paths, and cached streams) will help users understand their cache setup, especially when working with SQL queries via run_sql_query().


674-706: Solid security validation for SQL queries.

The _is_safe_sql() function implements good security practices:

  • Disallows multi-statement queries (semicolon check)
  • Whitelist approach for read-only operations
  • Proper normalization for case-insensitive prefix matching

One thought: the semicolon check on line 693 scans the original query, which correctly catches semicolons in comments or strings. However, have you considered edge cases where a semicolon might appear in a string literal within a valid SELECT query? This might be overly restrictive, but it's also safer. wdyt about the trade-off?


709-758: Well-implemented SQL query tool with good security.

The run_sql_query() function is thoughtfully designed:

  • Security validation before execution
  • Clear error messages with context (query, traceback)
  • Proper resource cleanup with finally block
  • Good defaults (max_records=1000)
  • Helpful docstring with examples

The implementation correctly returns error information as a list with a single dict (matching the return type signature), which maintains consistency with the successful case.


763-767: Consistent @Private marking for registration function.

The docstring update maintains consistency with the registration functions in cloud_ops.py and connector_registry.py, correctly marking this internal function to keep it out of public API documentation.


768-785: Proper registration of new tools with consistent description pattern.

The registration logic correctly:

  • Includes the three new tools (describe_default_cache, list_cached_streams, run_sql_query)
  • Appends _CONFIG_HELP to tool descriptions for consistency
  • Keeps list_connector_config_secrets separate (without config help, which makes sense since it doesn't use connector config)

Nice organization!

airbyte/mcp/__init__.py (2)

5-7: Clear experimental status warning.

The NOTE block appropriately sets expectations about API stability. This is helpful for users adopting the MCP integration.


139-139: Expose all MCP submodules as intended: cloud_ops, connector_registry, and local_ops each define public tool APIs (e.g., deploy_source_to_cloud, list_connectors, validate_connector_config), so including them alongside server in __init__.py correctly surfaces the full MCP API. wdyt?

Copy link

github-actions bot commented Sep 29, 2025

PyTest Results (Fast Tests Only, No Creds)

301 tests  ±0   301 ✅ ±0   4m 24s ⏱️ -1s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 1031cb0. ± Comparison against base commit 20bb0a8.

♻️ This comment has been updated with latest results.

Copy link

PyTest Results (Full)

364 tests  ±0   348 ✅ ±0   22m 27s ⏱️ + 1m 17s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 1031cb0. ± Comparison against base commit 20bb0a8.

@aaronsteers aaronsteers merged commit c64e73d into main Sep 29, 2025
21 checks passed
@aaronsteers aaronsteers deleted the docs/mcp-getting-started branch September 29, 2025 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant