@aaronsteers aaronsteers commented Sep 24, 2025

feat(mcp): Add manifest_path parameter to local operations tools

Summary

This PR adds a new optional manifest_path parameter to all PyAirbyte MCP local operations tools, enabling users to specify custom YAML manifest files for declarative connectors instead of using registry versions. This creates a bridge between PyAirbyte MCP and connector-builder-mcp functionality, allowing developers to test locally-developed YAML connector manifests using PyAirbyte's execution and testing capabilities.

Key Changes:

  • Added manifest_path parameter to 6 MCP tools: validate_connector_config, list_source_streams, get_source_stream_json_schema, read_source_stream_records, get_stream_previews, and sync_source_to_cache
  • Updated the internal _get_mcp_source function to handle manifest paths across all execution modes (auto, docker, python, yaml)
  • Added comprehensive documentation for the new parameter
  • Maintained full backward compatibility - all existing functionality works unchanged

Review & Testing Checklist for Human

🟡 Medium Risk - 3 items to verify:

  • Test with actual YAML manifest file: Create a real declarative connector manifest file and verify it works with each MCP tool, especially testing the yaml execution mode
  • Verify execution mode interactions: Confirm manifest_path works correctly with different execution modes (auto, docker, python, yaml) and that appropriate errors are shown for conflicting settings
  • Test error handling: Verify proper error messages for invalid manifest paths, malformed YAML files, and missing files

Notes

  • The implementation passes all existing linting and type checks
  • Basic functionality tested to confirm parameter acceptance and backward compatibility
  • The manifest_path parameter follows existing patterns in the codebase for optional file path parameters
  • When manifest_path is provided with incompatible execution modes (like docker), the underlying PyAirbyte source creation will properly error with a clear message

Requested by: @aaronsteers
Devin session: https://app.devin.ai/sessions/a27357ff3dbd42ce887412faf6b108fe

Summary by CodeRabbit

  • New Features

    • Added an optional manifest_path across source operations (validation, stream listing, schema retrieval, record reads, previews, cache sync); when provided it forces YAML mode and is propagated through source retrieval.
  • Bug Fixes

    • Improved handling of single-line local manifest values so local YAML manifest files are consistently recognized and treated as file paths.

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

- Add manifest_path parameter to all local MCP operations tools
- Enable testing of custom YAML connector manifests
- Bridge PyAirbyte MCP with connector-builder-mcp functionality
- Maintain backward compatibility with existing tools
- Support declarative connectors with local manifest files

Co-Authored-By: AJ Steers <[email protected]>

Original prompt from AJ Steers
Received message in Slack channel #dev-pyairbyte:

@Devin - let's add support for custom YAML connectors into PyAirbyte MCP tools. For the tools that run and test connectors (excluding cloud tools for now), let's add an option for a declarative manifest file as an input arg. This will be a bridge between connector builder MCP and PyAirbyte MCP.
Thread URL: https://airbytehq-team.slack.com/archives/C065V6XFWNQ/p1758731207903299?thread_ts=1758731207.903299


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1758731973-add-manifest-path-to-mcp-tools' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1758731973-add-manifest-path-to-mcp-tools'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.



github-actions bot commented Sep 24, 2025

PyTest Results (Fast Tests Only, No Creds)

301 tests  ±0   301 ✅ ±0   4m 37s ⏱️ +4s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 9b03efe. ± Comparison against base commit 1b341de.

♻️ This comment has been updated with latest results.


coderabbitai bot commented Sep 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a manifest_path parameter across MCP operations in airbyte/mcp/_local_ops.py, propagating it into _get_mcp_source and onward to get_source as source_manifest. Local manifest string inputs are normalized to Path in airbyte/_executors/util.py, forcing YAML mode when manifest_path is provided.
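
The mode-forcing rule in the walkthrough can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in airbyte/mcp/_local_ops.py: the helper name resolve_execution_mode is hypothetical, and only the mode names (auto, docker, python, yaml) and the manifest_path parameter come from this PR.

```python
from __future__ import annotations

from pathlib import Path


def resolve_execution_mode(
    override_execution_mode: str,
    manifest_path: str | Path | None,
) -> str:
    """Hypothetical helper: a provided manifest path forces YAML mode.

    When no manifest path is given, the caller's requested mode is kept.
    """
    if manifest_path is not None:
        return "yaml"
    return override_execution_mode


if __name__ == "__main__":
    # A local manifest wins over whatever mode the caller asked for.
    print(resolve_execution_mode("docker", Path("manifest.yaml")))
    # Without a manifest, the requested mode passes through unchanged.
    print(resolve_execution_mode("auto", None))
```

Forcing the mode in one place keeps the six tools from each having to reason about conflicting settings.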

Changes

Cohort / File(s): Summary of Changes

  • MCP source retrieval core (airbyte/mcp/_local_ops.py): _get_mcp_source signature now accepts manifest_path: str | Path | None and forwards it to get_source as source_manifest, making YAML mode explicit when provided.
  • Public MCP operations (airbyte/mcp/_local_ops.py): Public functions (validate_connector_config, list_source_streams, get_source_stream_json_schema, read_source_stream_records, get_stream_previews, sync_source_to_cache) now accept manifest_path: Annotated[str | Path | None, Field(... default=None)] and propagate it to _get_mcp_source.
  • Parameter metadata updates (airbyte/mcp/_local_ops.py): Updated Field descriptions to state that manifest_path overrides override_execution_mode and forces YAML mode.
  • Executor manifest normalization (airbyte/_executors/util.py): get_connector_executor normalizes single-line, non-URL source_manifest strings into Path objects so local manifest files are consistently handled.
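
As an illustration of the normalization described for airbyte/_executors/util.py, a sketch along these lines would treat single-line, non-URL strings as file paths. The function name normalize_source_manifest and the exact URL check are assumptions made for this example; the real logic in get_connector_executor may differ.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def normalize_source_manifest(source_manifest: Any) -> Any:
    """Hypothetical helper: convert a local path string into a Path.

    Multi-line strings (inline YAML), URLs, and non-string values
    (bool, dict, Path, None) are returned unchanged.
    """
    if isinstance(source_manifest, str):
        is_single_line = "\n" not in source_manifest
        # Assumption: only http(s) URLs need to be excluded here.
        is_url = source_manifest.startswith(("http://", "https://"))
        if is_single_line and not is_url:
            return Path(source_manifest)
    return source_manifest


if __name__ == "__main__":
    print(type(normalize_source_manifest("connectors/manifest.yaml")).__name__)
    print(type(normalize_source_manifest("https://example.com/manifest.yaml")).__name__)
```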

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Ops as local_ops.<op>()
  participant Resolver as _get_mcp_source
  participant ExecUtil as get_connector_executor / get_source
  participant Src as Source

  User->>Ops: call op(..., manifest_path)
  Ops->>Resolver: _get_mcp_source(..., manifest_path)
  Resolver->>ExecUtil: get_source(..., source_manifest=manifest_path or True)
  ExecUtil->>ExecUtil: normalize source_manifest -> Path if local string
  ExecUtil-->>Resolver: returns Source instance
  Resolver-->>Ops: Source
  Ops->>Src: perform action (list/schema/read/preview/sync)
  Src-->>Ops: result
  Ops-->>User: response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • bnchrch
  • maxi297

Would you like a short checklist of unit/integration tests to verify manifest_path propagation, Path normalization, and legacy YAML fallback, wdyt?

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name: Status and Explanation

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly states the primary change of adding the manifest_path parameter to the MCP local operations tools, matching the modifications in function signatures and wiring through _get_mcp_source. It is concise and specific, focusing on the main feature without unnecessary detail.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd310e7 and 9b03efe.

📒 Files selected for processing (1)
  • airbyte/mcp/_local_ops.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_local_ops.py (2)
airbyte/sources/base.py (1)
  • Source (67-969)
airbyte/_util/meta.py (1)
  • is_docker_installed (193-194)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/mcp/_local_ops.py (2)

58-63: Restore defaulted manifest_path to avoid breaking callers

Without a default here, any existing _get_mcp_source(...) call that doesn’t pass manifest_path will now raise a TypeError, so this isn’t backward compatible. Could we keep the new parameter optional by defaulting it to None, wdyt?

-    manifest_path: str | Path | None,
+    manifest_path: str | Path | None = None,

142-148: Keep tool-level manifest_path optional

For these MCP tools, adding manifest_path without a Python-level default makes it a required keyword argument; any current caller that omits it will now fail fast. Could we set the default to None (and apply the same tweak to the other tool signatures that just gained this parameter), wdyt?

-    manifest_path: Annotated[
-        str | Path | None,
-        Field(
-            description="Path to a local YAML manifest file for declarative connectors.",
-            default=None,
-        ),
-    ],
+    manifest_path: Annotated[
+        str | Path | None,
+        Field(
+            description="Path to a local YAML manifest file for declarative connectors.",
+            default=None,
+        ),
+    ] = None,
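
The Python-level point in the two comments above can be checked with the standard library alone. In this sketch the string inside Annotated is a stand-in for the Field(...) metadata (an assumption made to keep the example dependency-free), and both function names are hypothetical. It only shows how plain Python treats the signature; how an MCP framework interprets the Field default when building a tool schema is a separate question not covered here.

```python
import inspect
from typing import Annotated, Optional


def tool_without_default(
    # Metadata mentions a default, but the signature itself has none.
    manifest_path: Annotated[Optional[str], "stand-in for Field(default=None)"],
) -> Optional[str]:
    return manifest_path


def tool_with_default(
    manifest_path: Annotated[Optional[str], "stand-in for Field(default=None)"] = None,
) -> Optional[str]:
    return manifest_path


def is_required(func, name: str) -> bool:
    """Return True when the named parameter has no Python-level default."""
    param = inspect.signature(func).parameters[name]
    return param.default is inspect.Parameter.empty


if __name__ == "__main__":
    print(is_required(tool_without_default, "manifest_path"))
    print(is_required(tool_with_default, "manifest_path"))
```

Calling tool_without_default() directly with no arguments raises a TypeError, which is the failure mode the review comment describes for plain Python callers.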



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
airbyte/mcp/_local_ops.py (6)

46-49: Clarify manifest_path usage and auto-mode behavior in help text

Could we add a short note that when a manifest_path is provided, YAML mode is generally preferred, and that docker mode may not honor a local manifest unless explicitly supported, to set user expectations, wdyt?

 For declarative connectors, you can provide a `manifest_path` to
 specify a local YAML manifest file instead of using the registry
 version. This is useful for testing custom or locally-developed
 connector manifests.
+Note: When providing a `manifest_path`, YAML execution is typically preferred.
+If you leave `override_execution_mode="auto"`, consider that docker may be selected
+if available, which may not honor local manifests. Explicitly set `override_execution_mode="yaml"`
+to force local manifest usage.

89-91: Guard against empty-string manifest paths in YAML mode

Using manifest_path or True treats empty string as True, silently ignoring the provided value. If you keep this pattern, should we validate non-empty strings before this point (as in the refactor above) to avoid surprising behavior, wdyt?
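
The empty-string behavior is easy to reproduce in isolation. Below, resolve_loose mirrors the manifest_path or True expression, while resolve_strict is one possible guard; both function names are hypothetical and the strict variant is only a suggestion, not what the PR implements.

```python
from __future__ import annotations

from pathlib import Path


def resolve_loose(manifest_path: str | Path | None) -> str | Path | bool:
    """Pattern under review: any falsy value, including "", becomes True."""
    return manifest_path or True


def resolve_strict(manifest_path: str | Path | None) -> str | Path | bool:
    """Hypothetical guard: only None falls back to True; blank strings are rejected."""
    if manifest_path is None:
        return True
    if isinstance(manifest_path, str) and not manifest_path.strip():
        raise ValueError("manifest_path must not be an empty string")
    return manifest_path


if __name__ == "__main__":
    # The empty string is silently replaced by True in the loose version.
    print(resolve_loose(""))
    try:
        resolve_strict("")
    except ValueError as exc:
        print(f"rejected: {exc}")
```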


241-245: list_source_streams: manifest_path wiring LGTM

Looks correct. Do we want to log/echo which execution mode was chosen when manifest_path is set and mode is auto, to aid debuggability, wdyt?

Also applies to: 253-254


463-471: get_stream_previews: manifest_path wiring LGTM (naming nit)

Wiring looks good. Nit: for API consistency, would you consider aligning source_name with source_connector_name used elsewhere in this module in a future pass, wdyt?

Also applies to: 481-482


559-566: sync_source_to_cache: manifest_path wiring LGTM

Looks correct. Nice fallback to "*" when registry metadata isn't available (common for local YAML). Do we want to mention this in docs to set expectations for declarative-only sources, wdyt?

Also applies to: 572-573


58-91: Prefer YAML for local manifests, validate early, and DRY get_source kwargs

  • When manifest_path is set and override_execution_mode=="auto", switch to "yaml" before checking Docker to improve UX and avoid confusing errors?
  • Validate manifest_path upfront (exists, is file, .yml/.yaml) for clearer errors?
  • Consolidate get_source keyword arguments into a single dict and call once to eliminate duplication? wdyt?
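
The upfront validation suggested in the second bullet could look something like this sketch. The helper name validate_manifest_path, the accepted suffixes, and the error messages are assumptions for illustration; nothing here is taken from the PyAirbyte codebase.

```python
from __future__ import annotations

from pathlib import Path


def validate_manifest_path(manifest_path: str | Path) -> Path:
    """Hypothetical upfront check: exists, is a file, has a YAML suffix."""
    path = Path(manifest_path).expanduser()
    if not path.exists():
        raise FileNotFoundError(f"Manifest file not found: {path}")
    if not path.is_file():
        raise ValueError(f"Manifest path is not a file: {path}")
    if path.suffix.lower() not in {".yaml", ".yml"}:
        raise ValueError(f"Manifest must be a .yaml or .yml file: {path}")
    return path


if __name__ == "__main__":
    try:
        validate_manifest_path("does-not-exist.yaml")
    except FileNotFoundError as exc:
        print(exc)
```

Failing early like this would surface a clear error before any executor is created, at the cost of rejecting manifests with unconventional file extensions.
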
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b341de and b2b7102.

📒 Files selected for processing (1)
  • airbyte/mcp/_local_ops.py (12 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/mcp/_local_ops.py (3)

139-146: validate_connector_config: manifest_path plumbed correctly

Signature and propagation look good. Could we add a quick test covering (auto,yaml,python,docker) with and without manifest_path to validate error surfaces and success cases, wdyt?

Also applies to: 155-156


302-309: get_source_stream_json_schema: manifest_path wiring LGTM

All good. Any interest in a small smoke test with a local declarative YAML to ensure schema retrieval works in yaml mode, wdyt?

Also applies to: 315-316


372-379: read_source_stream_records: manifest_path wiring LGTM

Looks good. Since yaml + manifest_path is a primary scenario, could we add a happy-path test that reads a handful of records from a tiny local manifest to catch regressions, wdyt?

Also applies to: 386-387


github-actions bot commented Sep 24, 2025

PyTest Results (Full)

364 tests  ±0   348 ✅ ±0   22m 44s ⏱️ -6s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 9b03efe. ± Comparison against base commit 1b341de.

♻️ This comment has been updated with latest results.

devin-ai-integration bot and others added 2 commits September 24, 2025 17:30
…hon defaults

- Force execution mode to 'yaml' when manifest_path is provided
- Remove Python-level default arguments from function signatures
- Keep Field() defaults intact for MCP tool compatibility
- Update parameter documentation to clarify override behavior

Co-Authored-By: AJ Steers <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aef665a and 2bfbfd7.

📒 Files selected for processing (1)
  • airbyte/_executors/util.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)

@aaronsteers aaronsteers requested a review from Copilot September 24, 2025 20:30

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds an optional manifest_path parameter to PyAirbyte MCP local operations tools, enabling users to specify custom YAML manifest files for declarative connectors instead of using registry versions. This creates a bridge between PyAirbyte MCP and connector-builder-mcp functionality for testing locally-developed connector manifests.

  • Added manifest_path parameter to 6 MCP tools for connector operations
  • Updated internal _get_mcp_source function to handle manifest paths across execution modes
  • Enhanced path handling logic to properly detect file paths vs inline YAML content

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File: Description

  • airbyte/mcp/_local_ops.py: Added manifest_path parameter to all MCP tools and updated _get_mcp_source function
  • airbyte/_executors/util.py: Enhanced path detection logic for plain string manifest paths


aaronsteers and others added 2 commits September 24, 2025 13:32
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
aaronsteers

This comment was marked as outdated.


aaronsteers commented Sep 24, 2025

/fix-pr

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.
(This job requires that the PR author has "Allow edits from maintainers" enabled.)

PR auto-fix job started... Check job output.

🟦 Job completed successfully (no changes).


aaronsteers commented Sep 24, 2025

/fix-pr

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.
(This job requires that the PR author has "Allow edits from maintainers" enabled.)

PR auto-fix job started... Check job output.

🟦 Job completed successfully (no changes).

@aaronsteers aaronsteers enabled auto-merge (squash) September 24, 2025 21:48
@aaronsteers aaronsteers merged commit 20bb0a8 into main Sep 24, 2025
24 checks passed
@aaronsteers aaronsteers deleted the devin/1758731973-add-manifest-path-to-mcp-tools branch September 24, 2025 22:11