-
Notifications
You must be signed in to change notification settings - Fork 67
fix: misc MCP fixes (from Davin feedback) #791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@aj/fix/misc-msp-fixes' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@aj/fix/misc-msp-fixes' Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughCloud ops and local MCP utilities now avoid auto-installing missing connectors by adding an install_if_missing flag and normalize stream inputs via a new resolve_list_of_strings utility; list_connectors.install_types typing is broadened to accept single or list forms of the install-type literals. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Caller
participant CloudOps as deploy_source_to_cloud
participant Registry as get_source
participant Cloud as CloudWorkspace
Caller->>CloudOps: deploy_source_to_cloud(source_connector_name,...)
CloudOps->>Registry: get_source(name, install_if_missing=false)
alt Source exists
Registry-->>CloudOps: Source metadata
CloudOps->>Cloud: Deploy source with metadata
Cloud-->>CloudOps: Deployment result
CloudOps-->>Caller: Success/Result
else Source missing
Registry-->>CloudOps: Not found / error
CloudOps-->>Caller: Error/handle missing source
end
sequenceDiagram
autonumber
participant Caller as Caller
participant CloudOps as create_connection_on_cloud
participant Util as resolve_list_of_strings
participant Cloud as CloudWorkspace
Caller->>CloudOps: create_connection_on_cloud(..., selected_streams)
CloudOps->>Util: resolve_list_of_strings(selected_streams)
Util-->>CloudOps: resolved_streams_list
CloudOps->>Cloud: create connection (selected_streams=resolved_streams_list)
Cloud-->>CloudOps: Result
CloudOps-->>Caller: Success/Result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Would you like a short checklist of runtime checks (imports, unit tests for resolve_list_of_strings, and cloud deploy paths) to help reviewers before merging? wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte/mcp/_cloud_ops.py (1)
69-74
: Ensure config_spec is accessible without installation or fall back to install, wdyt?With install_if_missing=False, some connectors may not expose config_spec until installed. If that’s possible, consider a lazy fallback to avoid failing deployments.
- config_dict = resolve_config( - config=config, - config_secret_name=config_secret_name, - config_spec_jsonschema=source.config_spec, - ) + # config_spec may require installation for some connectors; try, then fall back to install. + try: + config_spec = source.config_spec + except Exception: + source.install() + config_spec = source.config_spec + config_dict = resolve_config( + config=config, + config_secret_name=config_secret_name, + config_spec_jsonschema=config_spec, + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/_cloud_ops.py
(1 hunks)airbyte/mcp/_connector_registry.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_cloud_ops.py (2)
airbyte/cloud/workspaces.py (1)
get_source
(112-124)airbyte/_util/api_util.py (1)
get_source
(468-493)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/_cloud_ops.py (1)
65-68
: Confirm other get_source call sites updated (avoid surprise installs)Scanned the repo — multiple callers still omit install_if_missing=False. High-priority places to review/update:
- airbyte/init.py:19 (top-level get_source)
- airbyte/secrets/init.py:26 (top-level ab.get_source)
- airbyte/mcp/_connector_registry.py:108
- airbyte/cli.py:237,257,275
- airbyte/validate.py:65,97
- examples/* and tests/* (many usages; tests/examples may intentionally rely on install behavior)
Add install_if_missing=False where installations are not desired, or confirm the callers intentionally allow installs — wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
airbyte/mcp/_util.py (2)
84-96
: Docstring contradicts behavior; align semantics for None/empty and update the list countThe doc says “three types” but lists five, and it claims “None or empty input will return None” while the code returns [] for empty strings. Could we update the doc to match behavior, e.g., None->None, ""->[], and fix the count, wdyt?
Apply this diff to clarify:
def resolve_list_of_strings(value: str | list[str] | set[str] | None) -> list[str] | None: - """Resolve a string or list of strings to a list of strings. - - This method will handle three types of input: - - 1. A list of strings (e.g., ["stream1", "stream2"]) will be returned as-is. - 2. None or empty input will return None. - 3. A single CSV string (e.g., "stream1,stream2") will be split into a list. - 4. A JSON string (e.g., '["stream1", "stream2"]') will be parsed into a list. - 5. If the input is empty or None, an empty list will be returned. + """Resolve a string or collection into a list of strings. + + Handles: + 1. list[str] -> returned as-is. + 2. set[str] -> converted to a list (order unspecified). + 3. None -> returns None. + 4. "" (empty string) -> returns []. + 5. str: if it looks like a JSON array (e.g., '["a","b"]'), parse it; otherwise split on commas. Args: value: A string or list of strings. """
103-105
: Set conversion is non-deterministic; do we want stable ordering?Converting a set with list(value) yields arbitrary order, which can cause flaky behavior/tests. Should we sort for stability or at least document that order is unspecified, wdyt?
Apply this if you prefer stability:
- if isinstance(value, set): - return list(value) + if isinstance(value, set): + return sorted(value)airbyte/mcp/_local_ops.py (2)
87-90
: Consider skipping installation for 'yaml' (no binary to install)For manifest/YAML execution, there may be nothing to “install”. Should we skip ensure_installation when override_execution_mode == "yaml" to save time, wdyt?
439-443
: Empty string -> [] will attempt to read nothing; do we want to treat [] as '*' or 'suggested'?resolve_list_of_strings("") returns [], which then flows to read(). Should an empty list default to "*" or 'suggested' for a better UX, or stay strict and error out early, wdyt?
airbyte/mcp/_cloud_ops.py (1)
1-1
: Ruff format applied — file reformatted and checks passed.I ran ruff format and ruff check --fix on airbyte/mcp/_cloud_ops.py — 1 file reformatted and all checks passed. Ruff emitted a deprecation warning: move top-level 'select' in .ruff.toml to 'lint.select' to silence it — shall I update .ruff.toml accordingly, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/_cloud_ops.py
(5 hunks)airbyte/mcp/_local_ops.py
(7 hunks)airbyte/mcp/_util.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/_local_ops.py (3)
airbyte/mcp/_util.py (4)
resolve_config
(129-226)resolve_list_of_strings
(75-75)resolve_list_of_strings
(80-80)resolve_list_of_strings
(83-126)airbyte/_executors/docker.py (1)
ensure_installation
(36-54)airbyte/secrets/google_gsm.py (1)
fetch_connector_secrets
(249-268)
airbyte/mcp/_cloud_ops.py (2)
airbyte/mcp/_util.py (3)
resolve_list_of_strings
(75-75)resolve_list_of_strings
(80-80)resolve_list_of_strings
(83-126)airbyte/cloud/workspaces.py (1)
deploy_connection
(291-339)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_local_ops.py
[error] 1-1: Ruff format check would reformat this file.
airbyte/mcp/_cloud_ops.py
[error] 1-1: Ruff format check would reformat this file.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (6)
airbyte/mcp/_cloud_ops.py (2)
64-67
: Disable auto-install on Cloud ops — good hardeningPassing install_if_missing=False avoids unexpected side-effects during Cloud deploys. LGTM, wdyt?
122-125
: Same here for destinations — LGTMConsistent with sources; avoids surprise installations on Cloud. 👍 wdyt?
airbyte/mcp/_local_ops.py (4)
49-51
: Nice opt-in install controlAdding install_if_missing gives callers control and avoids surprises. Looks good. Do you plan to bubble this up to public tools where auto-install might be undesirable (e.g., validation-only flows), wdyt?
164-167
: List comprehension refactor — LGTMThe GSM secret listing is concise and clear. 👍
371-373
: 'streams' normalization to Literal['*'] — LGTMConverting [""] to "" before calling get_samples aligns with expected API usage. Nice touch, wdyt?
1-1
: Ruff format applied; file reformatted and checks passed — update .ruff.toml 'select' → 'lint.select'?Ran
ruff format
andruff check --fix
on airbyte/mcp/_local_ops.py: 1 file reformatted; all checks passed. Ruff warned that the top-levelselect
in.ruff.toml
is deprecated and should be moved under the[lint]
section aslint.select
. Shall I update.ruff.toml
accordingly, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte/mcp/_connector_registry.py (1)
61-66
: Fix the N×T re-computation; precompute per-type sets.Filtering repeatedly calls get_available_connectors inside the list comprehension, which is O(N×T) list builds. Can we precompute once and check membership against sets, wdyt?
- if install_types_list: - # If install_types is provided, filter connectors based on the specified install types. - connectors = [ - connector - for connector in connectors - if any( - connector in get_available_connectors(install_type=install_type) - for install_type in install_types_list - ) - ] + if install_types_list: + selected_types = set(install_types_list) + available_by_type = { + t: set(get_available_connectors(install_type=t)) for t in selected_types + } + # Filter connectors based on the specified install types. + connectors = [ + connector + for connector in connectors + if any(connector in available_by_type[t] for t in selected_types) + ]Also applies to: 70-74
🧹 Nitpick comments (6)
airbyte/mcp/_connector_registry.py (2)
31-34
: Align param schema with normalization or document accepted string forms.You accept CSV/JSON strings via resolve_list_of_strings, but the type hints only allow Literal or list[Literal]. This is either too strict (tool schema will reject "python,yaml") or the normalization is unnecessary. Do you want to: (a) widen the annotation to str | list[str] | None and validate allowed values, or (b) keep current types and update docs to state only single literal or list is accepted, wdyt?
If you prefer (b), at least document CSV/JSON acceptance explicitly:
Field( description=""" Filter connectors by install type. These are not mutually exclusive: @@ - "docker": Connectors that can be installed via Docker. Note that all connectors can be run in Docker, so this filter should generally return the same results as not specifying a filter. If no install types are specified, all connectors will be returned. + You may pass a single value (e.g., "python") or a list (e.g., ["python","yaml"]). + Additionally, CSV ("python,yaml") or a JSON array string ('["python","yaml"]') + are accepted and will be normalized internally. """ ),Also applies to: 35-51, 61-63
116-118
: Remove redundant “or False”.is_docker_installed() already returns bool; the “or False” is unnecessary. Trim for clarity, wdyt?
- docker_image=is_docker_installed() or False, + docker_image=is_docker_installed(),airbyte/mcp/_local_ops.py (4)
51-52
: Good guard: opt-in install with ensure_installation.Gating installation behind install_if_missing is a solid default for local. Do we also want to expose this flag on the public MCP tools (e.g., list_source_streams, get_stream_previews, etc.) for parity and caller control, wdyt?
Also applies to: 89-91
374-376
: Treat empty string input as “no explicit streams”.resolve_list_of_strings("") yields [], which currently passes through to get_samples and may be interpreted as “zero streams.” Should we coerce [] to None (meaning “selected streams”), wdyt?
- streams_param: list[str] | Literal["*"] | None = resolve_list_of_strings(streams) - if streams_param and len(streams_param) == 1 and streams_param[0] == "*": + streams_param: list[str] | Literal["*"] | None = resolve_list_of_strings(streams) + if streams_param == []: + streams_param = None + elif streams_param and len(streams_param) == 1 and streams_param[0] == "*": streams_param = "*"
442-446
: Handle empty list in sync to avoid “read nothing”.If users pass an empty string, streams becomes []. That likely results in syncing nothing. Should we treat [] as “all” (or raise), to avoid surprises, wdyt?
- streams = resolve_list_of_strings(streams) - if streams and len(streams) == 1 and streams[0] in {"*", "suggested"}: + streams = resolve_list_of_strings(streams) + if streams == []: + streams = "*" + elif streams and len(streams) == 1 and streams[0] in {"*", "suggested"}: # Float '*' and 'suggested' to the top-level for special processing: streams = streams[0]
365-373
: Minor: wrap source/config acquisition in the same try block.get_stream_previews catches failures from get_samples, but exceptions from _get_mcp_source/resolve_config leak. Do you want one unified error path for a clearer error payload, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/_cloud_ops.py
(5 hunks)airbyte/mcp/_connector_registry.py
(3 hunks)airbyte/mcp/_local_ops.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/_cloud_ops.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/_connector_registry.py (2)
airbyte/mcp/_util.py (3)
resolve_list_of_strings
(75-75)resolve_list_of_strings
(80-80)resolve_list_of_strings
(83-126)airbyte/sources/registry.py (1)
get_available_connectors
(225-281)
airbyte/mcp/_local_ops.py (2)
airbyte/mcp/_util.py (3)
resolve_list_of_strings
(75-75)resolve_list_of_strings
(80-80)resolve_list_of_strings
(83-126)airbyte/_executors/docker.py (1)
ensure_installation
(36-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/_connector_registry.py (1)
61-63
: Nice: normalization removes the single-string iteration pitfall.resolve_list_of_strings avoids the prior per-character iteration on bare str inputs. LGTM.
Summary by CodeRabbit
New Features
Refactor
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.