Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Sep 18, 2025

Summary by CodeRabbit

  • New Features

    • Connector listing accepts a single install type or multiple types for flexible filtering.
    • Selected streams may be provided as a single string or a list; stream inputs are normalized for consistent behavior.
  • Refactor

    • Cloud deploys no longer auto-install missing sources/destinations; deployments require connectors to be present.
    • Stream preview and cache sync flows standardized via centralized stream-input normalization.

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@aj/fix/misc-msp-fixes' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@aj/fix/misc-msp-fixes'

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.

📝 Edit this welcome message.

Copy link

github-actions bot commented Sep 18, 2025

PyTest Results (Fast Tests Only, No Creds)

301 tests  ±0   300 ✅ ±0   4m 28s ⏱️ +6s
  1 suites ±0     1 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 3be0cfe. ± Comparison against base commit 10125ff.

♻️ This comment has been updated with latest results.

Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

📝 Walkthrough

Walkthrough

Cloud ops and local MCP utilities now avoid auto-installing missing connectors by adding an install_if_missing flag and normalize stream inputs via a new resolve_list_of_strings utility; list_connectors.install_types typing is broadened to accept single or list forms of the install-type literals.

Changes

Cohort / File(s) Summary of Changes
Cloud ops: deploy & connection
airbyte/mcp/_cloud_ops.py
Calls to get_source/get_destination now pass install_if_missing=False. create_connection_on_cloud accepts `selected_streams: str
Local ops: source handling & streams
airbyte/mcp/_local_ops.py
_get_mcp_source gains keyword-only install_if_missing: bool = True and installation is conditional on it. Stream-related functions normalize inputs via resolve_list_of_strings; single-element ["*"] is preserved as "*".
Utility: normalization helper
airbyte/mcp/_util.py
Added resolve_list_of_strings(...) with overloads to normalize None, str, list[str], or set[str] into None or list[str]; supports JSON-array strings, comma-separated lists, trimming, and raises on invalid JSON types.
Connector registry typing
airbyte/mcp/_connector_registry.py
list_connectors annotation for install_types widened to accept a single literal or a list of literals (`"java"
Imports / exports updated
airbyte/mcp/_cloud_ops.py, airbyte/mcp/_local_ops.py, airbyte/mcp/_connector_registry.py
Import lines updated to include resolve_list_of_strings from airbyte.mcp._util; call sites updated to use the utility and adjusted typings/signatures.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Caller
  participant CloudOps as deploy_source_to_cloud
  participant Registry as get_source
  participant Cloud as CloudWorkspace

  Caller->>CloudOps: deploy_source_to_cloud(source_connector_name,...)
  CloudOps->>Registry: get_source(name, install_if_missing=false)
  alt Source exists
    Registry-->>CloudOps: Source metadata
    CloudOps->>Cloud: Deploy source with metadata
    Cloud-->>CloudOps: Deployment result
    CloudOps-->>Caller: Success/Result
  else Source missing
    Registry-->>CloudOps: Not found / error
    CloudOps-->>Caller: Error/handle missing source
  end
Loading
sequenceDiagram
  autonumber
  participant Caller as Caller
  participant CloudOps as create_connection_on_cloud
  participant Util as resolve_list_of_strings
  participant Cloud as CloudWorkspace

  Caller->>CloudOps: create_connection_on_cloud(..., selected_streams)
  CloudOps->>Util: resolve_list_of_strings(selected_streams)
  Util-->>CloudOps: resolved_streams_list
  CloudOps->>Cloud: create connection (selected_streams=resolved_streams_list)
  Cloud-->>CloudOps: Result
  CloudOps-->>Caller: Success/Result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Would you like a short checklist of runtime checks (imports, unit tests for resolve_list_of_strings, and cloud deploy paths) to help reviewers before merging? wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title Check ❓ Inconclusive The title "fix: misc MCP files (from Davin feedback)" is related to the changed MCP modules but is too vague and non-descriptive; it does not summarize the primary changes such as adding resolve_list_of_strings, normalizing selected_streams inputs, and preventing auto-install during cloud deploy, so the check cannot conclusively verify that the title communicates the main change. Could you make the title more descriptive—for example, "fix(mcp): normalize selected_streams, add resolve_list_of_strings, and prevent auto-install during cloud deploy"—wdyt?
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch aj/fix/misc-msp-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
airbyte/mcp/_cloud_ops.py (1)

69-74: Ensure config_spec is accessible without installation or fall back to install, wdyt?

With install_if_missing=False, some connectors may not expose config_spec until installed. If that’s possible, consider a lazy fallback to avoid failing deployments.

-        config_dict = resolve_config(
-            config=config,
-            config_secret_name=config_secret_name,
-            config_spec_jsonschema=source.config_spec,
-        )
+        # config_spec may require installation for some connectors; try, then fall back to install.
+        try:
+            config_spec = source.config_spec
+        except Exception:
+            source.install()
+            config_spec = source.config_spec
+        config_dict = resolve_config(
+            config=config,
+            config_secret_name=config_secret_name,
+            config_spec_jsonschema=config_spec,
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10125ff and f9e71ef.

📒 Files selected for processing (2)
  • airbyte/mcp/_cloud_ops.py (1 hunks)
  • airbyte/mcp/_connector_registry.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_cloud_ops.py (2)
airbyte/cloud/workspaces.py (1)
  • get_source (112-124)
airbyte/_util/api_util.py (1)
  • get_source (468-493)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/_cloud_ops.py (1)

65-68: Confirm other get_source call sites updated (avoid surprise installs)

Scanned the repo — multiple callers still omit install_if_missing=False. High-priority places to review/update:

  • airbyte/init.py:19 (top-level get_source)
  • airbyte/secrets/init.py:26 (top-level ab.get_source)
  • airbyte/mcp/_connector_registry.py:108
  • airbyte/cli.py:237,257,275
  • airbyte/validate.py:65,97
  • examples/* and tests/* (many usages; tests/examples may intentionally rely on install behavior)

Add install_if_missing=False where installations are not desired, or confirm the callers intentionally allow installs — wdyt?

Copy link

github-actions bot commented Sep 18, 2025

PyTest Results (Full)

364 tests  ±0   348 ✅ ±0   22m 42s ⏱️ +9s
  1 suites ±0    16 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 3be0cfe. ± Comparison against base commit 10125ff.

♻️ This comment has been updated with latest results.

@aaronsteers aaronsteers enabled auto-merge (squash) September 18, 2025 20:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
airbyte/mcp/_util.py (2)

84-96: Docstring contradicts behavior; align semantics for None/empty and update the list count

The doc says “three types” but lists five, and it claims “None or empty input will return None” while the code returns [] for empty strings. Could we update the doc to match behavior, e.g., None->None, ""->[], and fix the count, wdyt?

Apply this diff to clarify:

 def resolve_list_of_strings(value: str | list[str] | set[str] | None) -> list[str] | None:
-    """Resolve a string or list of strings to a list of strings.
-
-    This method will handle three types of input:
-
-    1. A list of strings (e.g., ["stream1", "stream2"]) will be returned as-is.
-    2. None or empty input will return None.
-    3. A single CSV string (e.g., "stream1,stream2") will be split into a list.
-    4. A JSON string (e.g., '["stream1", "stream2"]') will be parsed into a list.
-    5. If the input is empty or None, an empty list will be returned.
+    """Resolve a string or collection into a list of strings.
+
+    Handles:
+    1. list[str] -> returned as-is.
+    2. set[str] -> converted to a list (order unspecified).
+    3. None -> returns None.
+    4. "" (empty string) -> returns [].
+    5. str: if it looks like a JSON array (e.g., '["a","b"]'), parse it; otherwise split on commas.
 
     Args:
         value: A string or list of strings.
     """

103-105: Set conversion is non-deterministic; do we want stable ordering?

Converting a set with list(value) yields arbitrary order, which can cause flaky behavior/tests. Should we sort for stability or at least document that order is unspecified, wdyt?

Apply this if you prefer stability:

-    if isinstance(value, set):
-        return list(value)
+    if isinstance(value, set):
+        return sorted(value)
airbyte/mcp/_local_ops.py (2)

87-90: Consider skipping installation for 'yaml' (no binary to install)

For manifest/YAML execution, there may be nothing to “install”. Should we skip ensure_installation when override_execution_mode == "yaml" to save time, wdyt?


439-443: Empty string -> [] will attempt to read nothing; do we want to treat [] as '*' or 'suggested'?

resolve_list_of_strings("") returns [], which then flows to read(). Should an empty list default to "*" or 'suggested' for a better UX, or stay strict and error out early, wdyt?

airbyte/mcp/_cloud_ops.py (1)

1-1: Ruff format applied — file reformatted and checks passed.

I ran ruff format and ruff check --fix on airbyte/mcp/_cloud_ops.py — 1 file reformatted and all checks passed. Ruff emitted a deprecation warning: move top-level 'select' in .ruff.toml to 'lint.select' to silence it — shall I update .ruff.toml accordingly, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9e71ef and 6299818.

📒 Files selected for processing (3)
  • airbyte/mcp/_cloud_ops.py (5 hunks)
  • airbyte/mcp/_local_ops.py (7 hunks)
  • airbyte/mcp/_util.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/_local_ops.py (3)
airbyte/mcp/_util.py (4)
  • resolve_config (129-226)
  • resolve_list_of_strings (75-75)
  • resolve_list_of_strings (80-80)
  • resolve_list_of_strings (83-126)
airbyte/_executors/docker.py (1)
  • ensure_installation (36-54)
airbyte/secrets/google_gsm.py (1)
  • fetch_connector_secrets (249-268)
airbyte/mcp/_cloud_ops.py (2)
airbyte/mcp/_util.py (3)
  • resolve_list_of_strings (75-75)
  • resolve_list_of_strings (80-80)
  • resolve_list_of_strings (83-126)
airbyte/cloud/workspaces.py (1)
  • deploy_connection (291-339)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_local_ops.py

[error] 1-1: Ruff format check would reformat this file.

airbyte/mcp/_cloud_ops.py

[error] 1-1: Ruff format check would reformat this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (6)
airbyte/mcp/_cloud_ops.py (2)

64-67: Disable auto-install on Cloud ops — good hardening

Passing install_if_missing=False avoids unexpected side-effects during Cloud deploys. LGTM, wdyt?


122-125: Same here for destinations — LGTM

Consistent with sources; avoids surprise installations on Cloud. 👍 wdyt?

airbyte/mcp/_local_ops.py (4)

49-51: Nice opt-in install control

Adding install_if_missing gives callers control and avoids surprises. Looks good. Do you plan to bubble this up to public tools where auto-install might be undesirable (e.g., validation-only flows), wdyt?


164-167: List comprehension refactor — LGTM

The GSM secret listing is concise and clear. 👍


371-373: 'streams' normalization to Literal['*'] — LGTM

Converting [""] to "" before calling get_samples aligns with expected API usage. Nice touch, wdyt?


1-1: Ruff format applied; file reformatted and checks passed — update .ruff.toml 'select' → 'lint.select'?

Ran ruff format and ruff check --fix on airbyte/mcp/_local_ops.py: 1 file reformatted; all checks passed. Ruff warned that the top-level select in .ruff.toml is deprecated and should be moved under the [lint] section as lint.select. Shall I update .ruff.toml accordingly, wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte/mcp/_connector_registry.py (1)

61-66: Fix the N×T re-computation; precompute per-type sets.

Filtering repeatedly calls get_available_connectors inside the list comprehension, which is O(N×T) list builds. Can we precompute once and check membership against sets, wdyt?

-    if install_types_list:
-        # If install_types is provided, filter connectors based on the specified install types.
-        connectors = [
-            connector
-            for connector in connectors
-            if any(
-                connector in get_available_connectors(install_type=install_type)
-                for install_type in install_types_list
-            )
-        ]
+    if install_types_list:
+        selected_types = set(install_types_list)
+        available_by_type = {
+            t: set(get_available_connectors(install_type=t)) for t in selected_types
+        }
+        # Filter connectors based on the specified install types.
+        connectors = [
+            connector
+            for connector in connectors
+            if any(connector in available_by_type[t] for t in selected_types)
+        ]

Also applies to: 70-74

🧹 Nitpick comments (6)
airbyte/mcp/_connector_registry.py (2)

31-34: Align param schema with normalization or document accepted string forms.

You accept CSV/JSON strings via resolve_list_of_strings, but the type hints only allow Literal or list[Literal]. This is either too strict (tool schema will reject "python,yaml") or the normalization is unnecessary. Do you want to: (a) widen the annotation to str | list[str] | None and validate allowed values, or (b) keep current types and update docs to state only single literal or list is accepted, wdyt?

If you prefer (b), at least document CSV/JSON acceptance explicitly:

         Field(
             description="""
               Filter connectors by install type.
               These are not mutually exclusive:
@@
               - "docker": Connectors that can be installed via Docker. Note that all connectors
                 can be run in Docker, so this filter should generally return the same results as
                 not specifying a filter.
               If no install types are specified, all connectors will be returned.
+              You may pass a single value (e.g., "python") or a list (e.g., ["python","yaml"]).
+              Additionally, CSV ("python,yaml") or a JSON array string ('["python","yaml"]')
+              are accepted and will be normalized internally.
               """
         ),

Also applies to: 35-51, 61-63


116-118: Remove redundant “or False”.

is_docker_installed() already returns bool; the “or False” is unnecessary. Trim for clarity, wdyt?

-        docker_image=is_docker_installed() or False,
+        docker_image=is_docker_installed(),
airbyte/mcp/_local_ops.py (4)

51-52: Good guard: opt-in install with ensure_installation.

Gating installation behind install_if_missing is a solid default for local. Do we also want to expose this flag on the public MCP tools (e.g., list_source_streams, get_stream_previews, etc.) for parity and caller control, wdyt?

Also applies to: 89-91


374-376: Treat empty string input as “no explicit streams”.

resolve_list_of_strings("") yields [], which currently passes through to get_samples and may be interpreted as “zero streams.” Should we coerce [] to None (meaning “selected streams”), wdyt?

-    streams_param: list[str] | Literal["*"] | None = resolve_list_of_strings(streams)
-    if streams_param and len(streams_param) == 1 and streams_param[0] == "*":
+    streams_param: list[str] | Literal["*"] | None = resolve_list_of_strings(streams)
+    if streams_param == []:
+        streams_param = None
+    elif streams_param and len(streams_param) == 1 and streams_param[0] == "*":
         streams_param = "*"

442-446: Handle empty list in sync to avoid “read nothing”.

If users pass an empty string, streams becomes []. That likely results in syncing nothing. Should we treat [] as “all” (or raise), to avoid surprises, wdyt?

-    streams = resolve_list_of_strings(streams)
-    if streams and len(streams) == 1 and streams[0] in {"*", "suggested"}:
+    streams = resolve_list_of_strings(streams)
+    if streams == []:
+        streams = "*"
+    elif streams and len(streams) == 1 and streams[0] in {"*", "suggested"}:
         # Float '*' and 'suggested' to the top-level for special processing:
         streams = streams[0]

365-373: Minor: wrap source/config acquisition in the same try block.

get_stream_previews catches failures from get_samples, but exceptions from _get_mcp_source/resolve_config leak. Do you want one unified error path for a clearer error payload, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6299818 and 3be0cfe.

📒 Files selected for processing (3)
  • airbyte/mcp/_cloud_ops.py (5 hunks)
  • airbyte/mcp/_connector_registry.py (3 hunks)
  • airbyte/mcp/_local_ops.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte/mcp/_cloud_ops.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/_connector_registry.py (2)
airbyte/mcp/_util.py (3)
  • resolve_list_of_strings (75-75)
  • resolve_list_of_strings (80-80)
  • resolve_list_of_strings (83-126)
airbyte/sources/registry.py (1)
  • get_available_connectors (225-281)
airbyte/mcp/_local_ops.py (2)
airbyte/mcp/_util.py (3)
  • resolve_list_of_strings (75-75)
  • resolve_list_of_strings (80-80)
  • resolve_list_of_strings (83-126)
airbyte/_executors/docker.py (1)
  • ensure_installation (36-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/_connector_registry.py (1)

61-63: Nice: normalization removes the single-string iteration pitfall.

resolve_list_of_strings avoids the prior per-character iteration on bare str inputs. LGTM.

@aaronsteers aaronsteers merged commit 1d9292f into main Sep 18, 2025
23 checks passed
@aaronsteers aaronsteers deleted the aj/fix/misc-msp-fixes branch September 18, 2025 21:15
@aaronsteers aaronsteers changed the title fix: misc MCP files (from Davin feedback) fix: misc MCP fixes (from Davin feedback) Sep 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant