Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions airbyte/_executors/python.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,14 +44,13 @@ def __init__(
created. If not provided, the current working directory will be used.
"""
super().__init__(name=name, metadata=metadata, target_version=target_version)

if not pip_url and metadata and not metadata.pypi_package_name:
raise exc.AirbyteConnectorNotPyPiPublishedError(
connector_name=self.name,
context={
"metadata": metadata,
},
)
# if not pip_url and metadata and not metadata.pypi_package_name:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java connectors don't have a pip url

# raise exc.AirbyteConnectorNotPyPiPublishedError(
# connector_name=self.name,
# context={
# "metadata": metadata,
# },
# )
Comment on lines +47 to +53
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Don’t globally disable the PyPI publication guard; gate it for your test scenario.

Commenting this out makes non-PyPI installs silently “valid” for everyone. Could we keep the check and allow bypass only when explicitly opted-in (e.g., via an env var), wdyt?

Apply this diff:

-        # if not pip_url and metadata and not metadata.pypi_package_name:
-        #     raise exc.AirbyteConnectorNotPyPiPublishedError(
-        #         connector_name=self.name,
-        #         context={
-        #             "metadata": metadata,
-        #         },
-        #     )
+        allow_non_pypi = bool(os.getenv("PYAIRBYTE_ALLOW_NON_PYPI"))
+        if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
+            raise exc.AirbyteConnectorNotPyPiPublishedError(
+                connector_name=self.name,
+                context={"metadata": metadata},
+            )

Add this import near the other imports:

import os

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Showing file: airbyte/_executors/python.py ==="
nl -ba airbyte/_executors/python.py || true

echo
echo "=== Searching for related symbols across the repo ==="
rg --line-number "AirbyteConnectorNotPyPiPublishedError" || true
rg --line-number "pypi_package_name" || true
rg --line-number "pip_url" || true
rg --line-number "PYAIRBYTE_ALLOW_NON_PYPI" || true
rg --line-number "allow_non_pypi" || true
rg --line-number "import os" || true
rg --line-number "os.getenv" || true

Length of output: 8191


Gate the PyPI publication guard behind an explicit opt-in (env var)

The commented-out guard in airbyte/_executors/python.py makes non‑PyPI installs silently allowed for everyone — re-enable it but allow bypass only when explicitly opted into via an env var.

  • Files to update:
    • airbyte/_executors/python.py — commented guard around lines ~47–57.
    • Related: airbyte/_executors/util.py (line ~244) uses metadata.pypi_package_name to infer pip_url, so the guard protects downstream behavior.

Apply this diff (adds a robust env check and import):

+import os
@@
-        # if not pip_url and metadata and not metadata.pypi_package_name:
-        #     raise exc.AirbyteConnectorNotPyPiPublishedError(
-        #         connector_name=self.name,
-        #         context={
-        #             "metadata": metadata,
-        #         },
-        #     )
+        allow_non_pypi = os.getenv("PYAIRBYTE_ALLOW_NON_PYPI", "false").lower() in ("1", "true", "yes")
+        if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
+            raise exc.AirbyteConnectorNotPyPiPublishedError(
+                connector_name=self.name,
+                context={"metadata": metadata},
+            )

Could we reintroduce this guard exactly like above (env var opt-in), wdyt?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# if not pip_url and metadata and not metadata.pypi_package_name:
# raise exc.AirbyteConnectorNotPyPiPublishedError(
# connector_name=self.name,
# context={
# "metadata": metadata,
# },
# )
import os
allow_non_pypi = os.getenv("PYAIRBYTE_ALLOW_NON_PYPI", "false").lower() in ("1", "true", "yes")
if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
raise exc.AirbyteConnectorNotPyPiPublishedError(
connector_name=self.name,
context={"metadata": metadata},
)
🤖 Prompt for AI Agents
In airbyte/_executors/python.py around lines 47 to 57, re-enable the commented
PyPI publication guard so connectors without a pip_url and without
metadata.pypi_package_name raise AirbyteConnectorNotPyPiPublishedError, but
allow bypass only when an explicit environment opt-in is set (e.g.,
AIRBYTE_ALLOW_NON_PYPI=true); import os at the top if missing, check the env var
case-insensitively (truthy values like "1"/"true"/"yes"), and if bypass is not
enabled then raise
exc.AirbyteConnectorNotPyPiPublishedError(connector_name=self.name,
context={"metadata": metadata}); ensure this guard is consistent with
airbyte/_executors/util.py which relies on metadata.pypi_package_name.


self.pip_url = pip_url or (
metadata.pypi_package_name
Expand Down Expand Up @@ -117,10 +116,15 @@ def install(self) -> None:
file=sys.stderr,
)
try:
self._run_subprocess_and_raise_on_failure(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just hardcoding the connector I want to test

args=[pip_path, "install","--no-cache-dir" , f"https://storage.googleapis.com/pyairbyte-java-connectors-python-package-jose-test/airbyte_{self.name.replace('-', '_')}/airbyte_{self.name.replace('-', '_')}-0.0.1-py3-none-any.whl"]
)

self._run_subprocess_and_raise_on_failure(
args=[pip_path, "install", *shlex.split(self.pip_url)]
)
except exc.AirbyteSubprocessFailedError as ex:
pass
# If the installation failed, remove the virtual environment
# Otherwise, the connector will be considered as installed and the user may not be able
# to retry the installation.
Expand Down
6 changes: 2 additions & 4 deletions airbyte/_executors/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -232,19 +232,17 @@ def get_connector_executor( # noqa: PLR0912, PLR0913, PLR0914, PLR0915, C901 #

if install_method_count == 0:
# User has not specified how to install the connector.
# Prefer local executable if found, then manifests, then python, then docker, depending upon
# Prefer local executable if found, then manifests, then python, depending upon
# how the connector is declared in the connector registry.
if which(name):
local_executable = name
elif metadata and metadata.install_types:
match metadata.default_install_type:
case InstallType.YAML:
source_manifest = True
case InstallType.PYTHON:
case _:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python is the fallback now.

pip_url = metadata.pypi_package_name
pip_url = f"{pip_url}=={version}" if version else pip_url
case _:
docker_image = True

Comment on lines +235 to 246
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Defaulting non-YAML connectors to Python likely breaks DOCKER/JAVA defaults; please honor registry defaults

Switching the catch-all branch to Python (case _) ignores metadata.default_install_type for DOCKER/JAVA and will try pip for connectors that aren’t published on PyPI, causing unexpected failures and a breaking behavior change. The comment at Line 235 also says “depending upon how the connector is declared,” but the implementation no longer does that. Can we restore explicit handling for PYTHON vs DOCKER/JAVA, and only fall back when appropriate, wdyt?

Impact to check:

  • Connectors whose metadata.default_install_type is DOCKER or JAVA will now attempt Python install.
  • Backward compatibility for callers that relied on implicit DOCKER fallback without passing docker_image.

Proposed fix:

-        # Prefer local executable if found, then manifests, then python, depending upon
+        # Prefer local executable if found, then manifests, then honor the registry default install type
         # how the connector is declared in the connector registry.
         if which(name):
             local_executable = name
         elif metadata and metadata.install_types:
             match metadata.default_install_type:
                 case InstallType.YAML:
                     source_manifest = True
-                case _:
-                    pip_url = metadata.pypi_package_name
-                    pip_url = f"{pip_url}=={version}" if version else pip_url
+                case InstallType.PYTHON:
+                    pip_url = metadata.pypi_package_name
+                    pip_url = f"{pip_url}=={version}" if version else pip_url
+                case InstallType.DOCKER | InstallType.JAVA:
+                    docker_image = True
+                case _:
+                    # Fallback to Docker to avoid surprising Python installs for unknown/legacy types
+                    docker_image = True

Would you like me to add a small unit test to assert we don’t route DOCKER/JAVA connectors through the Python path, wdyt?


Could we verify the affected snippet and confirm the full set of InstallType values by running the following scripts, wdyt?


🏁 Script executed:

#!/bin/bash
set -eux

# Locate the match on default_install_type in util.py
rg -n "match metadata.default_install_type" -n airbyte/_executors/util.py

# Find where InstallType is defined to see all possible types
rg -n "class InstallType" -n airbyte

Length of output: 239


Could we confirm all the possible InstallType variants to ensure we handle each explicitly? For example, could you run:

wdyt?


🏁 Script executed:

#!/bin/bash
set -eux

# Show the InstallType enum and its members
rg -A5 -n "class InstallType" -n airbyte/sources/registry.py

Length of output: 236


Explicitly handle PYTHON vs DOCKER install types

The current catch-all routes every non-YAML connector through pip, which will wrongly pip-install Docker connectors. Can we update the match to only pip-install PYTHON connectors and send DOCKER connectors down the docker path, wdyt?

-        match metadata.default_install_type:
-            case InstallType.YAML:
-                source_manifest = True
-            case _:
-                pip_url = metadata.pypi_package_name
-                pip_url = f"{pip_url}=={version}" if version else pip_url
+        match metadata.default_install_type:
+            case InstallType.YAML:
+                source_manifest = True
+            case InstallType.PYTHON:
+                pip_url = metadata.pypi_package_name
+                pip_url = f"{pip_url}=={version}" if version else pip_url
+            case InstallType.DOCKER:
+                docker_image = True
+            case _:
+                # Fallback to Docker for any unknown/legacy type
+                docker_image = True

Would you also like a small unit test to assert that Docker connectors never hit the Python path, wdyt?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Prefer local executable if found, then manifests, then python, depending upon
# how the connector is declared in the connector registry.
if which(name):
local_executable = name
elif metadata and metadata.install_types:
match metadata.default_install_type:
case InstallType.YAML:
source_manifest = True
case InstallType.PYTHON:
case _:
pip_url = metadata.pypi_package_name
pip_url = f"{pip_url}=={version}" if version else pip_url
case _:
docker_image = True
# Prefer local executable if found, then manifests, then python, depending upon
# how the connector is declared in the connector registry.
if which(name):
local_executable = name
elif metadata and metadata.install_types:
match metadata.default_install_type:
case InstallType.YAML:
source_manifest = True
case InstallType.PYTHON:
pip_url = metadata.pypi_package_name
pip_url = f"{pip_url}=={version}" if version else pip_url
case InstallType.DOCKER:
docker_image = True
case _:
# Fallback to Docker for any unknown/legacy type
docker_image = True

if local_executable:
return _get_local_executor(
Expand Down
2 changes: 1 addition & 1 deletion airbyte/destinations/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -265,7 +265,7 @@ def _write_airbyte_message_stream(
with as_temp_files(
files_contents=[
self._hydrated_config,
catalog_provider.configured_catalog.model_dump_json(),
catalog_provider.configured_catalog.model_dump_json(exclude_none=True),
]
) as [
config_file,
Expand Down