[DO NOT MERGE] testing java connectors python packages #747
Conversation
"metadata": metadata, | ||
}, | ||
) | ||
# if not pip_url and metadata and not metadata.pypi_package_name: |
java connectors don't have a pip url
            file=sys.stderr,
        )
        try:
            self._run_subprocess_and_raise_on_failure(
just hardcoding the connector I want to test
             case InstallType.YAML:
                 source_manifest = True
-            case InstallType.PYTHON:
+            case _:
python is the fallback now.
📝 Walkthrough

Suppresses PyPI publication validation in the Python executor, adds a pre-install wheel step and swallows its failures, changes default executor selection to Python when not YAML or explicitly Docker, and omits None fields when dumping the configured catalog JSON for destinations.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Caller
participant VenvExecutor
participant Pip
Caller->>VenvExecutor: install()
Note over VenvExecutor: (New) Pre-install remote wheel via pip --no-cache-dir
VenvExecutor->>Pip: pip install <remote-wheel-url> --no-cache-dir
Pip-->>VenvExecutor: success/failure
Note over VenvExecutor: failure is swallowed (pass)
VenvExecutor->>Pip: pip install <pip_url or resolved package>
Pip-->>VenvExecutor: success/failure
VenvExecutor-->>Caller: return (install errors from pre-step not propagated)
sequenceDiagram
participant Caller
participant get_connector_executor
Caller->>get_connector_executor: resolve install method
alt YAML manifest provided
get_connector_executor-->>Caller: Manifest-based executor
else Docker image explicitly provided
get_connector_executor-->>Caller: Docker executor
else
Note over get_connector_executor: Default to Python executor (no implicit Docker fallback)
get_connector_executor-->>Caller: Python executor
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Would you like me to draft a small patch that makes the pre-install failure behavior configurable (raise vs swallow) so teams can opt into stricter failure handling, wdyt?
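For illustration, a minimal sketch of what a configurable pre-install step could look like, assuming a hypothetical PYAIRBYTE_PREINSTALL_STRICT flag and helper name (neither exists in the codebase today):

```python
# Hypothetical sketch: the env var and helper name are illustrative only; the real
# executor would call its own subprocess wrapper rather than subprocess directly.
import os
import subprocess
import sys


def _preinstall_wheel(pip_cmd: list[str], wheel_url: str) -> None:
    """Attempt an optional pre-install; raise only if strict mode is opted into."""
    strict = os.getenv("PYAIRBYTE_PREINSTALL_STRICT", "false").lower() in ("1", "true", "yes")
    try:
        subprocess.run(
            [*pip_cmd, "install", "--no-cache-dir", "--no-deps", wheel_url],
            check=True,
        )
    except subprocess.CalledProcessError as ex:
        if strict:
            raise  # opt-in: propagate the pre-install failure to the caller
        print(f"[warning] Optional pre-install failed (exit {ex.returncode}); continuing.", file=sys.stderr)
```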
Actionable comments posted: 2
🧹 Nitpick comments (2)
airbyte/_executors/python.py (2)
Lines 47-53: Removing the PyPI-publication guard may degrade UX; consider a non-fatal warning instead

Disabling the exception here means we’ll discover “not on PyPI” only via a later pip failure, which is noisier and less actionable. Would you consider emitting a warning (instead of raising) so users still get clear feedback, wdyt?
Possible tweak:
- # if not pip_url and metadata and not metadata.pypi_package_name:
- #     raise exc.AirbyteConnectorNotPyPiPublishedError(
- #         connector_name=self.name,
- #         context={
- #             "metadata": metadata,
- #         },
- #     )
+ # Warn if we’re about to attempt an install for a connector that isn’t on PyPI
+ if not pip_url and metadata and not metadata.pypi_package_name:
+     print(
+         f"Warning: Connector '{self.name}' is not published on PyPI; "
+         f"will attempt fallback install (e.g., 'airbyte-{self.name}').",
+         file=sys.stderr,
+     )
Line 127: Redundant pass inside exception block

The pass is a no-op and may confuse readers into thinking the exception is swallowed; it isn’t (it’s re-raised later). Shall we remove it, wdyt?

-            pass
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- airbyte/_executors/python.py (2 hunks)
- airbyte/_executors/util.py (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (2)
airbyte/_executors/util.py (3)
- airbyte/_util/meta.py (1): which (154-165)
- airbyte/_connector_base.py (1): name (84-86)
- airbyte/sources/registry.py (2): default_install_type (79-88), InstallType (40-46)

airbyte/_executors/python.py (2)
- airbyte/validate.py (1): _run_subprocess_and_raise_on_failure (47-58)
- airbyte/exceptions.py (1): AirbyteSubprocessFailedError (274-277)
        try:
            self._run_subprocess_and_raise_on_failure(
                args=[pip_path, "install","--no-cache-dir" , "https://storage.googleapis.com/pyairbyte-java-connectors-python-package-jose-test/airbyte_destination_dev_null/airbyte_destination_dev_null-0.0.1-py3-none-any.whl"]
            )
🛠️ Refactor suggestion
Hard-coded remote wheel install is unsafe, non-deterministic, and breaks isolation
Unconditionally installing a specific wheel from a personal GCS bucket:
- Installs an unrelated package for every connector (destination_dev_null), potentially polluting envs and causing conflicts.
- Introduces supply chain and reproducibility risks (no pin by hash, no provenance).
- Ignores offline mode and registry intent (e.g., DOCKER/JAVA connectors) and will fail installs if the URL is unavailable.
Could we remove this or gate it behind an explicit opt-in (e.g., an env var) and scope it to the specific use case (e.g., only for JAVA connectors), wdyt?
Proposed minimal, safe gating (plus no-deps):
- self._run_subprocess_and_raise_on_failure(
- args=[pip_path, "install","--no-cache-dir" , "https://storage.googleapis.com/pyairbyte-java-connectors-python-package-jose-test/airbyte_destination_dev_null/airbyte_destination_dev_null-0.0.1-py3-none-any.whl"]
- )
+ preinstall_url = os.getenv("PYAIRBYTE_PREINSTALL_WHEEL_URL")
+ if preinstall_url:
+ # Optional pre-install hook for experimentation; caller must provide a URL
+ # Consider requiring a hash: preinstall_url += f"#sha256={expected_hash}"
+ self._run_subprocess_and_raise_on_failure(
+ args=[pip_path, "install", "--no-cache-dir", "--no-deps", preinstall_url],
+ )
Additionally, please add this import at the top of the file:
import os
If the intent is to bootstrap a Python wrapper for JAVA connectors, we could restrict this to those cases only (e.g., when metadata and InstallType.JAVA in metadata.install_types). I can draft that variant too if helpful, wdyt?
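As a rough sketch of that Java-only variant (the helper name is hypothetical; it assumes InstallType is importable from airbyte.sources.registry and that registry metadata exposes install_types, as referenced elsewhere in this review):

```python
# Sketch only: "maybe_preinstall_java_wheel" is an illustrative helper name. It limits
# the optional pre-install to connectors whose registry metadata lists InstallType.JAVA
# and to an explicit env-var opt-in, and never pulls transitive dependencies.
import os
import subprocess

from airbyte.sources.registry import InstallType


def maybe_preinstall_java_wheel(pip_path: str, metadata) -> None:
    preinstall_url = os.getenv("PYAIRBYTE_PREINSTALL_WHEEL_URL")
    if not preinstall_url:
        return  # no opt-in, nothing to do
    if not (metadata and metadata.install_types and InstallType.JAVA in metadata.install_types):
        return  # keep the experiment scoped to Java connectors
    subprocess.run(
        [pip_path, "install", "--no-cache-dir", "--no-deps", preinstall_url],
        check=True,
    )
```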
     # Prefer local executable if found, then manifests, then python, depending upon
     # how the connector is declared in the connector registry.
     if which(name):
         local_executable = name
     elif metadata and metadata.install_types:
         match metadata.default_install_type:
             case InstallType.YAML:
                 source_manifest = True
-            case InstallType.PYTHON:
+            case _:
                 pip_url = metadata.pypi_package_name
                 pip_url = f"{pip_url}=={version}" if version else pip_url
-            case _:
-                docker_image = True
💡 Verification agent
🧩 Analysis chain
Defaulting non-YAML connectors to Python likely breaks DOCKER/JAVA defaults; please honor registry defaults
Switching the catch-all branch to Python (case _) ignores metadata.default_install_type for DOCKER/JAVA and will try pip for connectors that aren’t published on PyPI, causing unexpected failures and a breaking behavior change. The comment at Line 235 also says “depending upon how the connector is declared,” but the implementation no longer does that. Can we restore explicit handling for PYTHON vs DOCKER/JAVA, and only fall back when appropriate, wdyt?
Impact to check:
- Connectors whose metadata.default_install_type is DOCKER or JAVA will now attempt Python install.
- Backward compatibility for callers that relied on implicit DOCKER fallback without passing docker_image.
Proposed fix:
- # Prefer local executable if found, then manifests, then python, depending upon
+ # Prefer local executable if found, then manifests, then honor the registry default install type
# how the connector is declared in the connector registry.
if which(name):
local_executable = name
elif metadata and metadata.install_types:
match metadata.default_install_type:
case InstallType.YAML:
source_manifest = True
- case _:
- pip_url = metadata.pypi_package_name
- pip_url = f"{pip_url}=={version}" if version else pip_url
+ case InstallType.PYTHON:
+ pip_url = metadata.pypi_package_name
+ pip_url = f"{pip_url}=={version}" if version else pip_url
+ case InstallType.DOCKER | InstallType.JAVA:
+ docker_image = True
+ case _:
+ # Fallback to Docker to avoid surprising Python installs for unknown/legacy types
+ docker_image = True
Would you like me to add a small unit test to assert we don’t route DOCKER/JAVA connectors through the Python path, wdyt?
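For reference, a rough sketch of such a test; the patch targets (which, get_connector_metadata) and the get_connector_executor signature are assumptions that would need to be confirmed against the actual module:

```python
# Hypothetical test sketch: adjust the patch targets and call signature to match
# the real airbyte._executors.util module before relying on this.
from unittest import mock

from airbyte._executors import util as executors_util
from airbyte.sources.registry import InstallType


def test_docker_default_is_not_routed_through_python() -> None:
    fake_metadata = mock.Mock()
    fake_metadata.install_types = {InstallType.DOCKER}
    fake_metadata.default_install_type = InstallType.DOCKER
    fake_metadata.pypi_package_name = None

    with (
        mock.patch.object(executors_util, "which", return_value=None),
        mock.patch.object(
            executors_util, "get_connector_metadata", return_value=fake_metadata
        ),
    ):
        executor = executors_util.get_connector_executor(name="destination-dev-null")

    # A Docker-default connector should never be handed to the Python venv executor.
    assert type(executor).__name__ != "VenvExecutor"
```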
Could we verify the affected snippet and confirm the full set of InstallType values by running the following scripts, wdyt?
🏁 Script executed:
#!/bin/bash
set -eux
# Locate the match on default_install_type in util.py
rg -n "match metadata.default_install_type" -n airbyte/_executors/util.py
# Find where InstallType is defined to see all possible types
rg -n "class InstallType" -n airbyte
Length of output: 239
Could we confirm all the possible InstallType variants to ensure we handle each explicitly? For example, could you run the following, wdyt?
🏁 Script executed:
#!/bin/bash
set -eux
# Show the InstallType enum and its members
rg -A5 -n "class InstallType" -n airbyte/sources/registry.py
Length of output: 236
Explicitly handle PYTHON vs DOCKER install types
The current catch-all routes every non-YAML connector through pip, which will wrongly pip-install Docker connectors. Can we update the match to only pip-install PYTHON connectors and send DOCKER connectors down the docker path, wdyt?
- match metadata.default_install_type:
- case InstallType.YAML:
- source_manifest = True
- case _:
- pip_url = metadata.pypi_package_name
- pip_url = f"{pip_url}=={version}" if version else pip_url
+ match metadata.default_install_type:
+ case InstallType.YAML:
+ source_manifest = True
+ case InstallType.PYTHON:
+ pip_url = metadata.pypi_package_name
+ pip_url = f"{pip_url}=={version}" if version else pip_url
+ case InstallType.DOCKER:
+ docker_image = True
+ case _:
+ # Fallback to Docker for any unknown/legacy type
+ docker_image = True
Would you also like a small unit test to assert that Docker connectors never hit the Python path, wdyt?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The hunk under review:

     # Prefer local executable if found, then manifests, then python, depending upon
     # how the connector is declared in the connector registry.
     if which(name):
         local_executable = name
     elif metadata and metadata.install_types:
         match metadata.default_install_type:
             case InstallType.YAML:
                 source_manifest = True
-            case InstallType.PYTHON:
+            case _:
                 pip_url = metadata.pypi_package_name
                 pip_url = f"{pip_url}=={version}" if version else pip_url
-            case _:
-                docker_image = True

Suggested replacement:

    # Prefer local executable if found, then manifests, then python, depending upon
    # how the connector is declared in the connector registry.
    if which(name):
        local_executable = name
    elif metadata and metadata.install_types:
        match metadata.default_install_type:
            case InstallType.YAML:
                source_manifest = True
            case InstallType.PYTHON:
                pip_url = metadata.pypi_package_name
                pip_url = f"{pip_url}=={version}" if version else pip_url
            case InstallType.DOCKER:
                docker_image = True
            case _:
                # Fallback to Docker for any unknown/legacy type
                docker_image = True
Actionable comments posted: 1
🔭 Outside diff range comments (2)
airbyte/_executors/python.py (2)
Lines 112-116: Use “python -m pip” from the venv for cross-platform reliability.

On Windows, “pip” may be “pip.exe” and this path logic can break. Invoking pip via the venv’s interpreter is robust. Shall we switch to that, wdyt?

-        pip_path = str(get_bin_dir(self._get_venv_path()) / "pip")
-        print(
-            f"Installing '{self.name}' into virtual environment '{self._get_venv_path()!s}'.\n"
-            f"Running 'pip install {self.pip_url}'...\n",
-            file=sys.stderr,
-        )
+        # Prefer invoking pip via the venv's interpreter for portability
+        pip_cmd: list[str] = [str(self.interpreter_path), "-m", "pip"]
+        print(
+            f"Installing '{self.name}' into virtual environment '{self._get_venv_path()!s}'.\n"
+            f"Running 'pip install {self.pip_url}'...\n",
+            file=sys.stderr,
+        )
- pip_path = str(get_bin_dir(self._get_venv_path()) / "pip") - print( - f"Installing '{self.name}' into virtual environment '{self._get_venv_path()!s}'.\n" - f"Running 'pip install {self.pip_url}'...\n", - file=sys.stderr, - ) + # Prefer invoking pip via the venv's interpreter for portability + pip_cmd: list[str] = [str(self.interpreter_path), "-m", "pip"] + print( + f"Installing '{self.name}' into virtual environment '{self._get_venv_path()!s}'.\n" + f"Running 'pip install {self.pip_url}'...\n", + file=sys.stderr, + )
Lines 126-135: “pass” doesn’t swallow the failure; refine error handling and add failure telemetry.

Contrary to the summary, the install failure isn’t swallowed (you re-raise). The bare pass is a no-op and can be removed. If the intent is to ignore only the optional preinstall failure (not the main install), could we narrow the try/except to the preinstall step and keep failure for the main install while logging telemetry on failure, wdyt?

-        except exc.AirbyteSubprocessFailedError as ex:
-            pass
-            # If the installation failed, remove the virtual environment
-            # Otherwise, the connector will be considered as installed and the user may not be able
-            # to retry the installation.
-            with suppress(exc.AirbyteSubprocessFailedError):
-                self.uninstall()
-
-            raise exc.AirbyteConnectorInstallationError from ex
+        except exc.AirbyteSubprocessFailedError as ex:
+            # Main install failed: report, cleanup venv, then raise
+            log_install_state(self.name, state=EventState.FAILED)
+            with suppress(exc.AirbyteSubprocessFailedError):
+                self.uninstall()
+            raise exc.AirbyteConnectorInstallationError from ex
♻️ Duplicate comments (1)
airbyte/_executors/python.py (1)
Lines 119-121: Hard-coded remote wheel install introduces supply-chain and reproducibility risks.

Fetching a wheel from a personal GCS bucket for every connector is unsafe, non-deterministic, and leaks the test-specific behavior into all installs. Could we make this optional via an env var and scope it to your testing only, wdyt?
Replace the hard-coded preinstall with an optional, no-deps preinstall that the tester opts into:
-        self._run_subprocess_and_raise_on_failure(
-            args=[pip_path, "install","--no-cache-dir" , f"https://storage.googleapis.com/pyairbyte-java-connectors-python-package-jose-test/airbyte_{self.name.replace('-', '_')}/airbyte_{self.name.replace('-', '_')}-0.0.1-py3-none-any.whl"]
-        )
+        preinstall_url = os.getenv("PYAIRBYTE_PREINSTALL_WHEEL_URL")
+        if preinstall_url:
+            try:
+                # Optional preinstall: used for experimentation only.
+                # Consider requiring a pin by hash: preinstall_url += "#sha256=<expected_hash>"
+                self._run_subprocess_and_raise_on_failure(
+                    args=[*pip_cmd, "install", "--no-cache-dir", "--no-deps", preinstall_url],
+                )
+            except exc.AirbyteSubprocessFailedError as pre_err:
+                # Best-effort: continue with the main install
+                print(
+                    f"[warning] Optional preinstall failed (exit {pre_err.exit_code}). Continuing without it.",
+                    file=sys.stderr,
+                )

Note: Please also add import os at the top of the file if not already present.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- airbyte/_executors/python.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/_executors/python.py (3)
- airbyte/validate.py (1): _run_subprocess_and_raise_on_failure (47-58)
- airbyte/_connector_base.py (1): name (84-86)
- airbyte/exceptions.py (1): AirbyteSubprocessFailedError (274-277)
        # if not pip_url and metadata and not metadata.pypi_package_name:
        #     raise exc.AirbyteConnectorNotPyPiPublishedError(
        #         connector_name=self.name,
        #         context={
        #             "metadata": metadata,
        #         },
        #     )
💡 Verification agent
🧩 Analysis chain
Don’t globally disable the PyPI publication guard; gate it for your test scenario.
Commenting this out makes non-PyPI installs silently “valid” for everyone. Could we keep the check and allow bypass only when explicitly opted-in (e.g., via an env var), wdyt?
Apply this diff:
- # if not pip_url and metadata and not metadata.pypi_package_name:
- # raise exc.AirbyteConnectorNotPyPiPublishedError(
- # connector_name=self.name,
- # context={
- # "metadata": metadata,
- # },
- # )
+ allow_non_pypi = bool(os.getenv("PYAIRBYTE_ALLOW_NON_PYPI"))
+ if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
+ raise exc.AirbyteConnectorNotPyPiPublishedError(
+ connector_name=self.name,
+ context={"metadata": metadata},
+ )
Add this import near the other imports:
import os
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Showing file: airbyte/_executors/python.py ==="
nl -ba airbyte/_executors/python.py || true
echo
echo "=== Searching for related symbols across the repo ==="
rg --line-number "AirbyteConnectorNotPyPiPublishedError" || true
rg --line-number "pypi_package_name" || true
rg --line-number "pip_url" || true
rg --line-number "PYAIRBYTE_ALLOW_NON_PYPI" || true
rg --line-number "allow_non_pypi" || true
rg --line-number "import os" || true
rg --line-number "os.getenv" || true
Length of output: 8191
Gate the PyPI publication guard behind an explicit opt-in (env var)
The commented-out guard in airbyte/_executors/python.py makes non‑PyPI installs silently allowed for everyone — re-enable it but allow bypass only when explicitly opted into via an env var.
- Files to update:
- airbyte/_executors/python.py — commented guard around lines ~47–57.
- Related: airbyte/_executors/util.py (line ~244) uses metadata.pypi_package_name to infer pip_url, so the guard protects downstream behavior.
Apply this diff (adds a robust env check and import):
+import os
@@
- # if not pip_url and metadata and not metadata.pypi_package_name:
- # raise exc.AirbyteConnectorNotPyPiPublishedError(
- # connector_name=self.name,
- # context={
- # "metadata": metadata,
- # },
- # )
+ allow_non_pypi = os.getenv("PYAIRBYTE_ALLOW_NON_PYPI", "false").lower() in ("1", "true", "yes")
+ if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
+ raise exc.AirbyteConnectorNotPyPiPublishedError(
+ connector_name=self.name,
+ context={"metadata": metadata},
+ )
Could we reintroduce this guard exactly like above (env var opt-in), wdyt?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The current commented-out guard:

        # if not pip_url and metadata and not metadata.pypi_package_name:
        #     raise exc.AirbyteConnectorNotPyPiPublishedError(
        #         connector_name=self.name,
        #         context={
        #             "metadata": metadata,
        #         },
        #     )

Suggested replacement (plus import os near the other imports):

        allow_non_pypi = os.getenv("PYAIRBYTE_ALLOW_NON_PYPI", "false").lower() in ("1", "true", "yes")
        if not allow_non_pypi and not pip_url and metadata and not metadata.pypi_package_name:
            raise exc.AirbyteConnectorNotPyPiPublishedError(
                connector_name=self.name,
                context={"metadata": metadata},
            )
🤖 Prompt for AI Agents
In airbyte/_executors/python.py around lines 47 to 57, re-enable the commented
PyPI publication guard so connectors without a pip_url and without
metadata.pypi_package_name raise AirbyteConnectorNotPyPiPublishedError, but
allow bypass only when an explicit environment opt-in is set (e.g.,
AIRBYTE_ALLOW_NON_PYPI=true); import os at the top if missing, check the env var
case-insensitively (truthy values like "1"/"true"/"yes"), and if bypass is not
enabled then raise
exc.AirbyteConnectorNotPyPiPublishedError(connector_name=self.name,
context={"metadata": metadata}); ensure this guard is consistent with
airbyte/_executors/util.py which relies on metadata.pypi_package_name.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte/destinations/base.py (1)
Lines 268-269: Excluding None fields from catalog JSON: verify cross-connector compatibility and consider a feature flag, wdyt?

Switching to exclude_none=True (Line 268) changes the wire contract for destinations. Many connectors will accept omitted optional fields, but some (especially older ones) may expect explicit null values to be present (e.g., namespace, primary_key), and could fail or behave differently when fields are missing rather than null.

Would you consider gating this behind a feature flag so we can toggle behavior per environment while testing Java connectors, and keep the default conservative for general use? wdyt?
Apply this targeted change:
-            catalog_provider.configured_catalog.model_dump_json(exclude_none=True),
+            catalog_provider.configured_catalog.model_dump_json(
+                exclude_none=_EXCLUDE_NONE_IN_CATALOG
+            ),

And add this support code near the imports:

import os

_EXCLUDE_NONE_IN_CATALOG = os.getenv("AB_EXCLUDE_NONE_IN_CATALOG", "true").lower() in (
    "1",
    "true",
    "yes",
    "y",
)

Follow-ups:
- Can you confirm that the set of destinations you’re testing accept omitted fields and don’t rely on explicit null values?
- If we keep this change, do you want the default for AB_EXCLUDE_NONE_IN_CATALOG to remain true only in this testing branch, and default to false for stable releases, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (1)
- airbyte/destinations/base.py (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/destinations/base.py (3)
- airbyte/shared/sql_processor.py (1): catalog_provider (215-231)
- airbyte/sources/base.py (1): configured_catalog (384-400)
- airbyte/shared/catalog_providers.py (1): configured_catalog (70-72)
I've been using these changes to test airbytehq/airbyte#64901.
Summary by CodeRabbit
New Features
Chores