-
Notifications
You must be signed in to change notification settings - Fork 67
feat: split config parameter into config and config_file for better MCP client interoperability #762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…CP client interoperability - Expand resolve_config() to accept config as dict or JSON string - Add separate config_file parameter for file paths - Update all MCP tools to use new parameter structure - Maintain backward compatibility with existing usage - Improve interoperability across different MCP clients Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1735261427-split-config-parameter' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1735261427-split-config-parameter' Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
Addresses GitHub comment from @aaronsteers to move import statement to the top of the file for better code organization. Co-Authored-By: AJ Steers <[email protected]>
📝 WalkthroughWalkthroughPublic MCP operation signatures now accept dict or JSON string configs and add a config_file parameter. resolve_config was extended to load configs from file (YAML/JSON), parse JSON strings, merge file -> config -> secret, detect hardcoded secrets, and optionally load a .env via an env var (AIRBYTE_MCP_DOTENV_PATH_ENVVAR). Operations pass config_file through to resolve_config. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant Ops as _local_ops.py<br/>MCP Operation
participant Util as _util.py<br/>resolve_config
participant FS as File System
participant Secrets as Secret Store
participant Env as .env loader
Caller->>Ops: call op(config, config_file, config_secret_name, ...)
Ops->>Util: resolve_config(config, config_file, config_secret_name, schema)
rect rgba(220,240,255,0.35)
alt config_file provided
Util->>FS: read config_file (YAML/JSON)
FS-->>Util: file_config (dict) or error
Util->>Util: validate file yields dict or raise TypeError
else no config_file
Note over Util: skip file load
end
end
alt dotenv env var set
Util->>Env: _load_dotenv_file(path)
Env-->>Util: env vars loaded or error
else no dotenv
Note over Util: skip dotenv
end
alt config is JSON string
Util->>Util: parse JSON -> dict or raise ValueError
else config is dict / None
Note over Util: use as-is
end
Util->>Util: merge file_config -> config
opt config_secret_name provided
Util->>Secrets: load secret config
Secrets-->>Util: secret_config (dict)
Util->>Util: deep_update(main, secret_config)
end
Util->>Util: check hardcoded secrets (if schema) -> raise ValueError if found
Util-->>Ops: resolved_config (dict)
Ops-->>Caller: result using resolved_config
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Would you like adding unit tests for JSON-string parsing errors, file-vs-config-vs-secret precedence, and dotenv loading behavior, wdyt? ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
airbyte/mcp/_local_ops.py (3)
27-44
: Update CONFIG_HELP to reflect newconfig
vsconfig_file
split and precedenceThe help text still says “config as JSON or a Path”, which conflicts with the new API. Shall we update and note precedence, wdyt?
-You can provide `config` as JSON or a Path to a YAML/JSON file. -If a `dict` is provided, it must not contain hardcoded secrets. +You can provide `config` as a dict or JSON string. For file paths, use `config_file` (YAML/JSON). +If a dict is provided, it must not contain hardcoded secrets. @@ -If both `config` and `config_secret_name` are provided, the -`config` will be loaded first and then the referenced secret config -will be layered on top of the non-secret config. +If multiple sources are provided, precedence is: +`config_file` < `config` < `config_secret_name`. +The non-secret config is loaded first and then the referenced secret config +is layered on top.
233-236
: Bug:config_file
not passed to resolve_config (also causes ARG001)This breaks the new API and triggers the linter. Shall we pass it through, wdyt?
- config_dict = resolve_config( - config=config, - config_secret_name=config_secret_name, - config_spec_jsonschema=source.config_spec, - ) + config_dict = resolve_config( + config=config, + config_file=config_file, + config_secret_name=config_secret_name, + config_spec_jsonschema=source.config_spec, + )
1-539
: Ensureconfig_file
is passed toresolve_config
inread_source_stream_records
I ran a quick search across all
resolve_config
call sites and every tool correctly includesconfig_file
—except the one inread_source_stream_records
, which only passesconfig
andconfig_secret_name
. This means users can’t supply a--config-file
when invoking that tool.• File:
airbyte/mcp/_local_ops.py
Function:read_source_stream_records
Call site (~lines 232–234):config_dict = resolve_config( config=config, config_secret_name=config_secret_name, )• Suggestion: Add both
config_file=config_file
(and, for completeness,config_spec_jsonschema=source.config_spec
) so it’s consistent with the other tools. For example:config_dict = resolve_config( config=config, + config_file=config_file, config_secret_name=config_secret_name, + config_spec_jsonschema=source.config_spec, )Would you mind updating that call site so
--config-file
works as expected? wdyt?
🧹 Nitpick comments (3)
airbyte/mcp/_util.py (2)
46-51
: Docstring: make precedence explicit and align exceptions with codeCan we clarify precedence (
config_file < config < config_secret_name
) and list both ValueError/TypeError in Raises to match behavior, wdyt?-"""Resolve a configuration dictionary, JSON string, or file path to a dictionary. +"""Resolve configuration from a dict, JSON string, or file into a dictionary. @@ - Returns: - Resolved configuration dictionary + Precedence: + config_file < config < config_secret_name + + Returns: + Resolved configuration dictionary. @@ - Raises: - ValueError: If no configuration provided or if JSON parsing fails + Raises: + ValueError: If no configuration provided, parsing fails, or file/secret content is invalid. + TypeError: If parameters are of an invalid type. """Also applies to: 52-65
46-46
: Reduce branching to satisfy PLR0912Ruff flags “Too many branches (17 > 15)”. Would you consider extracting helpers to cut branches, wdyt?
Example (outside selected range):
def _load_file_config(config_file: Path) -> dict[str, Any]: ... def _parse_config_param(config: dict | str) -> dict[str, Any]: ... def _merge_secret(config_dict: dict[str, Any], secret_name: str) -> dict[str, Any]: ...airbyte/mcp/_local_ops.py (1)
16-44
: Reduce duplication for param descriptions across toolsWould you like to factor the repeated description strings into shared constants to keep things DRY and avoid future E501 regressions, wdyt?
Example (outside selected ranges):
CONFIG_DESC = "The configuration for the source connector as a dict object or JSON string." CONFIG_FILE_DESC = "Path to a YAML or JSON file containing the source connector configuration."Then reference with Field(description=CONFIG_DESC) / Field(description=CONFIG_FILE_DESC).
Also applies to: 520-539
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
airbyte/mcp/_local_ops.py
(11 hunks)airbyte/mcp/_util.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_util.py (1)
airbyte/secrets/hydration.py (1)
detect_hardcoded_secrets
(140-171)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_util.py
[error] 46-46: Ruff: PLR0912 Too many branches (17 > 15) in function 'resolve_config'.
[error] 68-68: Ruff: PLC0415 import
should be at the top-level of a file.
[error] 69-69: Ruff: W293 Blank line contains whitespace.
[error] 71-71: Ruff: W293 Blank line contains whitespace.
[error] 81-81: Ruff: W293 Blank line contains whitespace.
[error] 89-89: Ruff: W293 Blank line contains whitespace.
[error] 93-93: Ruff: TRY004 Prefer TypeError
exception for invalid type.
[error] 93-93: Ruff: TRY301 Abstract raise
to an inner function.
[error] 99-99: Ruff: B904 Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling.
[error] 106-106: Ruff: TRY004 Prefer TypeError
exception for invalid type.
[error] 109-109: Ruff: E501 Line too long (105 > 100).
[error] 113-113: Ruff: B904 Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling.
[error] 113-113: Ruff: TRY301 Abstract raise
to an inner function.
airbyte/mcp/_local_ops.py
[error] 131-131: Ruff: E501 Line too long (105 > 100).
[error] 135-135: Ruff: E501 Line too long (104 > 100).
[error] 172-172: Ruff: E501 Line too long (105 > 100).
[error] 176-176: Ruff: E501 Line too long (104 > 100).
[error] 206-206: Ruff: E501 Line too long (105 > 100).
[error] 208-208: Ruff: ARG001 Unused function argument: config_file
.
[error] 210-210: Ruff: E501 Line too long (104 > 100).
[error] 264-264: Ruff: E501 Line too long (105 > 100).
[error] 268-268: Ruff: E501 Line too long (104 > 100).
[error] 345-345: Ruff: E501 Line too long (105 > 100).
[error] 349-349: Ruff: E501 Line too long (104 > 100).
[error] 351-351: Ruff: E501 Line too long (105 > 100).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (6)
airbyte/mcp/_util.py (1)
119-136
: Hardcoded secret detection: looks solidThe check runs only when schema is provided and before secret overlay—nice layering.
airbyte/mcp/_local_ops.py (5)
78-85
: Propagation ofconfig_file
looks correct hereThanks for threading it through to resolve_config.
150-156
: resolve_config propagation looks goodconfig_file is passed and schema provided—LGTM.
188-193
: resolve_config propagation looks goodGood consistency with other ops.
296-301
: resolve_config propagation looks goodConsistent and correct.
366-370
: resolve_config propagation looks goodMatches new API expectations.
- Shortened Field descriptions to fix E501 line length violations - Added missing config_file parameter to resolve_config call in read_source_stream_records - Fixed unused function argument ARG001 error Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
airbyte/mcp/_util.py (3)
4-4
: Thanks for hoistingjson
to module scope.This resolves the prior PLC0415 warning. LGTM.
77-99
: File handling: use TypeError for type mismatches, specify UTF-8, narrow exceptions, chain errors, and deep-mergeThis addresses Ruff TRY004/B904 and aligns merge semantics with
deep_update
. Also preferis_file()
to avoid directories. Wdyt?- if config_file is not None: - if isinstance(config_file, str): - config_file = Path(config_file) - - if not isinstance(config_file, Path): - raise ValueError( - f"config_file must be a string or Path object, got: {type(config_file).__name__}" - ) - - if not config_file.exists(): - raise FileNotFoundError(f"Configuration file not found: {config_file}") - - try: - file_config = yaml.safe_load(config_file.read_text()) - if not isinstance(file_config, dict): - raise ValueError( - f"Configuration file must contain a valid JSON/YAML object, " - f"got: {type(file_config).__name__}" - ) - config_dict.update(file_config) - except Exception as e: - raise ValueError(f"Error reading configuration file {config_file}: {e}") + if config_file is not None: + config_path = Path(config_file) if isinstance(config_file, (str, Path)) else None + if config_path is None: + raise TypeError(f"config_file must be str or Path, got: {type(config_file).__name__}") + if not (config_path.exists() and config_path.is_file()): + raise FileNotFoundError(f"Configuration file not found: {config_path}") + try: + text = config_path.read_text(encoding="utf-8") + file_config = yaml.safe_load(text) + if not isinstance(file_config, dict): + raise TypeError( + f"Configuration file must contain a JSON/YAML object, got: {type(file_config).__name__}" + ) + deep_update(config_dict, file_config) + except (OSError, UnicodeDecodeError, yaml.YAMLError) as e: + raise ValueError(f"Error reading configuration file {config_path}") from e
100-116
: Inline config: deep-merge dicts; TypeError for type issues; chain JSON errorsThis fixes TRY004/B904 and unifies merge behavior with secrets. Wdyt?
- if config is not None: - if isinstance(config, dict): - config_dict.update(config) - elif isinstance(config, str): - try: - parsed_config = json.loads(config) - if not isinstance(parsed_config, dict): - raise ValueError( - f"Parsed JSON config must be an object/dict, " - f"got: {type(parsed_config).__name__}" - ) - config_dict.update(parsed_config) - except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in config parameter: {e}") - else: - raise ValueError(f"Config must be a dict or JSON string, got: {type(config).__name__}") + if config is not None: + if isinstance(config, dict): + deep_update(config_dict, config) + elif isinstance(config, str): + try: + parsed_config = json.loads(config) + except json.JSONDecodeError as e: + raise ValueError("Invalid JSON in config parameter") from e + if not isinstance(parsed_config, dict): + raise TypeError(f"Parsed JSON config must be a dict, got: {type(parsed_config).__name__}") + deep_update(config_dict, parsed_config) + else: + raise TypeError(f"config must be dict or JSON string, got: {type(config).__name__}")
🧹 Nitpick comments (3)
airbyte/mcp/_util.py (3)
117-133
: Hardcoded-secrets check should run after secrets merge to avoid false positivesIf a hardcoded value is later overridden by
config_secret_name
, the current ordering still raises. Shall we move this block to after the secret merge, wdyt?- if config_dict and config_spec_jsonschema is not None: - hardcoded_secrets: list[list[str]] = detect_hardcoded_secrets( - config=config_dict, - spec_json_schema=config_spec_jsonschema, - ) - if hardcoded_secrets: - error_msg = "Configuration contains hardcoded secrets in fields: " - error_msg += ", ".join( - [".".join(hardcoded_secret) for hardcoded_secret in hardcoded_secrets] - ) - - error_msg += ( - "Please use environment variables instead. For example:\n" - "To set a secret via reference, set its value to " - "`secret_reference::ENV_VAR_NAME`.\n" - ) - raise ValueError(error_msg)Add this after merging secrets (just before
return
):# Run after merging secrets so overrides can suppress false positives. if config_dict and config_spec_jsonschema is not None: hardcoded = detect_hardcoded_secrets( config=config_dict, spec_json_schema=config_spec_jsonschema, ) if hardcoded: fields = ", ".join(".".join(p) for p in hardcoded) raise ValueError( "Configuration contains hardcoded secrets in fields: " f"{fields}. Use secret_reference::ENV_VAR_NAME or environment variables." )
135-149
: Secret parsing: narrow exceptions, chain errors, and prefer TypeError for non-dictThis improves diagnostics and consistency with the rest of the function. Wdyt?
- if config_secret_name is not None: - # Assume this is a secret name that points to a JSON/YAML config. - secret_config = yaml.safe_load(str(get_secret(config_secret_name))) - if not isinstance(secret_config, dict): - raise ValueError( - f"Secret '{config_secret_name}' must contain a valid JSON or YAML object, " - f"but got: {type(secret_config).__name__}" - ) + if config_secret_name is not None: + # Assume this is a secret name that points to a JSON/YAML config. + try: + secret_text = str(get_secret(config_secret_name)) + secret_config = yaml.safe_load(secret_text) + except yaml.YAMLError as e: + raise ValueError(f"Error parsing secret '{config_secret_name}' as YAML/JSON") from e + if not isinstance(secret_config, dict): + raise TypeError( + f"Secret '{config_secret_name}' must contain a JSON/YAML object, " + f"got: {type(secret_config).__name__}" + )
53-66
: Docstring: reflect TypeError cases and priority semanticsSince we’ll raise TypeError for type mismatches, and the merge priority is
config_file < config < config_secret_name
, shall we update the docstring “Raises” and “Args” sections accordingly, wdyt?- """Resolve a configuration dictionary, JSON string, or file path to a dictionary. + """Resolve a configuration dictionary, JSON string, or file path to a dictionary. @@ - Raises: - ValueError: If no configuration provided or if JSON parsing fails + Raises: + ValueError: If no configuration is provided, or file/JSON parsing fails. + TypeError: If inputs are of invalid types (e.g., non-dict JSON object).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
airbyte/mcp/_local_ops.py
(12 hunks)airbyte/mcp/_util.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/_local_ops.py
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_util.py (1)
airbyte/secrets/hydration.py (1)
detect_hardcoded_secrets
(140-171)
🪛 GitHub Actions: Run Linters
airbyte/mcp/_util.py
[error] 47-47: Ruff: PLR0912 Too many branches (17 > 15).
[error] 92-92: Ruff: TRY004 Prefer TypeError exception for invalid type.
[error] 92-92: Ruff: TRY301 Abstract raise
to an inner function.
[error] 98-98: Ruff: B904 Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
.
[error] 107-107: Ruff: TRY004 Prefer TypeError exception for invalid type.
[error] 113-113: Ruff: B904 Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/mcp/_util.py (2)
71-75
: Early guard is solid.Clear error when no config source is provided. LGTM.
48-66
: Please verify allresolve_config
call-sites and update the docstring accordinglyI ran a scan for any direct
Path(...)
instances being passed into theconfig
parameter but didn’t spot any matches—could you double-check that every call-site now routes file paths throughconfig_file
instead ofconfig
? In particular, take a look at these invocations inairbyte/mcp/_local_ops.py
around lines 79, 150, 188, 232, 297, and 366.Also, since the implementation now raises a
TypeError
when inputs are of the wrong kind, should we tweak the docstring’s “Raises” section to reflect that? For example:Raises: ValueError: If no configuration provided or if JSON parsing fails + TypeError: If the provided config or config_file is of an unexpected type
Quick linter thought: would you like to locally suppress PLR0912 on
resolve_config
to unblock CI while we refactor the branching structure? For example:def resolve_config( # noqa: PLR0912 config: dict | str | None = None, config_file: str | Path | None = None, config_secret_name: str | None = None, config_spec_jsonschema: dict[str, Any] | None = None, ) -> dict[str, Any]: ...Wdyt?
- Use TypeError instead of ValueError for type validation (TRY004) - Add proper exception chaining with 'from err' (B904) - Add noqa comment for function complexity (PLR0912) - Abstract raise to inner function to reduce complexity (TRY301) - Use object type instead of Any to avoid ANN401 Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
airbyte/mcp/_util.py (4)
4-4
: Import hoisted to module scope — lint issue resolved.This addresses the prior PLC0415 note; thanks for moving it.
85-93
: Use TypeError for invalidconfig_file
type and tighten message.Invalid-type guards typically raise TypeError; switch from ValueError and keep the message concise, wdyt?
- if not isinstance(config_file, Path): - raise ValueError( - f"config_file must be a string or Path object, got: {type(config_file).__name__}" - ) + if not isinstance(config_file, Path): + raise TypeError(f"config_file must be str or Path, got: {type(config_file).__name__}")
94-104
: File load: specify UTF-8, catch narrow exceptions, and deep-merge.
- Use read_text(encoding="utf-8") to avoid locale surprises.
- Catch (OSError, UnicodeError, yaml.YAMLError) instead of bare Exception.
- Prefer deep_update to avoid clobbering nested dicts, wdyt?
- try: - file_config = yaml.safe_load(config_file.read_text()) + try: + text = config_file.read_text(encoding="utf-8") + file_config = yaml.safe_load(text) if not isinstance(file_config, dict): - _raise_invalid_file_config_type(file_config) - config_dict.update(file_config) - except Exception as e: - raise ValueError(f"Error reading configuration file {config_file}: {e}") from e + _raise_invalid_obj_type(file_config, "Configuration file content") + deep_update(config_dict, file_config) + except (OSError, UnicodeError, yaml.YAMLError) as e: + raise ValueError(f"Error reading configuration file {config_file}") from e
105-121
: Merge semantics and type errors forconfig
.
- Use deep_update for dict merges to preserve nested keys.
- Raise TypeError for invalid top-level type.
- Keep JSON error concise and chained, wdyt?
if config is not None: if isinstance(config, dict): - config_dict.update(config) + deep_update(config_dict, config) elif isinstance(config, str): try: parsed_config = json.loads(config) if not isinstance(parsed_config, dict): raise TypeError( f"Parsed JSON config must be an object/dict, " f"got: {type(parsed_config).__name__}" ) - config_dict.update(parsed_config) + deep_update(config_dict, parsed_config) except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in config parameter: {e}") from e + raise ValueError("Invalid JSON in config parameter") from e else: - raise ValueError(f"Config must be a dict or JSON string, got: {type(config).__name__}") + raise TypeError(f"config must be dict or JSON string, got: {type(config).__name__}")
🧹 Nitpick comments (5)
airbyte/mcp/_util.py (5)
17-23
: Broaden helper to cover non-file contexts or adjust message.Since this raises TypeError for “file config” shape, the message reads oddly when reused conceptually for non-file contexts; would you generalize it to accept a context string (e.g., "file", "json") or tighten usage to file-only and keep other sites bespoke, wdyt?
-def _raise_invalid_file_config_type(file_config: object) -> None: - """Raise TypeError for invalid file config type.""" - raise TypeError( - f"Configuration file must contain a valid JSON/YAML object, " - f"got: {type(file_config).__name__}" - ) +def _raise_invalid_obj_type(obj: object, where: str) -> None: + """Raise TypeError for invalid object type.""" + raise TypeError(f"{where} must be a JSON/YAML object (dict), got: {type(obj).__name__}")
38-44
: Comment mismatches behavior: no CWD .env load.The comment states loading from CWD, but code only loads when AIRBYTE_MCP_ENV_FILE is set. Update the comment or add a CWD fallback (e.g., load ".env" if present), wdyt?
- # Load the .env file from the current working directory. + # Load environment variables from a .env file if AIRBYTE_MCP_ENV_FILE is set.
55-75
: Docstring: enumerate all raised exceptions.resolve_config can raise FileNotFoundError and TypeError in addition to ValueError. Shall we document these for callers, wdyt?
- Raises: - ValueError: If no configuration provided or if JSON parsing fails + Raises: + ValueError: If no configuration provided, file parse fails, or JSON parsing fails + FileNotFoundError: If the provided config_file path does not exist + TypeError: If provided types are invalid or parsed content is not a dictWould you like me to add a brief “Exceptions” section to the README for MCP tool authors?
122-139
: Consider checking for hardcoded secrets even without a schema.detect_hardcoded_secrets falls back to a global mask when schema is None; would you run the check whenever config_dict is non-empty to keep protection consistent, wdyt?
- if config_dict and config_spec_jsonschema is not None: + if config_dict: hardcoded_secrets: list[list[str]] = detect_hardcoded_secrets( config=config_dict, spec_json_schema=config_spec_jsonschema, )
55-71
: Add focused tests for new inputs and precedence?Shall we add unit tests covering: JSON string parsing (valid/invalid), deep merge behavior (file < config < secret), type guards (TypeError cases), and hardcoded-secret detection with/without schema, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
airbyte/mcp/_util.py
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_util.py (1)
airbyte/secrets/hydration.py (1)
detect_hardcoded_secrets
(140-171)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/_util.py (1)
149-153
: Secret merge priority LGTM.deep_update for secrets at highest precedence matches the stated priority.
… docstring - Remove _raise_invalid_file_config_type helper function and inline the logic - Remove Args block from resolve_config function docstring - Change behavior to return empty dict instead of raising error when no config provided - Addresses comments from @aaronsteers on PR #762 Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte/mcp/_util.py (1)
17-23
: Inline single-use helper to reduce indirection?
_raise_invalid_file_config_type()
is only used once; inlining trims a hop and keeps the error local. Wdyt?- def _raise_invalid_file_config_type(file_config: object) -> None: - """Raise TypeError for invalid file config type.""" - raise TypeError( - f"Configuration file must contain a valid JSON/YAML object, " - f"got: {type(file_config).__name__}" - ) + # (Helper removed — inline where used) @@ - if not isinstance(file_config, dict): - _raise_invalid_file_config_type(file_config) + if not isinstance(file_config, dict): + raise TypeError( + f"Configuration file must contain a JSON/YAML object, got: {type(file_config).__name__}" + )Also applies to: 93-95
🧹 Nitpick comments (6)
airbyte/mcp/_util.py (6)
28-36
: UsePath.is_file()
for a tighter existence check
exists()
allows directories;is_file()
is stricter and improves diagnostics. Wdyt?- if not dotenv_path.exists(): - raise FileNotFoundError(f".env file not found: {dotenv_path}") + if not dotenv_path.is_file(): + raise FileNotFoundError(f".env file not found or not a file: {dotenv_path}")
38-44
: Docstring/comment mismatch with behaviorThis only loads a .env when
AIRBYTE_MCP_ENV_FILE
is set (not the CWD). Shall we align the docstring/comments (or add a fallbackdotenv.load_dotenv()
), wdyt?-def initialize_secrets() -> None: - """Initialize dotenv to load environment variables from .env files.""" - # Load the .env file from the current working directory. +def initialize_secrets() -> None: + """Initialize dotenv from AIRBYTE_MCP_ENV_FILE (if set), then register secret managers.""" + # Load the .env file from AIRBYTE_MCP_ENV_FILE if present.
55-60
: Expand docstring to enumerate raised exceptionsCallers will benefit from the full contract (TypeError, FileNotFoundError, etc.). Add-on below, wdyt?
-"""Resolve a configuration dictionary, JSON string, or file path to a dictionary. +"""Resolve configuration from dict, JSON string, or file path into a dict. @@ -Returns: - Resolved configuration dictionary +Returns: + Resolved configuration dictionary. @@ -Raises: - ValueError: If no configuration provided or if JSON parsing fails +Raises: + ValueError: If no configuration provided or JSON/file parsing fails. + FileNotFoundError: If config_file does not exist or is not a file. + TypeError: If inputs are wrong types or parsed content is not a dict.Also applies to: 61-70
79-87
: Type validation: preferTypeError
; simplify Path handlingMinor polish: validate type, then coerce once. Wdyt?
- if config_file is not None: - if isinstance(config_file, str): - config_file = Path(config_file) - - if not isinstance(config_file, Path): - raise ValueError( - f"config_file must be a string or Path object, got: {type(config_file).__name__}" - ) + if config_file is not None: + if not isinstance(config_file, (str, Path)): + raise TypeError(f"config_file must be str or Path, got: {type(config_file).__name__}") + config_file = Path(config_file)
88-98
: File load: specify UTF-8, narrow exceptions, deep-merge
- Use
encoding="utf-8"
.- Catch
OSError/UnicodeDecodeError/yaml.YAMLError
instead of a blanketException
.- Prefer
deep_update
to avoid clobbering nested keys. Wdyt?- if not config_file.exists(): - raise FileNotFoundError(f"Configuration file not found: {config_file}") + if not config_file.is_file(): + raise FileNotFoundError(f"Configuration file not found or not a file: {config_file}") @@ - try: - file_config = yaml.safe_load(config_file.read_text()) + try: + text = config_file.read_text(encoding="utf-8") + file_config = yaml.safe_load(text) if not isinstance(file_config, dict): - _raise_invalid_file_config_type(file_config) - config_dict.update(file_config) - except Exception as e: - raise ValueError(f"Error reading configuration file {config_file}: {e}") from e + raise TypeError( + f"Configuration file must contain a JSON/YAML object, got: {type(file_config).__name__}" + ) + deep_update(config_dict, file_config) + except (OSError, UnicodeDecodeError, yaml.YAMLError) as e: + raise ValueError(f"Error reading configuration file {config_file}") from e
99-115
: Config merging: deep-merge dicts; chain JSON errors;TypeError
for wrong typesThis avoids nested key loss and tightens error types. Wdyt?
- if config is not None: - if isinstance(config, dict): - config_dict.update(config) - elif isinstance(config, str): - try: - parsed_config = json.loads(config) - if not isinstance(parsed_config, dict): - raise TypeError( - f"Parsed JSON config must be an object/dict, " - f"got: {type(parsed_config).__name__}" - ) - config_dict.update(parsed_config) - except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in config parameter: {e}") from e - else: - raise ValueError(f"Config must be a dict or JSON string, got: {type(config).__name__}") + if config is not None: + if isinstance(config, dict): + deep_update(config_dict, config) + elif isinstance(config, str): + try: + parsed_config = json.loads(config) + except json.JSONDecodeError as e: + raise ValueError("Invalid JSON in config parameter") from e + if not isinstance(parsed_config, dict): + raise TypeError(f"Parsed JSON config must be a dict, got: {type(parsed_config).__name__}") + deep_update(config_dict, parsed_config) + else: + raise TypeError(f"config must be dict or JSON string, got: {type(config).__name__}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
airbyte/mcp/_util.py
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_util.py (1)
airbyte/secrets/hydration.py (1)
detect_hardcoded_secrets
(140-171)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (4)
airbyte/mcp/_util.py (4)
4-4
: Hoistingjson
import — LGTMThanks for moving this to module scope.
116-133
: Hardcoded-secrets detection UX — LGTMClear guidance and field paths are helpful. Nice.
143-147
: Priority order preserved via deep merge — LGTMMerging secret config last honors
config_file < config < secret
. Good.
55-60
: Allresolve_config
usages & signatures are up-to-date
I’ve checked the code inairbyte/mcp/_local_ops.py
and confirmed that:
- Every call to
resolve_config
now passes bothconfig
andconfig_file
arguments.- None of the callers use a
Path
in theconfig
parameter.- All MCP tool function signatures (
validate_connector_config
,list_source_streams
, etc.) include the newconfig_file
parameter.Looks like there’s no signature or usage drift here—thanks for handling the update! Anything else you’d like me to double-check? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte/mcp/_util.py (1)
80-85
: Inline the single-use helper to reduce noiseThis inner helper is only used once; inlining keeps things compact (and you’ll still satisfy TRY301 elsewhere). Shall we remove it, wdyt?
-def _raise_invalid_type(file_config: object) -> None: - raise TypeError( - f"Configuration file must contain a valid JSON/YAML object, " - f"got: {type(file_config).__name__}" - )
🧹 Nitpick comments (6)
airbyte/mcp/_util.py (6)
47-52
: Signature/types: tightenconfig
typing and docstring mention ofconfig_file
Would you switch
config: dict | str | None
todict[str, Any] | str | None
for parity with the return type and update the docstring to explicitly mentionconfig_file
and its accepted types, wdyt?-def resolve_config( # noqa: PLR0912 - config: dict | str | None = None, - config_file: str | Path | None = None, +def resolve_config( # noqa: PLR0912 + config: dict[str, Any] | str | None = None, + config_file: str | Path | None = None,
53-60
: Docstring mismatch with behavior and exceptionsThe function returns
{}
when nothing is provided (Line 66) but the docstring says it raises on “no configuration provided”. Also, wrong-type errors below useValueError
in a few places. Could we align the docstring to current behavior and noteTypeError
for invalid types, wdyt?-"""Resolve a configuration dictionary, JSON string, or file path to a dictionary. +"""Resolve configuration from a dict, JSON string, and/or file path into a dictionary. @@ -Returns: - Resolved configuration dictionary +Returns: + Resolved configuration dictionary. Returns `{}` if no inputs are provided. @@ -Raises: - ValueError: If no configuration provided or if JSON parsing fails +Raises: + ValueError: If file read/parse fails or JSON parsing fails. + TypeError: If inputs are of invalid types (e.g., non-dict JSON content or wrong param types).
68-76
: Type validation: preferTypeError
and simplify normalizationFor wrong types,
TypeError
is more appropriate (TRY004). Also,Path(config_file)
covers bothstr
andPath
. Shall we streamline like this, wdyt?-if config_file is not None: - if isinstance(config_file, str): - config_file = Path(config_file) - - if not isinstance(config_file, Path): - raise ValueError( - f"config_file must be a string or Path object, got: {type(config_file).__name__}" - ) +if config_file is not None: + try: + config_file = Path(config_file) # accepts str or Path + except TypeError as e: + raise TypeError(f"config_file must be str or Path, got: {type(config_file).__name__}") from e
86-93
: File load: specify UTF-8, catch specific exceptions, deep-merge
- Use
encoding="utf-8"
for deterministic decoding.- Catch
OSError|UnicodeError|yaml.YAMLError
instead of a blanketException
.- Prefer
deep_update
over shallowdict.update
to avoid clobbering nested keys (aligns with secret merge semantics).Would you apply the below, wdyt?
-try: - file_config = yaml.safe_load(config_file.read_text()) - if not isinstance(file_config, dict): - _raise_invalid_type(file_config) - config_dict.update(file_config) -except Exception as e: - raise ValueError(f"Error reading configuration file {config_file}: {e}") from e +try: + text = config_file.read_text(encoding="utf-8") + file_config = yaml.safe_load(text) + if not isinstance(file_config, dict): + raise TypeError(f"Configuration file must contain a JSON/YAML object, got: {type(file_config).__name__}") + deep_update(config_dict, file_config) +except (OSError, UnicodeError, yaml.YAMLError) as e: + raise ValueError(f"Error reading configuration file {config_file}: {e}") from e
94-110
: Config merging: deep-merge dicts and adjust exception types/messages
- Use
deep_update
whenconfig
is a dict or JSON-parses to a dict.- Raise
TypeError
for invalid types (TRY004).- Keep JSON error chained; message can be concise to avoid E501.
Shall we update as below, wdyt?
-if isinstance(config, dict): - config_dict.update(config) +if isinstance(config, dict): + deep_update(config_dict, config) elif isinstance(config, str): try: parsed_config = json.loads(config) if not isinstance(parsed_config, dict): - raise TypeError( - f"Parsed JSON config must be an object/dict, " - f"got: {type(parsed_config).__name__}" - ) - config_dict.update(parsed_config) + raise TypeError(f"Parsed JSON config must be a dict, got: {type(parsed_config).__name__}") + deep_update(config_dict, parsed_config) except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in config parameter: {e}") from e + raise ValueError("Invalid JSON in config parameter") from e else: - raise ValueError(f"Config must be a dict or JSON string, got: {type(config).__name__}") + raise TypeError(f"config must be dict or JSON string, got: {type(config).__name__}")
122-127
: Minor copy tweak in guidance (optional)Would you add an example env var name to make this more concrete (e.g.,
secret_reference::MY_API_KEY
), wdyt?- "To set a secret via reference, set its value to " - "`secret_reference::ENV_VAR_NAME`.\n" + "To set a secret via reference, set its value to " + "`secret_reference::ENV_VAR_NAME` (e.g., `secret_reference::MY_API_KEY`).\n"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
airbyte/mcp/_util.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/_util.py (1)
airbyte/secrets/hydration.py (1)
detect_hardcoded_secrets
(140-171)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (6)
airbyte/mcp/_util.py (6)
4-4
: Good fix: movedjson
import to module scopeThis addresses prior Ruff PLC0415; nice cleanup.
65-67
: Graceful empty-config behavior looks goodEarly return
{}
on no inputs matches the PR intent for backward-safe defaults.
77-79
: Existence check is fine
FileNotFoundError
on missing path is clear and actionable.
111-121
: Hardcoded secrets check: timing and message look goodRunning after merge of file/config but before secrets gives the right signal to users; LGTM.
129-144
: Secret merge viadeep_update
is correctThis preserves “secrets override everything” precedence. Nice.
47-110
: No file-pathconfig
usages detectedI ran the ripgrep scans against the provided patterns and didn’t find:
- Any calls to
resolve_config
passing aPath(...)
or.yaml
/.json
string via itsconfig=
parameter- Any definitions or invocations of the old MCP ops (e.g.
validate_connector_config
,list_source_streams
, etc.) still accepting only a singleconfig
argumentIt looks like all consumers have migrated to the new
config_file
parameter (or pass a dict/JSON string toconfig
). Let me know if you spot any positional calls or other edge cases we should verify, wdyt?
feat: split config parameter into config and config_file for better MCP client interoperability
Summary
This PR addresses MCP client interoperability issues by splitting the
config
parameter into two separate parameters:config
: Now acceptsdict | str | None
(dict objects OR JSON strings)config_file
: New parameter acceptingstr | Path | None
for file pathsThe changes expand the
resolve_config()
utility function to parse JSON strings and handle the new parameter structure, then update all 6 MCP tools (validate_connector_config
,list_source_streams
,get_source_stream_json_schema
,read_source_stream_records
,get_stream_previews
,sync_source_to_cache
) to use the new signature.Key motivation: Different MCP clients (like Claude Desktop) may serialize JSON objects as strings, causing type validation errors with the previous
dict | Path | None
typing. This change allows both programmatic clients (passing dict objects) and user-facing clients (passing JSON strings) to work seamlessly.Review & Testing Checklist for Human
config
parameter correctly handles JSON strings like'{"count": 100}'
across all MCP toolsconfig_file
parameter works correctly with YAML/JSON file pathsNotes
config
parameter no longer acceptsPath
objects directly - these should now useconfig_file
insteadSummary by CodeRabbit
New Features
Bug Fixes / Validation
Documentation
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.