-
Notifications
You must be signed in to change notification settings - Fork 71
feat(registry): Add KeyError validation for externalDocumentationUrls #863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add validation in ApiDocsUrl.from_manifest_dict to raise PyAirbyteInputError when 'title' or 'url' fields are missing - Add validation in _extract_docs_from_registry for registry entries - Add unit tests for missing field scenarios - Error messages clearly indicate which required field is missing Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1762978720-add-keyerror-validation' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1762978720-add-keyerror-validation'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughReplaces manifest-based ingestion with metadata-based ingestion: adds metadata.yaml URL fetching and parsing, new ApiDocsUrl.from_metadata_* helpers, stricter validation that raises PyAirbyteInternalError for missing required fields, renames public API to get_connector_docs_urls, and updates tests and URL helpers accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant MCP as mcp.connector_registry
participant Registry as airbyte.registry
participant ApiDocs as ApiDocsUrl
Caller->>MCP: get_connector_docs_urls(connector_name)
MCP->>Registry: _get_connector_docs_urls(connector_name)
Registry->>Registry: check connector exists (dockers)
alt connector not found
Registry-->>Caller: raise AirbyteConnectorNotRegisteredError
else connector found
Registry->>Registry: _metadata_url_for(connector_name)
Registry->>Registry: _fetch_metadata_dict(metadata_url)
Registry->>Registry: ApiDocsUrl.from_metadata_dict(metadata_dict)
Registry->>ApiDocs: ApiDocsUrl.from_metadata_docs_list(docs, source, context="Registry")
alt missing required fields
ApiDocs-->>Registry: raise PyAirbyteInternalError
Registry-->>Caller: propagate PyAirbyteInternalError
else valid metadata
ApiDocs-->>Registry: return list[ApiDocsUrl] (registry entries tagged doc_type="internal")
Registry-->>Caller: return combined list[ApiDocsUrl]
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Would you like a short deprecation shim for the old Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PyTest Results (Fast Tests Only, No Creds)321 tests +1 321 ✅ +1 5m 54s ⏱️ ±0s Results for commit 8af8085. ± Comparison against base commit 7eb746b. This pull request removes 8 and adds 9 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
- Rename get_connector_api_docs_urls to get_connector_docs_urls in registry.py - Rename MCP tool from get_api_docs_urls to get_connector_docs_urls - Update internal Airbyte docs to use doc_type='internal' instead of default 'other' - Update all tests to use new function names - This is a breaking change but acceptable since the feature was just released today Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/registry.py (1)
416-423: Add test coverage for the newdoc_type="internal"value.The change at line 421 introduces a new
doc_type="internal"value for registrydocumentationUrlentries. This semantic distinction makes sense (internal Airbyte docs vs. external API docs), but the code path at lines 416-423 lacks test coverage.Currently, tests verify other
doc_typevalues like "api_reference", "api_release_history", and "api_deprecations" inTestApiDocsUrlFromManifestDict, but the registrydocumentationUrlpath is untested. Consider adding a test case toTestApiDocsUrlFromManifestDictthat assertsdoc_type=="internal"whenApiDocsUrlis constructed with the registry source—or verify if this path is tested indirectly elsewhere. Wdyt?
🧹 Nitpick comments (2)
airbyte/registry.py (2)
333-347: Solid validation implementation!The try/except pattern correctly catches KeyError on required fields and raises a clear
PyAirbyteInputError. The exception chaining withfrom epreserves the stack trace nicely.One thought: the error message could be even more helpful with additional context like
f"Manifest parsing error for {manifest_data.get('metadata', {}).get('name', 'unknown')}: missing required field {e}". But I see from the PR objectives that improving error messages is noted for future iteration, so this is totally fine as-is. wdyt about deferring that enhancement?
428-442: Validation logic mirrors manifest parsing nicely!The try/except pattern here is consistent with the manifest parsing logic (lines 333-347), which is great for maintainability.
However, I noticed from the PR objectives that "registry parsing path uses the same validation pattern but isn't explicitly covered by new tests." The new tests (
test_manifest_missing_title_raises_error,test_manifest_missing_url_raises_error) only cover the manifest path viaApiDocsUrl.from_manifest_dict().Should we add equivalent tests for
_extract_docs_from_registry()to ensure both validation paths are covered? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/connector_registry.py(3 hunks)airbyte/registry.py(3 hunks)tests/unit_tests/test_mcp_connector_registry.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit_tests/test_mcp_connector_registry.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/registry.py (2)
airbyte/exceptions.py (1)
PyAirbyteInputError(201-210)airbyte/mcp/connector_registry.py (1)
get_connector_docs_urls(183-203)
airbyte/mcp/connector_registry.py (1)
airbyte/registry.py (1)
get_connector_docs_urls(447-489)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (5)
airbyte/mcp/connector_registry.py (3)
27-27: Clean import aliasing!The underscore-prefixed alias makes it clear this is for internal routing. Nice pattern for keeping the MCP tool layer independent from registry naming.
183-199: LGTM on the rename and docstring updates!The shift from "API documentation" to just "documentation" better reflects that this retrieves all connector documentation URLs (Airbyte docs + external docs), not just API-specific docs.
200-203: Should validation errors be caught here too?The new validation logic in
_get_connector_docs_urls()raisesPyAirbyteInputErrorwhen required fields are missing (lines 345-347, 440-442 in registry.py). Currently, onlyAirbyteConnectorNotRegisteredErroris caught and converted to a friendly "Connector not found." message.Should
PyAirbyteInputErroralso be caught here to return a user-friendly message like "Invalid connector documentation format" instead of letting it propagate? Or is it intentional to let validation errors bubble up to expose data quality issues? wdyt?airbyte/registry.py (2)
323-326: Good documentation of the new exception!Documenting the
PyAirbyteInputErrorin the docstring helps API consumers understand when validation will fail.
397-399: Consistent documentation across parsing functions!Good to see the same error documentation pattern applied to both
from_manifest_dict()and_extract_docs_from_registry().
…ication_of_urls - Updated error messages to include the malformed doc dict for better debugging - Fixed test_deduplication_of_urls to properly mock get_available_connectors - Test now correctly verifies URL deduplication functionality Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
- Removed deduplication logic from get_connector_docs_urls function - Removed test_deduplication_of_urls test - Updated docstring to remove mention of deduplication The deduplication feature was previously introduced in PR #858 but is being removed per product direction. Duplicate URLs from registry and manifest sources will now be preserved in the returned list. Co-Authored-By: AJ Steers <[email protected]>
|
Removed URL deduplication logic and the associated test per user request. The deduplication feature was previously introduced in PR #858 but is being removed per product direction. Duplicate URLs from registry and manifest sources will now be preserved in the returned list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/registry.py (1)
398-442: Add tests for registry parsing error handling to match manifest coverageThe test coverage gap is valid—while
ApiDocsUrl.from_manifest_dict()has dedicated tests for missing 'title' and 'url' fields, the parallel registry parsing path in_extract_docs_from_registry()(lines 440-442) lacks equivalent error tests. Both raisePyAirbyteInputErrorconsistently, but only manifest errors are verified.Would you be open to adding tests like
test_registry_missing_title_raises_errorandtest_registry_missing_url_raises_errorthat mock the registry response with malformedexternalDocumentationUrlsentries? This would ensure both paths have parity in error coverage and catch regressions on either side.On the exception type point: the pattern mirrors your manifest implementation, so if that was reviewed and approved, this should be consistent. But if there's concern about
PyAirbyteInputErrorbeing misleading for external data failures (since users can't fix registry issues), that's worth flagging separately—wdyt?
♻️ Duplicate comments (2)
airbyte/registry.py (2)
447-458: Update the docstring to match the function rename?Line 447 renames the function to
get_connector_docs_urls, but line 448's docstring still says "Get API documentation URLs for a connector." A past review comment suggested updating this to "Get documentation URLs for a connector" (or otherwise remove "API") for consistency with the new name. wdyt?Apply this diff:
def get_connector_docs_urls(connector_name: str) -> list[ApiDocsUrl]: - """Get API documentation URLs for a connector. + """Get documentation URLs for a connector. - This function retrieves documentation URLs for a connector's upstream API from multiple sources: + This function retrieves documentation URLs for a connector from multiple sources:Based on past review comment at lines 447-448.
333-347: Reconsider the exception type for malformed manifest and registry data—should they bePyAirbyteInternalErrorinstead?The manifest and registry data parsing errors (lines 345 and 440) currently raise
PyAirbyteInputError, but that exception is defined as "input provided to PyAirbyte did not match expected validation rules." However, manifest data and registry data are external data sources—not user inputs—so raisingPyAirbyteInputErroris semantically incorrect.These are data quality issues external to the library, which aligns better with
PyAirbyteInternalError("An internal error occurred in PyAirbyte"). This is also consistent with line 215, which already usesPyAirbyteInternalErrorfor a registry loading failure. Would it make sense to switch both parsing errors to usePyAirbyteInternalErrorfor consistency and accuracy?
🧹 Nitpick comments (1)
tests/unit_tests/test_mcp_connector_registry.py (1)
131-168: Nice test coverage for manifest parsing errors, but consider adding registry path tests?These tests properly verify that
PyAirbyteInputErroris raised when manifest entries are missing required fields. However, the registry parsing path in_extract_docs_from_registry()(lines 428-442 in registry.py) uses the same validation pattern but isn't explicitly covered by tests. Would it be valuable to add similar error tests for the registry path? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/registry.py(5 hunks)tests/unit_tests/test_mcp_connector_registry.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-10-06T23:44:31.534Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.
Applied to files:
airbyte/registry.py
📚 Learning: 2024-10-08T15:34:31.026Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 347
File: tests/integration_tests/fixtures/registry.json:48-48
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Test fixtures in the PyAirbyte project do not need to align with real Docker repositories.
Applied to files:
tests/unit_tests/test_mcp_connector_registry.py
🧬 Code graph analysis (2)
airbyte/registry.py (2)
airbyte/exceptions.py (1)
PyAirbyteInputError(201-210)airbyte/mcp/connector_registry.py (1)
get_connector_docs_urls(183-203)
tests/unit_tests/test_mcp_connector_registry.py (3)
airbyte/registry.py (3)
get_connector_docs_urls(447-482)ApiDocsUrl(303-349)from_manifest_dict(315-349)airbyte/mcp/connector_registry.py (1)
get_connector_docs_urls(183-203)airbyte/exceptions.py (2)
PyAirbyteInputError(201-210)AirbyteConnectorNotRegisteredError(288-297)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/registry.py (2)
421-421: Good addition of doc_type="internal" for registry documentation!Tagging the Airbyte Documentation entry with
doc_type="internal"helps distinguish it from external API docs. Nice touch!
482-482: LGTM on removing deduplication!Returning the combined list directly simplifies the logic. If deduplication was needed, it could be handled by the caller.
tests/unit_tests/test_mcp_connector_registry.py (1)
170-184: LGTM on test renames and updates!The test class rename and updated patch target/function calls correctly reflect the API change from
get_api_docs_urlstoget_connector_docs_urls.
…nalError Address PR feedback from r2519730767: Registry/manifest parsing errors should raise PyAirbyteInternalError instead of PyAirbyteInputError because malformed registry/manifest data is not a user input issue - it's an internal data issue that users cannot resolve themselves. Changes: - Updated ApiDocsUrl.from_manifest_dict() to raise PyAirbyteInternalError - Updated _extract_docs_from_registry() to raise PyAirbyteInternalError - Updated docstrings to reflect the new exception type - Updated unit tests to expect PyAirbyteInternalError Ref: #863 (comment) Co-Authored-By: AJ Steers <[email protected]>
Address PR feedback from 2519832304: The code in _extract_docs_from_registry() was duplicating the logic from from_manifest_dict(). Refactored to use a shared from_docs_list() classmethod that both functions can call. Changes: - Added ApiDocsUrl.from_docs_list() classmethod with source and context parameters - Refactored from_manifest_dict() to delegate to from_docs_list() - Refactored _extract_docs_from_registry() to use from_docs_list() - Updated docstring to remove 'API' from 'Get API documentation URLs' per CodeRabbit suggestion This makes the code more DRY and easier to maintain. Ref: #863 (comment) Co-Authored-By: AJ Steers <[email protected]>
|
Addressed feedback from r2519730767: Changed exception type from Changes:
All tests pass locally ✅ |
|
Addressed feedback from 2519832304: Eliminated code duplication by adding a shared Changes:
This makes the code more DRY and easier to maintain. All tests pass locally ✅ |
PyTest Results (Full)390 tests +1 374 ✅ +2 24m 41s ⏱️ - 5m 43s Results for commit 8af8085. ± Comparison against base commit 7eb746b. This pull request removes 8 and adds 9 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
This commit fixes a fundamental conflation between manifest.yaml and metadata.yaml files throughout the codebase: - manifest.yaml: Declarative connector manifest (behavior/config) - metadata.yaml: Connector metadata that the registry caches/exposes Changes made: - Renamed _DEFAULT_MANIFEST_URL to _DEFAULT_METADATA_URL - Updated URL pattern from manifest.yaml to metadata.yaml - Renamed _manifest_url_for() to _metadata_url_for() - Renamed _fetch_manifest_dict() to _fetch_metadata_dict() - Renamed ApiDocsUrl.from_manifest_dict() to from_metadata_dict() - Renamed from_docs_list() to from_metadata_docs_list() (made public) - Updated context parameter from 'Manifest' to 'Metadata' - Updated source parameter from 'data_external_docs' to 'metadata_external_docs' - Updated all test classes, methods, and error message expectations - Updated MCP connector_registry.py imports and docstrings The code now correctly parses metadata.yaml files (and registry caches of those metadata.yaml files), not manifest.yaml files. Co-Authored-By: AJ Steers <[email protected]>
|
I've addressed the manifest/metadata conflation issue you identified in #863 (comment). The IssueThe code was conflating two different file types:
Both the registry JSON and metadata.yaml files contain the same metadata structure (including Changes MadeSystematically renamed all "manifest" references to "metadata" throughout the codebase:
Verification✅ All 9 unit tests passing The code now correctly parses metadata.yaml files (and registry caches of those metadata.yaml files), not manifest.yaml files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/connector_registry.py(4 hunks)airbyte/registry.py(5 hunks)tests/unit_tests/test_mcp_connector_registry.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/connector_registry.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-10-08T15:34:31.026Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 347
File: tests/integration_tests/fixtures/registry.json:48-48
Timestamp: 2024-10-08T15:34:31.026Z
Learning: Test fixtures in the PyAirbyte project do not need to align with real Docker repositories.
Applied to files:
tests/unit_tests/test_mcp_connector_registry.py
📚 Learning: 2024-10-06T23:44:31.534Z
Learnt from: aaronsteers
Repo: airbytehq/PyAirbyte PR: 411
File: airbyte/cli.py:111-160
Timestamp: 2024-10-06T23:44:31.534Z
Learning: In PyAirbyte, error messages in functions like `_resolve_source_job` in `airbyte/cli.py` are designed to decouple the message text from dynamic values, following a structlog-inspired design. Dynamic values are provided via parameters like `input_value`. This approach helps avoid including PII in the message strings, which may be used in telemetry.
Applied to files:
airbyte/registry.py
🧬 Code graph analysis (2)
tests/unit_tests/test_mcp_connector_registry.py (3)
airbyte/mcp/connector_registry.py (1)
get_connector_docs_urls(183-203)airbyte/registry.py (5)
get_connector_docs_urls(459-494)ApiDocsUrl(303-371)_fetch_metadata_dict(389-408)_metadata_url_for(374-386)from_metadata_dict(351-371)airbyte/exceptions.py (2)
PyAirbyteInternalError(190-194)AirbyteConnectorNotRegisteredError(288-297)
airbyte/registry.py (2)
airbyte/exceptions.py (1)
PyAirbyteInternalError(190-194)airbyte/mcp/connector_registry.py (1)
get_connector_docs_urls(183-203)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (10)
tests/unit_tests/test_mcp_connector_registry.py (3)
8-27: LGTM! Clean test updates for the metadata migration.The pytest import and renamed test class correctly reflect the shift from manifest-based to metadata-based API. The test expectations now check for "metadata.yaml" instead of "manifest.yaml", which aligns with the underlying implementation changes.
30-61: LGTM! Comprehensive metadata fetching tests.The tests correctly verify both the 404 case (returns empty dict) and successful parsing of metadata.yaml content.
131-168: LGTM! Excellent validation coverage.These new tests verify that missing required fields ('title' and 'url') correctly raise
PyAirbyteInternalErrorwith descriptive error messages. This addresses the core objective of the PR—preventing unhandled KeyError propagation.airbyte/registry.py (7)
43-45: Note: This PR's scope extends beyond the title suggests.The PR title mentions "Add KeyError validation" but the changes include a complete rename from manifest-based to metadata-based ingestion (renamed constants, functions, and URL paths from manifest.yaml to metadata.yaml). This is fine if intentional, but wanted to flag it since the broader refactoring wasn't mentioned in the PR objectives. The validation additions are well-implemented, though!
315-348: LGTM! Solid error handling with helpful context.The try/except wrapper correctly catches KeyError and converts it to PyAirbyteInternalError with a descriptive message that includes the malformed doc dict. This makes debugging much easier when validation fails.
The
contextparameter is a nice touch—it allows different callers (Metadata vs Registry parsing) to provide specific context in error messages.
350-371: LGTM! Clean separation of concerns.The method correctly extracts documentation URLs from the metadata structure and delegates the validation to
from_metadata_docs_list. The defensiveisinstancechecks ensure graceful handling of missing or malformed metadata sections.
374-386: LGTM! Clean rename with updated docstring.The function correctly references
metadata.yamlin the docstring and uses the renamed_DEFAULT_METADATA_URLconstant.
389-408: LGTM! Appropriate 404 handling.Returning an empty dict on 404 is the right approach here, since not all connectors may have a metadata.yaml file yet. Other HTTP errors are properly raised.
411-456: Good validation reuse, but could we add test coverage for registry errors?The function now correctly uses
from_metadata_docs_listwithcontext="Registry"to ensure consistent error handling. The docstring also documents that it raisesPyAirbyteInternalErrorfor missing fields.However, as noted in the PR objectives, the registry parsing path isn't explicitly tested for error cases. While the underlying
from_metadata_docs_listis tested via the metadata path, it would be valuable to add a test that verifies registry entries with malformedexternalDocumentationUrlsalso raise the appropriate error. wdyt?
459-494: LGTM! Clean aggregation from multiple sources.The function correctly fetches documentation URLs from both the registry and the connector's metadata.yaml file, then combines them. The updated docstring clearly documents both sources. Returning the combined list without deduplication seems intentional—different sources providing the same URL could be useful context for callers.
| def test_connector_not_found(self) -> None: | ||
| """Test handling when connector is not found.""" | ||
| with patch( | ||
| "airbyte.mcp.connector_registry.get_connector_api_docs_urls" | ||
| "airbyte.mcp.connector_registry.get_connector_docs_urls" | ||
| ) as mock_get_docs: | ||
| mock_get_docs.side_effect = exc.AirbyteConnectorNotRegisteredError( | ||
| connector_name="nonexistent-connector", | ||
| context={}, | ||
| ) | ||
|
|
||
| result = get_api_docs_urls("nonexistent-connector") | ||
| result = get_connector_docs_urls("nonexistent-connector") | ||
| assert result == "Connector not found." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Could we verify the patching approach here?
The test patches "airbyte.mcp.connector_registry.get_connector_docs_urls" to raise an exception, but I'm seeing that this is also the function imported at line 11 and called at line 183.
Looking at the MCP wrapper implementation, it seems like this might be testing the wrong layer—the wrapper catches exceptions from the underlying _get_connector_docs_urls call. Should this test patch "airbyte.mcp.connector_registry._get_connector_docs_urls" instead to ensure the exception handling path is actually exercised?
I noticed a past comment mentioned fixing this test, so it might be working as-is, but wanted to double-check the approach. wdyt?
🏁 Script executed:
#!/bin/bash
# Verify the import structure and patching target in the MCP module
# Check how _get_connector_docs_urls is imported in the MCP module
rg -n "import.*_get_connector_docs_urls|from.*import.*_get_connector_docs_urls" --type=py
# Check the implementation of get_connector_docs_urls in the MCP module
ast-grep --pattern $'def get_connector_docs_urls($$$) {
$$$
}'Length of output: 117
---
🏁 Script executed:
```shell
#!/bin/bash
# Find the wrapper function definition in the MCP module
ast-grep --pattern $'def get_connector_docs_urls($_) {
$$$
}' airbyte/mcp/connector_registry.py
# Also check the broader context around line 27 to see the wrapper
rg -n "def get_connector_docs_urls" --type=py -A 15
Length of output: 2220
---
🏁 Script executed:
```shell
#!/bin/bash
# Get the full wrapper implementation to see exception handling
sed -n '183,220p' airbyte/mcp/connector_registry.py
Length of output: 1118
Patch _get_connector_docs_urls instead to properly test the wrapper's exception handling.
The test currently patches the wrapper function itself, which bypasses the exception handling logic inside it. Since the wrapper at line 201 catches AirbyteConnectorNotRegisteredError from _get_connector_docs_urls and returns the fallback string, the test should patch the underlying function (_get_connector_docs_urls imported at line 27) to properly exercise that exception handling path. Otherwise, the test verifies only that the mock was called, not that the wrapper's error handling works.
feat(registry): Add KeyError validation for externalDocumentationUrls
Summary
Adds validation to raise clear
PyAirbyteInputErrorexceptions whenexternalDocumentationUrlsentries are missing requiredtitleorurlfields, instead of allowing KeyError to propagate. This applies to both manifest.yaml parsing and registry metadata parsing.Changes:
ApiDocsUrl.from_manifest_dict()to wrap dict access in try-except and raisePyAirbyteInputErrorwith clear message_extract_docs_from_registry()with same validation patternRelated PRs:
Review & Testing Checklist for Human
This is a yellow risk change (behavior change with potential production impact):
externalDocumentationUrlsmetadata (missing title/url fields). This change will cause those to fail instead of being silently skipped.get_api_docs_urls("source-faker")to ensure it still works correctly with valid metadataNotes
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores