-
Notifications
You must be signed in to change notification settings - Fork 30
feat(dynamic-streams): Avoid parsing strings in ConfigComponentResolver #769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dynamic-streams): Avoid parsing strings in ConfigComponentResolver #769
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the ConfigComponentResolver by avoiding unnecessary YAML parsing for string values that are intended to remain as strings.
- Adds conditional logic to skip YAML parsing when the value is already a string and the expected type is exclusively string
- Preserves the original string value instead of attempting to parse it as YAML
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThe resolver now skips YAML/JSON parsing when a resolved value is a string and the expected value_type is exactly (str,); otherwise it still attempts parsing via _parse_yaml_if_possible. Unit tests were added to verify string-vs-parsed behavior for several mappings. No public APIs changed. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant Resolver as ConfigComponentsResolver
participant YAML as _parse_yaml_if_possible
Caller->>Resolver: resolve_components(config, components)
loop each mapping
Resolver->>Resolver: evaluate path and resolved_value
alt resolved_value is string AND value_type == (str,)
note right of Resolver #D9EDF7: use string as-is (no parsing)
Resolver->>Resolver: parsedValue = resolved_value (str)
else
note right of Resolver #F7F0D9: attempt YAML/JSON parsing
Resolver->>YAML: _parse_yaml_if_possible(resolved_value)
YAML-->>Resolver: parsedValue (possibly transformed)
end
Resolver->>Resolver: dpath.set(config, path, parsedValue)
end
Resolver-->>Caller: updated config
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Would you like me to add an explicit unit test case for an edge string like "null" with value_type Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (3)
180-184
: Make the skip-parse predicate more robust and readableComparing to
valid_types == (str,)
is brittle and ties this logic to howvalid_types
is constructed. Would you switch to checking the declared type directly and invert the branches for clarity, wdyt?- # Avoid parsing strings that are meant to be strings - if not (isinstance(value, str) and valid_types == (str,)): - parsed_value = self._parse_yaml_if_possible(value) - else: - parsed_value = value + # Avoid parsing strings that are meant to be strings + if isinstance(value, str) and resolved_component.value_type is str: + parsed_value = value + else: + parsed_value = self._parse_yaml_if_possible(value)
187-188
: Don’t drop valid “falsy” values on create_or_updateThe current guard skips creating values like
""
,0
, orFalse
. If that’s unintended, can we gate only onNone
, wdyt?- if parsed_value and not updated and resolved_component.create_or_update: + if parsed_value is not None and not updated and resolved_component.create_or_update:
172-184
: valid_types shape confirmed — consistent across codebaseInterpolatedString/JinjaInterpolation.eval accepts valid_types: Optional[Tuple[Type[Any]]]; callers construct valid_types as (value_type,) or None (seen in add_fields, http/parametrized/config_components_resolver and ModelToComponentFactory mapping). The current check (isinstance(value, str) and valid_types == (str,)) matches that pattern and is safe. Optional: make the string-check more future-proof (e.g. valid_types is not None and str in valid_types) to allow multi-type tuples — wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Analyze (python)
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (2)
392-437
: Great coverage; consider adding ambiguous YAML tokens and whitespace cases to harden behavior.Would you add a couple more mappings to cover YAML 1.1 boolean-like tokens and trailing whitespace so we don’t accidentally parse them in the future, wdyt?
- “on” should remain a string when value_type="string" (YAML 1.1 treats it as boolean).
- “true ” (note the trailing space) should remain a string when value_type="string".
You could extend the mapping like this:
@@ { "type": "ComponentMappingDefinition", "field_path": ["retriever", "requester", "$parameters", "date_yaml_parsed"], "value": "2024-07-10", # no value_type -> YAML should parse to datetime.date "create_or_update": True, }, + { + "type": "ComponentMappingDefinition", + "field_path": ["retriever", "requester", "$parameters", "ambiguous_on_string"], + "value": "on", # YAML 1.1 boolean-like token; keep as string + "value_type": "string", + "create_or_update": True, + }, + { + "type": "ComponentMappingDefinition", + "field_path": ["retriever", "requester", "$parameters", "trailing_space_string"], + "value": "true ", # trailing space; keep as-is string + "value_type": "string", + "create_or_update": True, + },If you go this route, we can assert these in the test (see next comment). Wdyt?
440-479
: Stabilize access to private attributes and tighten the date assertion.Two small tweaks:
- Use a helper to fetch requester parameters to reduce brittleness if internals move.
- Compare dates directly instead of via isoformat.
Apply:
@@ - for stream in source.streams(_CONFIG): - params = ( - stream._stream_partition_generator._partition_factory._retriever.requester._parameters - ) + for stream in source.streams(_CONFIG): + params = get_requester_params(stream) @@ - assert "date_yaml_parsed" in params - assert isinstance(params["date_yaml_parsed"], dt_date) - assert params["date_yaml_parsed"].isoformat() == "2024-07-10" + assert "date_yaml_parsed" in params + assert params["date_yaml_parsed"] == dt_date(2024, 7, 10)And add this helper near the top of the file (outside this hunk):
def get_requester_params(stream): try: return ( stream ._stream_partition_generator ._partition_factory ._retriever .requester ._parameters ) except AttributeError as e: raise AssertionError("Requester parameters are not accessible; test may need an update.") from eIf you add the extra mappings from the previous comment, would you also assert them here?
assert "ambiguous_on_string" in params and params["ambiguous_on_string"] == "on" assert "trailing_space_string" in params and params["trailing_space_string"] == "true "Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
ConcurrentDeclarativeSource
(128-652)streams
(388-420)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (2)
7-7
: LGTM on thedt_date
alias.Clear and avoids name shadowing.
390-391
: No action needed for whitespace-only change.
Tested locally with google-sheets and pinterest |
…-in-config-components-resolver
Summary by CodeRabbit
Bug Fixes
Refactor
Tests