Skip to content

Conversation

tolik0
Copy link
Contributor

@tolik0 tolik0 commented Sep 22, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Configuration values explicitly expected as strings are now preserved as plain text and no longer subject to YAML/JSON interpretation (prevents “yes”/“no” → booleans and similar coercions).
  • Refactor

    • YAML/JSON parsing is now only applied when a value is not explicitly typed as string, narrowing parsing to intended cases without changing external behavior.
  • Tests

    • Added tests covering string vs parsed-value scenarios (including JSON and date cases) to ensure parsing respects declared value types.

@Copilot Copilot AI review requested due to automatic review settings September 22, 2025 16:29
@tolik0 tolik0 self-assigned this Sep 22, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@github-actions github-actions bot added the enhancement New feature or request label Sep 22, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the ConfigComponentResolver by avoiding unnecessary YAML parsing for string values that are intended to remain as strings.

  • Adds conditional logic to skip YAML parsing when the value is already a string and the expected type is exclusively string
  • Preserves the original string value instead of attempting to parse it as YAML

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@tolik0 tolik0 requested a review from pnilan September 22, 2025 16:29
Copy link
Contributor

coderabbitai bot commented Sep 22, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

The resolver now skips YAML/JSON parsing when a resolved value is a string and the expected value_type is exactly (str,); otherwise it still attempts parsing via _parse_yaml_if_possible. Unit tests were added to verify string-vs-parsed behavior for several mappings. No public APIs changed.

Changes

Cohort / File(s) Summary
Declarative config resolver logic
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py
Modified resolve_components to bypass YAML/JSON parsing when the resolved value is a str and value_type is exactly (str,); otherwise continues to call _parse_yaml_if_possible. Path evaluation, dpath.set updates, and create/update behavior unchanged; no signature changes.
Unit tests for value_type parsing
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py
Added test_value_type_str_avoids_yaml_parsing and extended manifest mappings (as_string, as_yaml, json_string, json_parsed, date_as_string, date_yaml_parsed) to assert that values with value_type=(str,) remain raw strings while others are YAML/JSON parsed (including date parsing). Added from datetime import date as dt_date import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Resolver as ConfigComponentsResolver
  participant YAML as _parse_yaml_if_possible

  Caller->>Resolver: resolve_components(config, components)
  loop each mapping
    Resolver->>Resolver: evaluate path and resolved_value
    alt resolved_value is string AND value_type == (str,)
      note right of Resolver #D9EDF7: use string as-is (no parsing)
      Resolver->>Resolver: parsedValue = resolved_value (str)
    else
      note right of Resolver #F7F0D9: attempt YAML/JSON parsing
      Resolver->>YAML: _parse_yaml_if_possible(resolved_value)
      YAML-->>Resolver: parsedValue (possibly transformed)
    end
    Resolver->>Resolver: dpath.set(config, path, parsedValue)
  end
  Resolver-->>Caller: updated config
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • bazarnov

Would you like me to add an explicit unit test case for an edge string like "null" with value_type (str,) to ensure it remains the literal string rather than being parsed to None, wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly and accurately describes the primary change — avoiding YAML/JSON parsing of strings in ConfigComponentResolver — and uses a concise conventional-commit style with an appropriate scope, so it reflects the changes in the diff and tests.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (3)

180-184: Make the skip-parse predicate more robust and readable

Comparing to valid_types == (str,) is brittle and ties this logic to how valid_types is constructed. Would you switch to checking the declared type directly and invert the branches for clarity, wdyt?

-                # Avoid parsing strings that are meant to be strings
-                if not (isinstance(value, str) and valid_types == (str,)):
-                    parsed_value = self._parse_yaml_if_possible(value)
-                else:
-                    parsed_value = value
+                # Avoid parsing strings that are meant to be strings
+                if isinstance(value, str) and resolved_component.value_type is str:
+                    parsed_value = value
+                else:
+                    parsed_value = self._parse_yaml_if_possible(value)

187-188: Don’t drop valid “falsy” values on create_or_update

The current guard skips creating values like "", 0, or False. If that’s unintended, can we gate only on None, wdyt?

-                if parsed_value and not updated and resolved_component.create_or_update:
+                if parsed_value is not None and not updated and resolved_component.create_or_update:

172-184: valid_types shape confirmed — consistent across codebase

InterpolatedString/JinjaInterpolation.eval accepts valid_types: Optional[Tuple[Type[Any]]]; callers construct valid_types as (value_type,) or None (seen in add_fields, http/parametrized/config_components_resolver and ModelToComponentFactory mapping). The current check (isinstance(value, str) and valid_types == (str,)) matches that pattern and is safe. Optional: make the string-check more future-proof (e.g. valid_types is not None and str in valid_types) to allow multi-type tuples — wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c67c556 and 679b3ca.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • airbyte_cdk/sources/declarative/resolvers/config_components_resolver.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Analyze (python)

Copy link

github-actions bot commented Sep 22, 2025

PyTest Results (Fast)

3 758 tests  +1   3 746 ✅ +1   6m 21s ⏱️ +2s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 11d4931. ± Comparison against base commit a1428bf.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Sep 22, 2025

PyTest Results (Full)

3 761 tests  +1   3 749 ✅ +2   11m 8s ⏱️ +10s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌  - 1 

Results for commit 11d4931. ± Comparison against base commit a1428bf.

♻️ This comment has been updated with latest results.

@tolik0
Copy link
Contributor Author

tolik0 commented Sep 23, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (2)

392-437: Great coverage; consider adding ambiguous YAML tokens and whitespace cases to harden behavior.

Would you add a couple more mappings to cover YAML 1.1 boolean-like tokens and trailing whitespace so we don’t accidentally parse them in the future, wdyt?

  • “on” should remain a string when value_type="string" (YAML 1.1 treats it as boolean).
  • “true ” (note the trailing space) should remain a string when value_type="string".

You could extend the mapping like this:

@@
         {
             "type": "ComponentMappingDefinition",
             "field_path": ["retriever", "requester", "$parameters", "date_yaml_parsed"],
             "value": "2024-07-10",  # no value_type -> YAML should parse to datetime.date
             "create_or_update": True,
         },
+        {
+            "type": "ComponentMappingDefinition",
+            "field_path": ["retriever", "requester", "$parameters", "ambiguous_on_string"],
+            "value": "on",  # YAML 1.1 boolean-like token; keep as string
+            "value_type": "string",
+            "create_or_update": True,
+        },
+        {
+            "type": "ComponentMappingDefinition",
+            "field_path": ["retriever", "requester", "$parameters", "trailing_space_string"],
+            "value": "true ",  # trailing space; keep as-is string
+            "value_type": "string",
+            "create_or_update": True,
+        },

If you go this route, we can assert these in the test (see next comment). Wdyt?


440-479: Stabilize access to private attributes and tighten the date assertion.

Two small tweaks:

  • Use a helper to fetch requester parameters to reduce brittleness if internals move.
  • Compare dates directly instead of via isoformat.

Apply:

@@
-    for stream in source.streams(_CONFIG):
-        params = (
-            stream._stream_partition_generator._partition_factory._retriever.requester._parameters
-        )
+    for stream in source.streams(_CONFIG):
+        params = get_requester_params(stream)
@@
-        assert "date_yaml_parsed" in params
-        assert isinstance(params["date_yaml_parsed"], dt_date)
-        assert params["date_yaml_parsed"].isoformat() == "2024-07-10"
+        assert "date_yaml_parsed" in params
+        assert params["date_yaml_parsed"] == dt_date(2024, 7, 10)

And add this helper near the top of the file (outside this hunk):

def get_requester_params(stream):
    try:
        return (
            stream
            ._stream_partition_generator
            ._partition_factory
            ._retriever
            .requester
            ._parameters
        )
    except AttributeError as e:
        raise AssertionError("Requester parameters are not accessible; test may need an update.") from e

If you add the extra mappings from the previous comment, would you also assert them here?

assert "ambiguous_on_string" in params and params["ambiguous_on_string"] == "on"
assert "trailing_space_string" in params and params["trailing_space_string"] == "true "

Wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9472e09 and 96d55c5.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
  • ConcurrentDeclarativeSource (128-652)
  • streams (388-420)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
unit_tests/sources/declarative/resolvers/test_config_components_resolver.py (2)

7-7: LGTM on the dt_date alias.

Clear and avoids name shadowing.


390-391: No action needed for whitespace-only change.

@tolik0
Copy link
Contributor Author

tolik0 commented Sep 23, 2025

Tested locally with google-sheets and pinterest

@tolik0 tolik0 merged commit 25132b6 into main Sep 29, 2025
32 of 33 checks passed
@tolik0 tolik0 deleted the tolik0/dynamic-streams/avoid-parsing-strings-in-config-components-resolver branch September 29, 2025 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants