Skip to content

Conversation

maxi297
Copy link
Contributor

@maxi297 maxi297 commented Aug 7, 2025

What

Currently if we wants transformations to be passed to the record selector when there is a custom retriever, we need to do this.

How

The ModelToComponentFactory will match function arguments not only on model parameters but also on kwargs being passed. If there is a collision, model parameters will be prioritize to keep backward compatibility

Summary by CodeRabbit

  • New Features

    • Improved support for passing additional parameters when creating nested components, allowing for more flexible configuration in declarative sources.
  • Tests

    • Added a new test to verify stream creation with custom retrievers and transformations.
    • Introduced a testing custom retriever class for enhanced test coverage.

@maxi297 maxi297 requested review from brianjlai and tolik0 August 7, 2025 17:27
@github-actions github-actions bot added the enhancement New feature or request label Aug 7, 2025
Copy link

github-actions bot commented Aug 7, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/pass_transformations_to_custom_retriever_record_selector#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/pass_transformations_to_custom_retriever_record_selector

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link
Contributor

coderabbitai bot commented Aug 7, 2025

📝 Walkthrough

Walkthrough

The changes update the ModelToComponentFactory to propagate keyword arguments (**kwargs) when creating nested components, enhancing flexibility in parameter passing. A new test verifies this behavior using a custom retriever and transformation. Additionally, a testing custom retriever class is introduced for use in unit tests.

Changes

Cohort / File(s) Change Summary
Component Factory Keyword Propagation
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Updates the ModelToComponentFactory to forward **kwargs when creating nested components, modifies method signatures, and merges parameters for nested instantiation.
Unit Test for Custom Retriever and Transformations
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
Adds a test that defines a declarative stream with a custom retriever and transformation, ensuring correct instantiation and configuration of nested components with propagated parameters.
Testing Custom Retriever Class
unit_tests/sources/declarative/parsers/testing_components.py
Adds a TestingCustomRetriever class inheriting from SimpleRetriever for use in unit testing custom retrievers within the declarative component factory context.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Case
    participant Factory as ModelToComponentFactory
    participant Retriever as TestingCustomRetriever
    participant Stream as DeclarativeStream

    Test->>Factory: create_custom_component(model, config, **kwargs)
    Factory->>Factory: _create_nested_component(..., **kwargs)
    Factory->>Retriever: Instantiate with merged kwargs
    Factory->>Stream: Instantiate DeclarativeStream with retriever
    Test->>Stream: Assert properties
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • darynaishchenko
  • lazebnyi

Would you like to consider adding more test cases for different types of nested component parameter propagation, or is the current coverage sufficient for your needs? Wdyt?

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f43969e and 446a7ac.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/pass_transformations_to_custom_retriever_record_selector

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
unit_tests/sources/declarative/parsers/testing_components.py (1)

49-51: Simple and effective testing component - nice work!

The TestingCustomRetriever follows the established pattern of other testing components in this file. The minimal implementation with just pass is perfect for testing the factory's kwargs propagation functionality without introducing unnecessary complexity. The dataclass decorator and inheritance from SimpleRetriever align well with the testing needs described in the PR objectives.

One small question though - would it be worth adding the __test__: ClassVar[bool] = False attribute like the other testing components to explicitly tell pytest this isn't a test class, wdyt?

 @dataclass
 class TestingCustomRetriever(SimpleRetriever):
+    __test__: ClassVar[bool] = False  # Tell Pytest this is not a Pytest class, despite its name
+    
     pass
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

949-999: Nice test coverage for the custom retriever transformation feature!

The test effectively validates that transformations are properly passed to custom retrievers. The manifest structure and assertions look good and follow existing patterns.

One thought - would it be worth making the final assertion more specific? Instead of just checking assert stream.retriever.record_selector.transformations, maybe verify the actual transformation type and configuration, wdyt?

Something like:

-    assert stream.retriever.record_selector.transformations
+    assert len(stream.retriever.record_selector.transformations) == 1
+    transformation = stream.retriever.record_selector.transformations[0]
+    assert isinstance(transformation, AddFields)
+    assert transformation.fields[0].path == ["extra"]

This would make the test more robust by ensuring the correct transformation was actually applied.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0a8f30 and c1fde82.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1 hunks)
  • unit_tests/sources/declarative/parsers/testing_components.py (2 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2025-01-13T23:39:15.457Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-11-15T00:59:08.154Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • unit_tests/sources/declarative/parsers/testing_components.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

[error] 1707-1711: Ruff formatting error: code needs reformatting in this section. Run 'ruff format' to fix.


[error] 1835-1841: Ruff formatting error: code needs reformatting in this section. Run 'ruff format' to fix.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-shopify
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
unit_tests/sources/declarative/parsers/testing_components.py (1)

16-16: Import looks good for the new test component.

The SimpleRetriever import is appropriately placed with the other declarative component imports and supports the new TestingCustomRetriever class below.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

1812-1812: Method signature updated correctly for kwargs propagation

Great addition of **kwargs to the method signature! This enables the keyword argument propagation that the PR aims to achieve.


1844-1844: Perfect use of dictionary union for parameter merging

The use of matching_parameters | matching_kwargs is exactly right for this use case - it ensures model parameters take precedence over kwargs while still allowing the propagation of additional arguments. This maintains backward compatibility while enabling the new functionality.


1710-1710: Formatting issue: Missing comma after kwargs

The pipeline failure indicates a Ruff formatting error. The line needs proper formatting - wdyt about adding a comma after **kwargs?

-                    model, model_field, model_value, config, **kwargs,
+                    model, model_field, model_value, config, **kwargs
⛔ Skipped due to learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Copy link

github-actions bot commented Aug 7, 2025

PyTest Results (Fast)

3 697 tests  +1   3 686 ✅ +1   6m 22s ⏱️ -16s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 446a7ac. ± Comparison against base commit f0a8f30.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Aug 7, 2025

PyTest Results (Full)

3 700 tests  +1   3 689 ✅ +1   11m 42s ⏱️ +7s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 446a7ac. ± Comparison against base commit f0a8f30.

♻️ This comment has been updated with latest results.

@maxi297
Copy link
Contributor Author

maxi297 commented Aug 7, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@maxi297
Copy link
Contributor Author

maxi297 commented Sep 2, 2025

Should be covered by 25ca5b8

@maxi297 maxi297 closed this Sep 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants