feat(connector builder): allow connector builder to process requests through the concurrent CDK by reworking message routing and order through the queue and message repositories #688
Conversation
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@brian/connector_builder_handler_use_concurrent#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch brian/connector_builder_handler_use_concurrent

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:
PyTest Results (Fast): 3 695 tests ±0, 3 684 ✅ ±0, 6m 30s ⏱️ +2s. Results for commit 1b3a595. ± Comparison against base commit d2262a5. This pull request removes 2 and adds 2 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results.
PyTest Results (Full): 3 698 tests ±0, 3 687 ✅ ±0, 11m 58s ⏱️ +16s. Results for commit 1b3a595. ± Comparison against base commit d2262a5. This pull request removes 2 and adds 2 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results.
📝 Walkthrough

This change overhauls the concurrency and state management mechanisms for declarative sources in the connector builder. It introduces explicit concurrency limits, replaces …

Changes
Sequence Diagram(s)

sequenceDiagram
participant Handler as connector_builder_handler
participant Source as ConcurrentDeclarativeSource
participant Factory as ModelToComponentFactory
participant Repo as ConcurrentMessageRepository
participant Queue as Queue
participant Processor as ConcurrentReadProcessor
participant PartitionReader as PartitionReader
participant Cursor as Cursor
Handler->>Source: create_source(config, limits, catalog, state)
Source->>Queue: Instantiate shared queue
Source->>Repo: Instantiate ConcurrentMessageRepository(queue)
Source->>Factory: Instantiate ModelToComponentFactory(..., message_repository=Repo, ...)
Source->>Processor: Create ConcurrentSource(queue=Queue, ...)
Handler->>Source: Start reading
Source->>Processor: read()
Processor->>PartitionReader: process_partition(partition, cursor)
PartitionReader->>Cursor: observe(record)
PartitionReader->>Queue: Put record
PartitionReader->>Cursor: close_partition(partition)
PartitionReader->>Queue: Put PartitionCompleteSentinel
Processor->>Handler: Yield AirbyteMessages from Queue
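The flow in the diagram above can be sketched as a minimal producer/consumer loop. All class and function names below are illustrative stand-ins, not the CDK's actual APIs: partition workers put records and a completion sentinel onto a shared bounded queue, and the main thread drains the queue in arrival order until every partition has signaled completion.

```python
import queue
import threading

class PartitionCompleteSentinel:
    """Illustrative stand-in for the CDK's partition-complete sentinel."""
    def __init__(self, partition_id):
        self.partition_id = partition_id

def read_partition(partition_id, records, out_queue):
    # Worker thread: emit each record, then signal completion.
    for record in records:
        out_queue.put((partition_id, record))
    out_queue.put(PartitionCompleteSentinel(partition_id))

def read_all(partitions, maxsize=10_000):
    # Main thread: start one worker per partition and drain the shared
    # queue in arrival order until every partition has completed.
    out_queue = queue.Queue(maxsize=maxsize)
    threads = [
        threading.Thread(target=read_partition, args=(pid, recs, out_queue))
        for pid, recs in partitions.items()
    ]
    for t in threads:
        t.start()
    remaining = len(threads)
    messages = []
    while remaining:
        item = out_queue.get()
        if isinstance(item, PartitionCompleteSentinel):
            remaining -= 1
        else:
            messages.append(item)
    for t in threads:
        t.join()
    return messages

messages = read_all({"users": ["u1", "u2"], "orders": ["o1"]})
```

Counting sentinels rather than joining threads first is what lets the main thread interleave message emission with partition processing, which is the ordering property the PR relies on.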
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Would you like me to break down the test changes more granularly for easier review, or is this summary sufficient for your needs, wdyt?

Note: ⚡️ Unit Test Generation is now available in beta! Learn more here, or try it out under "Finishing Touches" below.

📜 Recent review details

Configuration used: CodeRabbit UI

📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
Actionable comments posted: 1
🧹 Nitpick comments (8)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (1)

75-88: Prefer pytest.skip over commenting out entire tests

Fully commenting out the memory-usage test hides useful coverage and makes future re-enabling harder. Would adding @pytest.mark.skip(reason="memory-intensive – disabled for now") (or gating it behind an env flag) keep the test visible while excluding it from CI, wdyt?

airbyte_cdk/connector_builder/test_reader/helpers.py (1)
725-743: Helper mutates the input object in place – intentional?

convert_state_blob_to_mapping rewrites state_message.stream.stream_state directly. If callers reuse the same AirbyteStateMessage elsewhere, they now receive the mutated variant. Should we instead deep-copy the message (e.g. via copy.deepcopy) before mutation to avoid accidental side-effects, or is the in-place change guaranteed safe, wdyt?

airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1)
23-25: Minor: guard against wrapped slicer exhaustion

islice(self.wrapped_slicer.stream_slices(), …) silently stops when the underlying iterator is shorter. If you need an explicit "truncated" flag for debugging, perhaps wrap with itertools.islice plus a length check, or log a warning. Probably fine as-is, just raising the idea, wdyt?

airbyte_cdk/sources/streams/concurrent/partitions/types.py (1)
38-44: Docstring drift

The triple-quoted comment above still mentions "ThreadBasedConcurrentStream" but the type is now shared more widely. Update the wording to avoid confusion, wdyt?

unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1)
34-101: Keep dormant tests discoverable

Similar to the extractor test, consider @pytest.mark.skip or xfail instead of commenting out, to retain history, IDE discoverability and easy re-activation. Happy to craft the decorator if useful, wdyt?

unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
363-364: Consider cleaning up commented code.

The commented-out queue and message repository initialization suggests exploratory work. Should we remove these lines, or are they planned for future use, wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1456-1483: Minor inconsistency in default slice limit.

I noticed that line 1478 uses a hardcoded 5 instead of the MAX_SLICES constant defined above. For consistency, would it be better to use MAX_SLICES here as well, similar to how it's used in create_concurrent_cursor_from_incrementing_count_cursor? wdyt?

- maximum_number_of_slices=self._limit_slices_fetched or 5,
+ maximum_number_of_slices=self._limit_slices_fetched or MAX_SLICES,

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
106-114
: Queue initialization with bounded size is a good safety measure.The maxsize of 10,000 prevents unbounded memory growth. The detailed comment explains the reasoning well. However, since the comment mentions this might need to be configurable, should we add a TODO or make it configurable through TestLimits now to avoid future breaking changes? wdyt?
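Making the bound configurable could be sketched as below. Note this is a hypothetical extension: the max_queue_size field and the QueueLimits container are illustrative, not part of the CDK's actual TestLimits.

```python
import queue
from dataclasses import dataclass

@dataclass
class QueueLimits:
    """Hypothetical limits container; the CDK's TestLimits does not
    currently expose a queue-size field."""
    max_queue_size: int = 10_000

def make_shared_queue(limits: QueueLimits) -> "queue.Queue":
    # Bounded: once maxsize items are waiting, producers block on put(),
    # giving backpressure instead of unbounded memory growth.
    return queue.Queue(maxsize=limits.max_queue_size)

q = make_shared_queue(QueueLimits(max_queue_size=2))
q.put("a")
q.put("b")
```

Exposing the bound through the limits object avoids a breaking change later, since callers that omit the field keep today's default.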
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
airbyte_cdk/connector_builder/connector_builder_handler.py (3 hunks)
airbyte_cdk/connector_builder/main.py (1 hunks)
airbyte_cdk/connector_builder/test_reader/helpers.py (4 hunks)
airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py (2 hunks)
airbyte_cdk/sources/concurrent_source/concurrent_source.py (6 hunks)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (5 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1 hunks)
airbyte_cdk/sources/message/concurrent_repository.py (1 hunks)
airbyte_cdk/sources/streams/concurrent/partition_reader.py (3 hunks)
airbyte_cdk/sources/streams/concurrent/partitions/types.py (2 hunks)
unit_tests/connector_builder/test_connector_builder_handler.py (16 hunks)
unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1 hunks)
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py (1 hunks)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (2 hunks)
unit_tests/sources/streams/concurrent/scenarios/stream_facade_builder.py (1 hunks)
unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py (2 hunks)
unit_tests/sources/streams/concurrent/test_partition_reader.py (3 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
📚 Learning: the files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from ...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
unit_tests/sources/streams/concurrent/test_partition_reader.py
unit_tests/connector_builder/test_connector_builder_handler.py
airbyte_cdk/connector_builder/main.py
airbyte_cdk/sources/streams/concurrent/partitions/types.py
airbyte_cdk/sources/concurrent_source/concurrent_source.py
airbyte_cdk/connector_builder/connector_builder_handler.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
airbyte_cdk/connector_builder/test_reader/helpers.py
📚 Learning: when code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repositor...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
unit_tests/connector_builder/test_connector_builder_handler.py
airbyte_cdk/sources/concurrent_source/concurrent_source.py
airbyte_cdk/connector_builder/connector_builder_handler.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: when modifying the `yamldeclarativesource` class in `airbyte_cdk/sources/declarative/yaml_declarativ...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
unit_tests/sources/streams/concurrent/scenarios/stream_facade_builder.py
unit_tests/connector_builder/test_connector_builder_handler.py
airbyte_cdk/connector_builder/main.py
airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py
airbyte_cdk/sources/concurrent_source/concurrent_source.py
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
airbyte_cdk/connector_builder/connector_builder_handler.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: in the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-g...
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: the custompageincrement class in unit_tests/source_declarative_manifest/resources/source_the_guardia...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
Applied to files:
unit_tests/connector_builder/test_connector_builder_handler.py
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
airbyte_cdk/connector_builder/connector_builder_handler.py
📚 Learning: in the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict modu...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Applied to files:
airbyte_cdk/connector_builder/connector_builder_handler.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: copying files from `site-packages` in the dockerfile maintains compatibility with both the old file ...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#90
File: Dockerfile:16-21
Timestamp: 2024-12-02T18:36:04.346Z
Learning: Copying files from `site-packages` in the Dockerfile maintains compatibility with both the old file structure that manifest-only connectors expect and the new package-based structure where SDM is part of the CDK.
Applied to files:
airbyte_cdk/connector_builder/connector_builder_handler.py
📚 Learning: in the typetransformer class, the data being transformed comes from api responses or source systems,...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#221
File: airbyte_cdk/sources/utils/transform.py:0-0
Timestamp: 2025-01-16T00:50:39.069Z
Learning: In the TypeTransformer class, the data being transformed comes from API responses or source systems, so only standard JSON-serializable types are expected. The python_to_json mapping covers all expected types, and it's designed to fail fast (KeyError) on unexpected custom types rather than providing fallbacks.
Applied to files:
airbyte_cdk/connector_builder/test_reader/helpers.py
🧬 Code Graph Analysis (5)
unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py (5)
airbyte_cdk/sources/streams/concurrent/default_stream.py (1): cursor (101-102)
airbyte_cdk/sources/streams/concurrent/abstract_stream.py (1): cursor (93-96)
airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py (1): on_partition (89-105)
airbyte_cdk/sources/concurrent_source/thread_pool_manager.py (1): submit (45-46)
airbyte_cdk/sources/streams/concurrent/partition_reader.py (1): process_partition (56-79)
airbyte_cdk/sources/message/concurrent_repository.py (3)
airbyte_cdk/models/airbyte_protocol.py (1): AirbyteMessage (79-88)
airbyte_cdk/connector_builder/models.py (1): LogMessage (25-29)
airbyte_cdk/sources/message/repository.py (1): MessageRepository (45-60)
airbyte_cdk/connector_builder/connector_builder_handler.py (3)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2): ConcurrentDeclarativeSource (85-549), TestLimits (71-82)
unit_tests/connector_builder/test_connector_builder_handler.py (1): manifest_declarative_source (956-957)
airbyte_cdk/models/airbyte_protocol.py (1): AirbyteStateMessage (67-75)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
airbyte_cdk/sources/streams/concurrent/cursor.py (1): ConcurrentCursor (134-502)
airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1): StreamSlicerTestReadDecorator (14-28)
airbyte_cdk/connector_builder/test_reader/helpers.py (2)
airbyte_cdk/models/airbyte_protocol.py (2): AirbyteStateBlob (15-50), AirbyteStateMessage (67-75)
unit_tests/connector_builder/test_message_grouper.py (1): state_message (966-974)
🪛 GitHub Actions: Pytest (Fast)
unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Bad request. Please check your request parameters.
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Too many requests.
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: UserDefinedBackoffException('Too many requests.')
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid URL endpoint: '10.0.27.27'
belongs to a private network
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: AirbyteTracedException("Invalid URL endpoint: '10.0.27.27'
belongs to a private network")
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid Protocol Schema: The endpoint that data is being requested from is using an invalid or insecure protocol 'http'. Valid protocol schemes: https
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: MessageRepresentationAirbyteTracedErrors("'GET' request to 'http://unsecured.protocol/api/v1/v3/marketing/lists?a_param=10&page_size=2' failed with exception: 'Invalid Protocol Scheme: The endpoint that data is being requested from is using an invalid or insecure protocol 'http'. Valid protocol schemes: https'")
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid URL specified or DNS error occurred: The endpoint that data is being requested from is not a valid URL.
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: UserDefinedBackoffException('Invalid URL specified or DNS error occurred: The endpoint that data is being requested from is not a valid URL.')
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Exception: Token refresh API response was missing access token access_token
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: Exception('Token refresh API response was missing access token access_token')
[error] 166-166: Encountered an error while checking availability of stream pokemon. Error: This is an intentional failure for testing purposes.
[error] 219-219: During the sync, the following streams did not sync successfully: pokemon: IntentionalException('This is an intentional failure for testing purposes.')
airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Bad request. Please check your request parameters.
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Too many requests.
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: UserDefinedBackoffException('Too many requests.')
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid URL endpoint: '10.0.27.27'
belongs to a private network
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: AirbyteTracedException("Invalid URL endpoint: '10.0.27.27'
belongs to a private network")
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid Protocol Schema: The endpoint that data is being requested from is using an invalid or insecure protocol 'http'. Valid protocol schemes: https
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: MessageRepresentationAirbyteTracedErrors("'GET' request to 'http://unsecured.protocol/api/v1/v3/marketing/lists?a_param=10&page_size=2' failed with exception: 'Invalid Protocol Scheme: The endpoint that data is being requested from is using an invalid or insecure protocol 'http'. Valid protocol schemes: https'")
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Invalid URL specified or DNS error occurred: The endpoint that data is being requested from is not a valid URL.
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: UserDefinedBackoffException('Invalid URL specified or DNS error occurred: The endpoint that data is being requested from is not a valid URL.')
[error] 167-167: Exception while syncing stream stream_with_custom_requester: Exception: Token refresh API response was missing access token access_token
[error] 219-219: During the sync, the following streams did not sync successfully: stream_with_custom_requester: Exception('Token refresh API response was missing access token access_token')
[error] 166-166: Encountered an error while checking availability of stream pokemon. Error: This is an intentional failure for testing purposes.
[error] 219-219: During the sync, the following streams did not sync successfully: pokemon: IntentionalException('This is an intentional failure for testing purposes.')
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Check: source-shopify
🔇 Additional comments (41)
airbyte_cdk/sources/streams/concurrent/partitions/types.py (1)
7-8: Confirm AirbyteMessage overlap with Record

Record is already an AirbyteMessage wrapper in many parts of the CDK. Including both in the QueueItem union may create ambiguous overloads in type-narrowing code. Would narrowing Record's alias (or adding a typing.cast where needed) help static tools distinguish the two, wdyt?

unit_tests/sources/streams/concurrent/scenarios/stream_facade_builder.py (1)
52-57: LGTM! Keyword arguments improve clarity.

The change from positional to keyword arguments in the ConcurrentSource constructor makes the code more readable and explicit. This aligns well with the updated constructor signature.

airbyte_cdk/connector_builder/main.py (2)
94-94: Good addition of required parameters.

Adding catalog and state parameters to the create_source call aligns with the updated function signature and enables proper concurrent processing with state management.

95-99: Appropriate type ignore comment.

The type: ignore[no-any-return] comment is well-justified since Serializer.dump() always returns an AirbyteMessage and the .decode() separation improves readability.

unit_tests/sources/streams/concurrent/test_partition_reader.py (3)
31-31: Good addition of PartitionLogger parameter.

Passing None for the PartitionLogger parameter keeps the test focused while adapting to the updated constructor signature.

34-39: Excellent use of real cursor for integration testing.

Using a FinalStateCursor with InMemoryMessageRepository provides better test coverage than a mock would, ensuring the cursor actually functions correctly in this scenario, wdyt?

57-58: Good verification of cursor lifecycle methods.

The assertions on cursor.observe and cursor.close_partition ensure that the partition reader properly manages cursor state during processing.

unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
371-371: Good update to new constructor signature.

Replacing component_factory=ModelToComponentFactory(disable_cache=True) with limits=TestLimits() aligns with the updated ConcurrentDeclarativeSource constructor and maintains test isolation.

airbyte_cdk/sources/concurrent_source/concurrent_source.py (4)
46-46: Good addition of queue injection capability.

Adding the optional queue parameter enables external queue management while maintaining backward compatibility with the default internal queue creation.

97-101: Reasonable bounded queue implementation.

The maxsize=10_000 provides a sensible default to prevent unbounded memory growth. The comment clearly explains the rationale and acknowledges it may need tuning based on usage patterns, wdyt?

115-118: Good integration of PartitionLogger.

Passing the PartitionLogger to PartitionReader enables proper logging integration while maintaining the separation of concerns between queue management and logging.

170-171: Appropriate extension of message handling.

Adding direct AirbyteMessage handling to _handle_item extends the concurrent source's capability to process pre-formed messages alongside records, which aligns with the broader message routing improvements.

unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py (3)
179-185: Test update looks good for the new cursor parameter!

The addition of the expected_cursor retrieval and its inclusion in the submit call correctly reflects the updated PartitionReader.process_partition signature. This maintains proper test coverage for the new API, wdyt?

206-212: Consistent test update for cursor parameter!

Great consistency with the previous test method - the cursor retrieval and parameter passing follows the same pattern and correctly tests the updated API signature.

1-795: Cursor closing logic is covered in PartitionReader tests

I've verified that cursor.close_partition is asserted in unit_tests/sources/streams/concurrent/test_partition_reader.py (both in normal processing and exception scenarios). No additional tests are needed for this move; coverage remains intact.

airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py (2)
98-105: Clean integration of cursor parameter into partition processing!

The cursor retrieval from the stream instance and its inclusion in the process_partition call looks solid. This nicely enables the partition reader to handle cursor operations directly, which should improve the separation of concerns, wdyt?

118-127: Cursor closing logic verified in PartitionReader.process_partition

I confirmed that process_partition in airbyte_cdk/sources/streams/concurrent/partition_reader.py calls cursor.close_partition(partition) inside its main try/except block, so any errors during close are caught and sent as a StreamThreadException. The cursor lifecycle and error handling look solid. Would you like to add any extra logging around the close call for additional visibility, or is this sufficient? wdyt?

unit_tests/connector_builder/test_connector_builder_handler.py (4)
56-59: LGTM on the import migration!

The import changes correctly replace ManifestDeclarativeSource with ConcurrentDeclarativeSource and add the TestLimits import. This aligns well with the PR's goal of introducing concurrency-aware processing.

533-535: Source instantiation looks good!

The migration to ConcurrentDeclarativeSource with the new signature (including catalog, config, state, and source_config parameters) appears correct and consistent with the concurrent processing changes.

890-891: Nice update to the create_source call!

The addition of catalog and state parameters to the create_source call is consistent with the new function signature mentioned in the AI summary. This ensures proper initialization of the concurrent source.

1278-1278: Question about the error expectation changes

I noticed the error expectations changed from "AirbyteTracedException" to "StreamThreadException" on these lines. This suggests the concurrent processing introduces different exception handling. Could you confirm this is the expected behavior change with the new concurrent architecture? The change makes sense given we're now dealing with threaded execution, but I want to make sure this aligns with your expectations, wdyt? Also applies to: 1284-1284
airbyte_cdk/sources/message/concurrent_repository.py (3)
11-22: Excellent documentation of the ordering problem!

The class documentation clearly explains why this wrapper is needed - the non-deterministic processing between main thread and partitions could cause incorrect message ordering, which is crucial for the connector builder's grouping logic.

28-31: Question about potential message duplication

I noticed both emit_message and log_message call consume_queue() on the decorated repository after forwarding the message. If the decorated repository buffers multiple messages internally, this could potentially put duplicate messages on the queue across multiple calls. Have you verified that the decorated InMemoryMessageRepository only returns newly added messages from consume_queue(), or does it return all buffered messages each time? If it's the latter, we might need to track what's already been forwarded, wdyt? Also applies to: 33-36

38-43: Smart design choice on consume_queue!

Returning an empty iterator makes perfect sense here since messages are immediately forwarded to the shared queue. The comment clearly explains why this method shouldn't be called in normal operation.
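The decorator pattern discussed above can be sketched as follows, under the assumption that the decorated repository's consume_queue drains only pending messages (the class names here are simplified stand-ins, not the CDK's actual implementations):

```python
import queue

class InMemoryRepo:
    """Simplified stand-in for the CDK's InMemoryMessageRepository:
    consume_queue() drains pending messages, so repeated calls never
    replay already-consumed ones."""
    def __init__(self):
        self._pending = []
    def emit_message(self, message):
        self._pending.append(message)
    def consume_queue(self):
        while self._pending:
            yield self._pending.pop(0)

class ConcurrentRepo:
    """Decorator that forwards each emitted message straight onto the
    shared queue, preserving arrival order across threads."""
    def __init__(self, shared_queue, decorated):
        self._queue = shared_queue
        self._decorated = decorated
    def emit_message(self, message):
        self._decorated.emit_message(message)
        # Drain whatever the decorated repository buffered and forward it.
        for msg in self._decorated.consume_queue():
            self._queue.put(msg)
    def consume_queue(self):
        # Nothing to consume: messages already went to the shared queue.
        return iter([])

shared = queue.Queue()
repo = ConcurrentRepo(shared, InMemoryRepo())
repo.emit_message("log-1")
repo.emit_message("log-2")
```

Because the inner repository drains on consume, each emit forwards exactly the newly buffered messages, which is the behavior the duplication question above hinges on.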
airbyte_cdk/sources/streams/concurrent/partition_reader.py (5)
18-37
: Nice encapsulation with PartitionLogger!The
PartitionLogger
class provides a clean way to inject logging dependencies and handles the slice logging logic appropriately. The conditional logging based onshould_log_slice_message
is a good optimization.
47-49
: Good integration of optional loggingThe optional
PartitionLogger
parameter allows for flexible usage while maintaining backward compatibility. Nice design choice!Also applies to: 54-54
56-56
: Question about cursor parameter additionThe addition of the
Cursor
parameter toprocess_partition
makes sense for the concurrency changes. I can see from the AI summary that this aligns with changes in the concurrent read processor. This breaking change seems necessary for proper cursor lifecycle management, wdyt?
`69-70`: **Clean logging integration.** The conditional logging of partitions is well-implemented and follows the established pattern of checking if logging should occur before emitting messages.
`74-75`: **Question about cursor lifecycle management.** I see that `cursor.observe(record)` is called for every record and `cursor.close_partition(partition)` is called after processing. A couple of questions:

- Have you considered the performance impact of calling `observe` for every record, especially for high-volume streams?
- I notice `close_partition` is only called in the success path (line 75) but not in the exception handler. Is this intentional, or should we also close the partition on failure to ensure proper cleanup, wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
`625-628`: **LGTM! Good approach to avoid circular imports.** The constant definition and explanation are clear. This provides a reasonable default for slice limiting.
`1537-1560`: **Consistent implementation of slice limiting for incrementing count cursor.** The implementation properly mirrors the datetime-based cursor approach, correctly using the `MAX_SLICES` constant for the default value.
`1473-1475`: **Type casting is appropriately documented.** The explicit type casting with explanatory comments is helpful. The mismatch between `ConcurrentCursor` and the low-code `StreamSlicer` interface is a known issue, and your comments clearly explain why this casting is necessary.

Also applies to: 1551-1553
airbyte_cdk/connector_builder/connector_builder_handler.py (6)
`6-7`: **LGTM! Import changes align with the concurrent source migration.** The removal of unused imports and the addition of `asdict` are appropriate for the new implementation.
`18-21`: **Good import restructuring for concurrent source support.** Importing `ConcurrentDeclarativeSource` and `TestLimits` from the concurrent module is the right approach.
`34-43`: **Proper use of the imported TestLimits class.** The function correctly instantiates `TestLimits` with appropriate defaults from class constants. The implementation maintains backward compatibility with the previous local dataclass structure.
`73-78`: **Smart enforcement of single-threaded processing for connector builder!** Setting `default_concurrency` to 1 ensures message ordering is preserved for the connector builder UI, which is crucial for grouping records correctly. The code handles both existing and missing `concurrency_level` configurations properly.
`80-87`: **Proper instantiation of ConcurrentDeclarativeSource.** All required parameters are correctly passed, including the new `limits` parameter and the `emit_connector_builder_messages=True` flag. The removal of the custom `ModelToComponentFactory` is appropriate since it's now handled internally by `ConcurrentDeclarativeSource`.
`65-70`: **create_source signature is correctly propagated across all call sites.** I verified that every caller, including all unit tests in `unit_tests/connector_builder/test_connector_builder_handler.py` and the main entry point in `airbyte_cdk/connector_builder/main.py`, now passes both `catalog` and `state`. There are no remaining references to the old signature.

Looks good to merge! wdyt?
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)
`70-82`: **Well-designed TestLimits dataclass!** Good use of `ClassVar` for constants and the `__test__` flag to prevent pytest confusion. The default values seem reasonable for testing scenarios. Nice touch with the descriptive constant names.
`119-129`: **Excellent configuration of ModelToComponentFactory with concurrent support!** The setup properly:

- Wraps repositories with `ConcurrentMessageRepository` for ordered message handling
- Applies test limits when provided
- Disables retries and caching during testing (smart choice!)

This aligns perfectly with the PR objective to control message routing and ordering.
`165-165`: **Good addition of queue parameter to ConcurrentSource.** Passing the shared queue enables proper coordination between the concurrent source and message repositories, which is key to the PR's objective of controlling processing order.
A couple of questions but it seems overall very good
Resolved (outdated) review threads:

- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 threads)
- unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1 thread)
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/concurrent/partition_reader.py (1)
`18-42`: **Transitional solution looks good, but what's the timeline for the replacement?** The PartitionLogger implementation is clean and serves its purpose well as a bridge between SliceLogger and MessageRepository. The comment mentions this should be replaced by emitting messages directly onto the repository - do you have a timeline for this refactor? It would help avoid accumulating technical debt. wdyt?
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (1)
`88-106`: **Consider adding validation for the slice_limit parameter.** The implementation looks good for conditionally wrapping the stream slicer with a test decorator. However, should we validate that `slice_limit` is positive when provided? Also, I notice we're using `cast` here - is `StreamSlicerTestReadDecorator` guaranteed to implement all methods of the `StreamSlicer` interface, wdyt?

```diff
 def __init__(
     self,
     partition_factory: DeclarativePartitionFactory,
     stream_slicer: StreamSlicer,
     slice_limit: Optional[int] = None,
 ) -> None:
     self._partition_factory = partition_factory
     if slice_limit:
+        if slice_limit <= 0:
+            raise ValueError("slice_limit must be positive")
         self._stream_slicer = cast(
             StreamSlicer,
             StreamSlicerTestReadDecorator(
                 wrapped_slicer=stream_slicer,
                 maximum_number_of_slices=slice_limit,
             ),
         )
     else:
         self._stream_slicer = stream_slicer
```

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)
`70-83`: **Consider using a different naming pattern to avoid pytest conflicts.** The `TestLimits` dataclass looks well-structured with sensible defaults. However, using `__test__ = False` to prevent pytest from treating this as a test class feels like a workaround. Would it make sense to rename this to something like `ConcurrencyLimits` or `ProcessingLimits` to avoid the naming conflict altogether, wdyt?

```diff
 @dataclass
-class TestLimits:
-    __test__: ClassVar[bool] = False  # Tell Pytest this is not a Pytest class, despite its name
+class ProcessingLimits:
     DEFAULT_MAX_PAGES_PER_SLICE: ClassVar[int] = 5
     DEFAULT_MAX_SLICES: ClassVar[int] = 5
     DEFAULT_MAX_RECORDS: ClassVar[int] = 100
     DEFAULT_MAX_STREAMS: ClassVar[int] = 100
     max_records: int = field(default=DEFAULT_MAX_RECORDS)
     max_pages_per_slice: int = field(default=DEFAULT_MAX_PAGES_PER_SLICE)
     max_slices: int = field(default=DEFAULT_MAX_SLICES)
     max_streams: int = field(default=DEFAULT_MAX_STREAMS)
```
`98-132`: **Queue size configuration and error handling.** The queue implementation with bounded size is a good approach to prevent memory issues. The comment indicates that 10,000 is arbitrary and might need to be configurable. Should we make this configurable through the `limits` parameter? Also, should we consider what happens when the queue is full - will producers block appropriately, wdyt?

Consider adding queue size to the limits configuration:

```diff
 @dataclass
 class TestLimits:
     __test__: ClassVar[bool] = False  # Tell Pytest this is not a Pytest class, despite its name
     DEFAULT_MAX_PAGES_PER_SLICE: ClassVar[int] = 5
     DEFAULT_MAX_SLICES: ClassVar[int] = 5
     DEFAULT_MAX_RECORDS: ClassVar[int] = 100
     DEFAULT_MAX_STREAMS: ClassVar[int] = 100
+    DEFAULT_QUEUE_SIZE: ClassVar[int] = 10_000
     max_records: int = field(default=DEFAULT_MAX_RECORDS)
     max_pages_per_slice: int = field(default=DEFAULT_MAX_PAGES_PER_SLICE)
     max_slices: int = field(default=DEFAULT_MAX_SLICES)
     max_streams: int = field(default=DEFAULT_MAX_STREAMS)
+    queue_size: int = field(default=DEFAULT_QUEUE_SIZE)
```

Then use it in the constructor:

```diff
-        queue: Queue[QueueItem] = Queue(maxsize=10_000)
+        queue_size = limits.queue_size if limits else 10_000
+        queue: Queue[QueueItem] = Queue(maxsize=queue_size)
```
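To the blocking question above: Python's standard `queue.Queue` makes producers block on `put()` once `maxsize` is reached, while `put_nowait()` raises `queue.Full` instead. A quick demonstration:

```python
from queue import Full, Queue

q: Queue = Queue(maxsize=2)
q.put("a")
q.put("b")

# A blocking put() would now wait for a consumer; put_nowait() fails fast.
try:
    q.put_nowait("c")
except Full:
    print("queue full, producer would block on put()")

q.get()            # a consumer frees a slot...
q.put_nowait("c")  # ...and the producer can continue
print(q.qsize())   # 2
```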
`329-331`: **Consider extracting the slice_limit logic to reduce duplication.** I notice the same pattern `self._limits.max_slices if self._limits else None` is repeated in multiple places throughout `_group_streams`. Would it be cleaner to extract this to a property or method to reduce duplication, wdyt?

```diff
+    @property
+    def _slice_limit(self) -> Optional[int]:
+        """Get the slice limit from limits configuration if available."""
+        return self._limits.max_slices if self._limits else None
+
     def _group_streams(
         self, config: Mapping[str, Any]
     ) -> Tuple[List[AbstractStream], List[Stream]]:
```

Then replace all occurrences:

```diff
-            slice_limit=self._limits.max_slices
-            if self._limits
-            else None,  # technically not needed because create_declarative_stream() -> create_simple_retriever() will apply the decorator. But for consistency and depending how we build create_default_stream, this may be needed later
+            slice_limit=self._slice_limit,  # technically not needed because create_declarative_stream() -> create_simple_retriever() will apply the decorator. But for consistency and depending how we build create_default_stream, this may be needed later
```

Also applies to: 363-363, 395-397, 459-459
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py (9 hunks)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
- airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (2 hunks)
- airbyte_cdk/sources/streams/concurrent/partition_reader.py (3 hunks)
- airbyte_cdk/sources/utils/slice_logger.py (1 hunks)
- unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
- airbyte_cdk/sources/utils/slice_logger.py
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: when modifying the `yamldeclarativesource` class in `airbyte_cdk/sources/declarative/yaml_declarativ...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: the files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from ...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (9)
airbyte_cdk/sources/streams/concurrent/partition_reader.py (3)
`5-5`: **LGTM on the new imports!** The added imports (Optional, MessageRepository, Cursor, SliceLogger) are all necessary for the new functionality and correctly placed.

Also applies to: 8-9, 15-15
`52-59`: **Nice backward-compatible approach with the optional PartitionLogger!** The constructor change maintains compatibility while enabling the new logging functionality. Clean implementation.
`74-81`: **Cursor integration looks solid, but should we handle close_partition exceptions?** The cursor lifecycle management (observe + close_partition) aligns perfectly with the PR objectives. One thought - what happens if `cursor.close_partition(partition)` throws an exception? Currently it would bubble up and not be caught by the try-catch, potentially leaving the partition in an inconsistent state.

Should we wrap the close_partition call in its own try-catch to ensure the PartitionCompleteSentinel is always queued? wdyt?
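One way to guarantee the sentinel is always queued, as the comment suggests. This is a hedged sketch with simplified stand-in types; the real `PartitionReader` and `PartitionCompleteSentinel` have different signatures:

```python
from queue import Queue
from typing import Any, Iterable, Optional


class PartitionCompleteSentinel:
    """Signals that a partition finished, optionally carrying a failure."""

    def __init__(self, partition: Any, exception: Optional[Exception] = None) -> None:
        self.partition = partition
        self.exception = exception


def process_partition(records: Iterable[Any], cursor: Any, queue: Queue) -> None:
    """Observe every record, close the partition, and always queue a sentinel."""
    try:
        for record in records:
            cursor.observe(record)  # the cursor sees each record before emission
            queue.put(record)
        try:
            cursor.close_partition(records)
        except Exception as exc:
            # A failing close must not swallow the completion signal.
            queue.put(PartitionCompleteSentinel(records, exception=exc))
            return
        queue.put(PartitionCompleteSentinel(records))
    except Exception as exc:
        queue.put(PartitionCompleteSentinel(records, exception=exc))
```

Even when `close_partition` raises, the consumer still receives a sentinel (with the exception attached) and can tear down cleanly instead of waiting forever.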
airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py (2)
`1-1`: **LGTM!** Standard copyright year update.
`3-8`: **LGTM!** The new imports are appropriate for the slice limiting functionality being added.
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)
`5-11`: **LGTM!** The new imports are necessary for the queue-based concurrent processing and message repository implementation.
`55-57`: **LGTM!** Good addition of the concurrent message repository imports for proper message ordering.
`67-68`: **LGTM!** Necessary import for the queue item type used in concurrent processing.
167-167
: LGTM!Good integration of the queue with the concurrent source to enable proper coordination of concurrent processing.
Following the discussions we had, I'm good with this change
…through the concurrent CDK by reworking message routing and order through the queue and message repositories (#688)
Problem
Swapping the existing `ManifestDeclarativeSource` for the `ConcurrentDeclarativeSource` in the `connector_builder_handler` and setting the thread count to 1 is not a substantial lift. The main issue, however, is that we can't control when we process records from the primary message queue versus processing the next partition of data to extract records. This in turn leads to incorrect grouping of records, because we might insert the next partition of records before we close the previous partition and emit a state message.
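To make the ordering problem concrete, here is a toy simulation (hypothetical message strings, not real Airbyte messages) of how grouping breaks when a record from the next partition lands on the queue before the previous partition's state message:

```python
from queue import Queue
from typing import List


def group_by_state(queue: Queue) -> List[List[str]]:
    """Group messages into (records..., state) chunks, the way the
    connector builder groups output per partition/slice."""
    groups, current = [], []
    while not queue.empty():
        msg = queue.get()
        current.append(msg)
        if msg.startswith("STATE"):  # a state message closes the current group
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


# Correct ordering: partition 1 fully emitted (records + state) before partition 2.
ordered = Queue()
for msg in ["rec1a", "rec1b", "STATE-1", "rec2a", "STATE-2"]:
    ordered.put(msg)

# Uncontrolled interleaving: partition 2's record lands before partition 1's state.
interleaved = Queue()
for msg in ["rec1a", "rec2a", "rec1b", "STATE-1", "STATE-2"]:
    interleaved.put(msg)

print(group_by_state(ordered))      # groups line up with partitions
print(group_by_state(interleaved))  # rec2a is wrongly grouped with partition 1
```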
Implementation Details
The main changes included in this PR are:

- Introduced a `ConcurrentMessageRepository`, which basically wraps the main message queue that records are emitted on
- Passed the cursor from the `ConcurrentReadProcessor` to the `PartitionReader` so that we immediately put state messages onto the queue after finishing a partition
- Fixed serialization of `AirbyteStateBlob`: it is actually a dynamic-field dataclass, so it doesn't serialize into a dict properly and always ends up being `{}`. I unwrapped the object to a dict, which now renders the state value

todo: fixing the last few tests
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests