Conversation
Add @overload signatures to IncrementalCSVProcessor allowing it to be used without a Pydantic model. When a Pydantic model is not used, raw dicts are yielded instead, with no validation performed. This enables connectors to bypass Pydantic validation overhead for high-throughput CSV processing where the performance cost of validation outweighs its benefits.
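A minimal sketch of the overload trick the commit describes, assuming an async-iterator shape for the processor; class internals and the CSV-parsing details here are illustrative, not the connector's actual implementation:

```python
import asyncio
import csv
import io
from typing import Any, AsyncIterator, Generic, Optional, TypeVar, overload

T = TypeVar("T")


class IncrementalCSVProcessor(Generic[T]):
    """Sketch: yields model instances when a model class is supplied,
    raw dicts (no validation) when it is not."""

    # The @overload pair narrows the element type at call sites:
    # passing a model class gives an iterator of that model; passing
    # nothing gives an iterator of dict[str, Any].
    @overload
    def __init__(self: "IncrementalCSVProcessor[T]", data: str, model: type[T]) -> None: ...
    @overload
    def __init__(self: "IncrementalCSVProcessor[dict[str, Any]]", data: str, model: None = None) -> None: ...

    def __init__(self, data: str, model: Optional[type] = None) -> None:
        self._rows = csv.DictReader(io.StringIO(data))
        self._model = model

    def __aiter__(self) -> AsyncIterator[T]:
        return self

    async def __anext__(self) -> T:
        try:
            row = next(self._rows)
        except StopIteration:
            raise StopAsyncIteration
        if self._model is None:
            # No model: skip validation entirely, yield the raw dict.
            return row  # type: ignore[return-value]
        return self._model(**row)
```

Type checkers resolve `async for doc in IncrementalCSVProcessor(data)` to `dict[str, Any]` and `async for doc in IncrementalCSVProcessor(data, MyModel)` to `MyModel`, which is what lets `__anext__`'s return type narrow without a cast.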
…on for documents
Pydantic validation is arguably not necessary for this connector. It primarily processes CSVs and performs very light transformations (converting empty strings to `None`, converting boolean fields to actual booleans, and adding the `_meta.op` field). Yielding validated Pydantic model instances also causes the CDK to serialize documents with `model_dump_json()`, which performs additional validation. This connector has high CPU usage, and removing Pydantic validation should speed it up. By performing the light transformation itself and yielding raw dicts, the connector uses the CDK's faster `orjson` serialization pathway when capturing documents, avoiding the overhead of Pydantic validation and serialization. This reduces steady-state CPU usage from ~80% to ~35% when capturing a single data-heavy binding.
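The "light transformation" listed above can be sketched as a plain function over a CSV row dict. This is an assumption-laden illustration: the boolean field names and the `op` value are hypothetical, not taken from the connector.

```python
from typing import Any

# Hypothetical set of columns known to hold booleans; the real connector
# would derive this from the binding's schema.
BOOLEAN_FIELDS = {"IsDeleted"}


def transform_row(row: dict[str, Any], op: str = "u") -> dict[str, Any]:
    """Apply the connector's light per-row transformation without Pydantic:
    empty strings -> None, boolean columns -> real bools, attach _meta.op."""
    out: dict[str, Any] = {}
    for key, value in row.items():
        if value == "":
            out[key] = None
        elif key in BOOLEAN_FIELDS:
            out[key] = str(value).lower() in ("true", "1", "yes")
        else:
            out[key] = value
    out["_meta"] = {"op": op}
    return out
```

Because the result is a plain dict, the CDK can hand it straight to its `orjson` serialization path instead of round-tripping through `model_dump_json()`.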
nicolaslazo
approved these changes
Jan 11, 2026
Contributor
nicolaslazo
left a comment
LGTM. I really like that overload trick to narrow down those `__anext__` signatures.
…n incremental_csv_processor.py
Description:
When capturing documents, source-dynamics-365-finance-and-operations is CPU bound: the connector can't capture documents faster than it reads rows from CSVs. Yielding Pydantic model instances as documents adds validation and serialization overhead that's arguably not necessary for source-dynamics-365-finance-and-operations; we're doing very light transformations on a small handful of fields, and all others are left unchanged. Yielding dicts instead avoids some of this overhead and goes through the faster orjson serialization path in the CDK.
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
Tested on a local stack and with benchmark tests. Confirmed:
py-spy flame graphs show that Pydantic validation and serialization no longer take up a significant portion of CPU time when capturing documents.
docker stats shows steady-state CPU usage dropped from ~80% to ~35% after this change when capturing from a single binding.