source-dynamics-365-finance-and-operations: yield dicts instead of Pydantic model instances #3739

Merged
Alex-Bair merged 4 commits into main from
bair/source-d365-f-and-o-use-cpu-improvements-pt2-pr
Jan 12, 2026

@Alex-Bair Alex-Bair commented Jan 10, 2026

Description:

When capturing documents, source-dynamics-365-finance-and-operations is CPU bound - the connector can't capture documents faster than it's reading rows from CSVs. Yielding Pydantic model instances as documents adds some validation and serialization overhead that's arguably not necessary for source-dynamics-365-finance-and-operations; we're doing very light transformations on a small handful of fields, and all others are left unchanged. Yielding dicts instead avoids some of this overhead and goes through the faster orjson serialization path in the CDK.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack and with benchmark tests. Confirmed:

  • py-spy flame graphs show Pydantic validation and serialization no longer take up a significant portion of CPU time when capturing documents.
  • docker stats shows steady state CPU usage dropped from ~80% to ~35% after this change when capturing from a single binding.
  • documents landed in collections are the same before and after this change (i.e. the same transformations are applied).

@Alex-Bair Alex-Bair force-pushed the bair/source-d365-f-and-o-use-cpu-improvements-pt2-pr branch from 56434cb to 8d75ad1 Compare January 10, 2026 17:46
Add @overload signatures to IncrementalCSVProcessor allowing it to be used
without a Pydantic model. When a Pydantic model is not used, raw dicts
are yielded instead with no validation performed.

This enables connectors to bypass Pydantic validation overhead for
high-throughput CSV processing where the performance cost of validation
outweighs its benefits.
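
A minimal sketch of the overload pattern described above, not the CDK's actual code. `CsvRows`, `Row`, and `collect` are hypothetical names, the constructor arguments are assumed, and a dataclass stands in for a Pydantic model so the example needs only the standard library. The overloads on `__init__` let a type checker infer `CsvRows[dict[str, Any]]` when no model is passed and `CsvRows[Row]` when one is, which in turn narrows the return type of `__anext__`.

```python
import asyncio
import csv
import io
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, overload

T = TypeVar("T")


@dataclass
class Row:
    """Stand-in for a validated model class (assumption: the real code uses Pydantic)."""
    id: str
    name: str


class CsvRows(Generic[T]):
    """Hypothetical processor: yields model instances, or raw dicts if no model is given."""

    @overload
    def __init__(self: "CsvRows[dict[str, Any]]", text: str, model: None = None) -> None: ...
    @overload
    def __init__(self: "CsvRows[T]", text: str, model: type[T]) -> None: ...

    def __init__(self, text: str, model: Any = None) -> None:
        self._reader = csv.DictReader(io.StringIO(text))
        self._model = model

    def __aiter__(self) -> "CsvRows[T]":
        return self

    async def __anext__(self) -> T:
        try:
            row = next(self._reader)
        except StopIteration:
            raise StopAsyncIteration from None
        # No model: hand back the raw dict untouched, with no validation.
        if self._model is None:
            return row  # type: ignore[return-value]
        # Model given: construct an instance from the row's columns.
        return self._model(**row)


async def collect(rows: CsvRows[T]) -> list[T]:
    """Drain the async iterator into a list."""
    return [row async for row in rows]


CSV_TEXT = "id,name\n1,a\n2,b\n"
raw_rows = asyncio.run(collect(CsvRows(CSV_TEXT)))         # list of dicts
model_rows = asyncio.run(collect(CsvRows(CSV_TEXT, Row)))  # list of Row
```

Existing callers that pass a model keep their typed results, while a caller that omits it opts out of validation explicitly.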
…on for documents

Pydantic validation is arguably not necessary for this connector. It
primarily processes CSVs and performs very light transformation (convert
empty strings to `None`, convert boolean fields to actual boolean values,
add in the `_meta.op` field). Yielding validated Pydantic model instances
also causes the CDK to serialize documents with model_dump_json(),
performing some additional validation.

This connector has high CPU usage, and removing some Pydantic validation
should help speed up the connector. By performing the light transformation
itself and yielding raw dicts, the connector uses the CDK's faster `orjson`
serialization pathway when capturing documents, avoiding overhead from
Pydantic validation and serialization. This ends up reducing
steady state CPU usage from ~80% to ~35% when capturing a single
data-heavy binding.
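
The light transformation listed above (empty strings to `None`, boolean fields to real booleans, adding `_meta.op`) could look roughly like the following when applied to raw CSV rows. This is a sketch under assumptions, not the connector's code: `transform`, `read_docs`, and `TRUE_VALUES` are hypothetical names, the `"True"`/`"False"` text encoding of booleans is assumed, and the `op` value is supplied by the caller rather than taken from the connector.

```python
import csv
import io
from typing import Any, Iterator

# Assumption: booleans arrive as text such as "True"/"False" in the CSV.
TRUE_VALUES = {"true", "1"}


def transform(row: dict[str, str], boolean_fields: set[str], op: str) -> dict[str, Any]:
    """Apply the three light transformations to one CSV row and return a plain dict."""
    doc: dict[str, Any] = {}
    for key, value in row.items():
        if value == "":
            # Empty strings become None.
            doc[key] = None
        elif key in boolean_fields:
            # Boolean columns become actual booleans.
            doc[key] = value.strip().lower() in TRUE_VALUES
        else:
            # Everything else is passed through unchanged.
            doc[key] = value
    # Add the operation marker under _meta.op.
    doc["_meta"] = {"op": op}
    return doc


def read_docs(text: str, boolean_fields: set[str], op: str) -> Iterator[dict[str, Any]]:
    """Yield one transformed dict per CSV row, with no model validation step."""
    for row in csv.DictReader(io.StringIO(text)):
        yield transform(row, boolean_fields, op)


sample = "id,name,isactive\n1,,True\n2,bob,False\n"
docs = list(read_docs(sample, {"isactive"}, "u"))
```

Because the output is already a plain dict, a serializer that accepts dicts can encode it directly without a model dump step.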
@Alex-Bair Alex-Bair force-pushed the bair/source-d365-f-and-o-use-cpu-improvements-pt2-pr branch from 8d75ad1 to 94571b5 Compare January 10, 2026 21:47
@Alex-Bair Alex-Bair marked this pull request as ready for review January 10, 2026 21:54
@Alex-Bair Alex-Bair requested a review from a team January 10, 2026 21:54

@nicolaslazo nicolaslazo left a comment

LGTM. I really like that overload trick to narrow down those __anext__ signatures

Comment thread estuary-cdk/estuary_cdk/incremental_csv_processor.py
Comment thread estuary-cdk/estuary_cdk/incremental_csv_processor.py Outdated
@Alex-Bair Alex-Bair merged commit 0d1663d into main Jan 12, 2026
117 of 121 checks passed
@Alex-Bair Alex-Bair deleted the bair/source-d365-f-and-o-use-cpu-improvements-pt2-pr branch January 12, 2026 14:44