SubstreamPartitionRouter does not propagate the substream's incremental cursor to the parent's re-execution — is this expected?
#77728
CDK version: declarative-manifest 7.18.1 (Airbyte Cloud)

Setup
I have a two-stream declarative source:
The API has no pagination

Observed behavior
The
In all cases, the substream fetches details for every historical sale on every sync. For my use case (~100 tenants, each with hundreds of sales), this means thousands of unnecessary API calls per sync run.

Question
Is this the intended behavior? If so, is there a supported pattern for making the parent's re-execution within
Replies: 1 comment 2 replies
Hi Andreas Skielboe (@askielboe), thank you for the very detailed write-up. This is exactly the right question to ask, and there's a supported way to wire it up.

Short answer
What you're observing is the default behavior: when the parent stream is re-executed inside SubstreamPartitionRouter to generate partitions, by default it does not carry over any persisted cursor state, so it starts from config.start_date every sync. To make the parent re-execution use the persisted parent-cursor state across syncs, set incremental_dependency: true on the parent stream config inside the substream's partition router.

Example

    sales_details:
      type: DeclarativeStream
      retriever:
        type: SimpleRetriever
        requester:
          path: "/api/sales/{{ stream_slice.sale_id }}"
          # ...
        partition_router:
          type: SubstreamPartitionRouter
          parent_stream_configs:
            - stream: "#/definitions/streams/sales"
              parent_key: id
              partition_field: sale_id
              incremental_dependency: true  # <-- the key flag
      incremental_sync:
        type: DatetimeBasedCursor
        cursor_field: started
        # ... (no start_time_option needed since /api/sales/{id} has no `since`)

With
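For the flag to have anything to persist, the parent stream needs an incremental cursor of its own. A minimal sketch of what the parent side might look like follows. The `/api/sales` path, the `since` query parameter, the `started` cursor field on the parent, and the datetime format are all assumptions for illustration, not details confirmed in this thread, so adjust them to what the API actually exposes.

```yaml
# Hypothetical parent stream definition (names and formats are assumed).
sales:
  type: DeclarativeStream
  retriever:
    type: SimpleRetriever
    requester:
      path: "/api/sales"                        # assumed list endpoint
      # ...
  incremental_sync:
    type: DatetimeBasedCursor
    cursor_field: started                       # assumed to exist on parent records
    datetime_format: "%Y-%m-%dT%H:%M:%SZ"       # assumed timestamp format
    start_datetime:
      type: MinMaxDatetime
      datetime: "{{ config['start_date'] }}"    # used only when no state is persisted yet
      datetime_format: "%Y-%m-%dT%H:%M:%SZ"
    start_time_option:
      type: RequestOption
      field_name: since                         # assumed query parameter name
      inject_into: request_parameter
```

With a parent shaped like this, the lower bound sent to the list endpoint would come from the stored parent cursor on later syncs rather than from `start_date`, which is what keeps the substream from re-fetching every historical sale.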
Important prerequisite
This pattern only behaves correctly if the parent record's cursor field advances whenever any of its children change. From the docs:
In your case, both

If you don't want to track parent state
If using

References
If you give

Need more help? Join the Airbyte Community Slack for peer support, or if you're a Cloud customer, open a support ticket referencing this discussion URL.