feat(import): simpler import + state restore logic #625

Merged Mar 26, 2025 (8 commits)

@floryst (Collaborator) commented Jul 16, 2024

This commit contains two changes that could probably be split into two separate PRs against the base branch. I'm putting this up as a draft for now so @PaulHax can get a sense of the changes.

The pipeline code has been removed in favor of a simpler chain-of-responsibility approach, using evaluateChain and asyncSelect.

evaluateChain is responsible for evaluating a data source against a chain of import handlers until one of them returns a new data source.
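A minimal sketch of the chain-of-responsibility idea described above, not the PR's actual code. Only the name evaluateChain comes from this PR; the `DataSource` shape, the `ImportHandler` signature, the use of `undefined` to mean "not handled", and the two example handlers are assumptions made for illustration.

```typescript
// Hypothetical data source shape; the real type is not shown in this thread.
type DataSource = { type: string; name: string };

// Assumed contract: a handler returns a new data source when it can process
// the input, or undefined to let the next handler try.
type ImportHandler = (src: DataSource) => Promise<DataSource | undefined>;

async function evaluateChain(
  src: DataSource,
  handlers: ImportHandler[]
): Promise<DataSource | undefined> {
  for (const handler of handlers) {
    const result = await handler(src);
    // The first handler that produces a new data source ends the chain.
    if (result !== undefined) return result;
  }
  // No handler produced a new data source.
  return undefined;
}

// Illustrative handlers only.
const passThrough: ImportHandler = async () => undefined;
const unzip: ImportHandler = async (src) =>
  src.type === 'archive'
    ? { type: 'file', name: src.name.replace(/\.zip$/, '') }
    : undefined;
```

Handlers are tried in order, so more specific handlers would normally be placed ahead of generic fallbacks.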

To keep processing a data source the way the old pipeline code supported nested executions, evaluateChain is invoked in a loop for every data source. asyncSelect drives the loop, selecting evaluateChain promises as they complete.
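The loop described above might look roughly like the following sketch. The name asyncSelect comes from this PR, but its signature and return shape here are assumptions, as are the `processAll` helper, the `Step` type, and the convention that an `undefined` result means a data source is finished. The `evaluate` parameter stands in for a call to evaluateChain.

```typescript
// Assumed behaviour: resolve with whichever promise settles first, plus the
// promises that are still outstanding. The input must be non-empty, since
// Promise.race on an empty array never settles.
async function asyncSelect<T>(
  promises: Promise<T>[]
): Promise<{ settled: Promise<T>; rest: Promise<T>[] }> {
  // Tag each promise with its index; rejections also count as settling.
  const tagged = promises.map((p, index) =>
    p.then(() => index, () => index)
  );
  const index = await Promise.race(tagged);
  return {
    settled: promises[index],
    rest: promises.filter((_, i) => i !== index),
  };
}

type Step<T> = { input: T; output: T | undefined };

// Hypothetical driver: keeps re-evaluating each item until `evaluate`
// stops producing a new one.
async function processAll<T>(
  initial: T[],
  evaluate: (item: T) => Promise<T | undefined>
): Promise<T[]> {
  const start = (input: T): Promise<Step<T>> =>
    evaluate(input).then((output) => ({ input, output }));

  const done: T[] = [];
  let pending: Promise<Step<T>>[] = initial.map(start);

  while (pending.length > 0) {
    const { settled, rest } = await asyncSelect(pending);
    const { input, output } = await settled; // rethrows if evaluation failed
    pending = rest;
    if (output === undefined) {
      done.push(input); // nothing changed it: treat as finished
    } else {
      pending.push(start(output)); // nested result: evaluate it again
    }
  }
  return done;
}
```

Because results are handled in completion order, a slow data source does not block the others from progressing through their own chains.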

The state schema is updated to operate generically on serialized data sources. Instead of special-casing remote files, the serialized DataSource type encodes this state.
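As an illustration of how a serialized data source could carry the remote-file case itself, here is a small sketch. Every field name and variant below is hypothetical; the actual schema in this PR is not reproduced in this thread.

```typescript
// Hypothetical serialized forms. The `type` tag records where the data came
// from, so remote files need no separate section in the state file.
type SerializedDataSource =
  | { id: number; type: 'uri'; uri: string }
  | { id: number; type: 'file'; fileId: number }
  | { id: number; type: 'archive'; path: string; parent: number };

// Restore logic can dispatch on the tag instead of special-casing remotes.
function describeRestore(entry: SerializedDataSource): string {
  switch (entry.type) {
    case 'uri':
      return `fetch ${entry.uri}`;
    case 'file':
      return `read embedded file ${entry.fileId}`;
    case 'archive':
      return `extract ${entry.path} from data source ${entry.parent}`;
  }
}
```

With this shape, adding a new kind of source means adding one variant and one `switch` case, rather than another special-case branch in the restore code.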

@floryst floryst force-pushed the simplified-state-load-restore branch from 1849857 to 382c45e Compare July 30, 2024 18:30
@floryst floryst force-pushed the streaming-base branch 2 times, most recently from c7c1a32 to 19e8af5 Compare August 8, 2024 00:41
@floryst floryst force-pushed the simplified-state-load-restore branch from 382c45e to f59f77d Compare August 8, 2024 00:44
@floryst floryst force-pushed the streaming-base branch 2 times, most recently from f854153 to 3c79e69 Compare March 19, 2025 17:42
floryst and others added 7 commits March 26, 2025 16:33
The pipeline code has been removed in favor of a simpler
chain-of-responsibility approach, using `evaluateChain` and
`asyncSelect`.

`evaluateChain` is responsible for evaluating a data source against a
chain of import handlers until one of them returns a new data source.

To keep processing a data source like how the old pipeline code
supported nested executions, `evaluateChain` is invoked inside a loop
for every data source. `asyncSelect` is used to drive the loop
execution, selecting `evaluateChain` promises whenever they are done.

The state schema is updated to generically operate on serialized data
sources. Instead of special-casing for remote files, the serialized
DataSource type encodes this state.
DataSource objects now represent a single type, rather than the
bag-of-types done originally.
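One way to picture the difference between a bag-of-types object and a single-type object is the sketch below. Both shapes are assumptions for illustration and are not taken from the PR's diff; `BagDataSource`, `SingleDataSource`, `labelBag`, and `label` are hypothetical names.

```typescript
// Assumed "before" shape: every facet is optional, so one object may carry
// several at once and consumers have to probe for them.
interface BagDataSource {
  fileSrc?: { name: string };
  uriSrc?: { uri: string };
  archiveSrc?: { path: string };
}

function labelBag(src: BagDataSource): string {
  // The consumer decides which facet takes precedence.
  return src.fileSrc?.name ?? src.uriSrc?.uri ?? src.archiveSrc?.path ?? 'unknown';
}

// Assumed "after" shape: each object is exactly one variant, tagged by `type`.
type SingleDataSource =
  | { type: 'file'; name: string }
  | { type: 'uri'; uri: string }
  | { type: 'archive'; path: string; parent: SingleDataSource };

function label(src: SingleDataSource): string {
  switch (src.type) {
    case 'file':
      return src.name;
    case 'uri':
      return src.uri;
    case 'archive':
      // Provenance is explicit: an archive member points at its parent.
      return `${label(src.parent)}!${src.path}`;
  }
}
```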
@floryst floryst force-pushed the simplified-state-load-restore branch from f59f77d to ef98d95 Compare March 26, 2025 20:36
feat(DataSource): simplified data structure
@floryst floryst marked this pull request as ready for review March 26, 2025 20:40
@floryst (Collaborator, Author) commented Mar 26, 2025

@PaulHax FYI I'm merging these changes into streaming-base and working off of that.

@floryst floryst merged commit 9ee9781 into streaming-base Mar 26, 2025
0 of 3 checks passed
@floryst floryst deleted the simplified-state-load-restore branch March 26, 2025 20:41
2 participants