
feat: checkpoint state based incremental processing in SCOPE similar to Spark Stream Processing #8

Merged
mdrakiburrahman merged 9 commits into main from dev/mdrrahman/true-microbatch
Apr 7, 2026


@mdrakiburrahman commented Apr 7, 2026

Note

Thank you for making this change! Please consider filling in this template for your pull request to improve the quality of the check-in message.

Tip

This repo uses Conventional Commits conventions; please try to rename your PR title to match them.

Why this change is needed

Moves away from partition-based watermarking to stateful, file-based streaming.

How

image image
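
As a rough sketch of the general technique only, and not this PR's actual implementation, checkpoint-state-based file discovery could look like the following. It assumes the checkpoint is simply a set of already-processed paths, that files are ordered by modification time, and that a cap similar to Spark's `maxFilesPerTrigger` limits each batch. The names `SourceFile`, `CheckpointState`, `plan_microbatch`, and `commit` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical listing entry: a path plus its modification time."""
    path: str
    modified_time: float  # e.g. seconds since epoch


@dataclass
class CheckpointState:
    """Hypothetical checkpoint: the set of paths already processed."""
    processed: Set[str] = field(default_factory=set)


def plan_microbatch(
    listing: Iterable[SourceFile],
    state: CheckpointState,
    max_files_per_trigger: Optional[int] = None,
) -> List[SourceFile]:
    """Return unprocessed files, oldest first, capped at max_files_per_trigger."""
    pending = sorted(
        (f for f in listing if f.path not in state.processed),
        key=lambda f: (f.modified_time, f.path),  # path breaks ties deterministically
    )
    if max_files_per_trigger is not None:
        pending = pending[:max_files_per_trigger]
    return pending


def commit(state: CheckpointState, batch: Iterable[SourceFile]) -> None:
    """Record a batch as processed; call only after the sink write succeeds."""
    state.processed.update(f.path for f in batch)


# Usage: five files, at most three per trigger.
listing = [SourceFile(f"part-{i}.ss", float(i)) for i in range(5)]
state = CheckpointState()

first = plan_microbatch(listing, state, max_files_per_trigger=3)
commit(state, first)
second = plan_microbatch(listing, state, max_files_per_trigger=3)
commit(state, second)
third = plan_microbatch(listing, state, max_files_per_trigger=3)

print([f.path for f in first])   # the three oldest files
print([f.path for f in second])  # the remaining two
print(third)                     # nothing left to do
```

Unlike a partition watermark, which assumes data arrives in partition order, tracking individual files lets a late-arriving file be picked up on the next run regardless of where it lands.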

Test

  • GCI
  • Tested a real model by installing the whl

First run, processes backlog of 66 .ss files:

image

Second run, nothing to do (no-op):

image

After deleting the Delta table and checkpoint state, the run processes all 66 files again:

image
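
The three runs above can be mimicked with a toy model. This is a minimal sketch under stated assumptions, not the code in this PR: the checkpoint is assumed to be a JSON file listing processed file names, `run_once` is a hypothetical helper, and no Delta table is involved, so only the checkpoint is reset.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def run_once(source_dir: Path, checkpoint: Path) -> List[str]:
    """Process every .ss file not yet recorded in the checkpoint; return their names."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    new_files = sorted(p.name for p in source_dir.glob("*.ss") if p.name not in done)
    # A real job would transform new_files and write to the sink here,
    # and only then persist the updated checkpoint.
    checkpoint.write_text(json.dumps(sorted(done | set(new_files))))
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    source.mkdir()
    for i in range(66):
        (source / f"stream_{i:02d}.ss").touch()  # empty placeholder files
    checkpoint = Path(tmp) / "checkpoint.json"

    first = run_once(source, checkpoint)   # processes the backlog
    second = run_once(source, checkpoint)  # nothing new, so a no-op
    checkpoint.unlink()                    # reset the checkpoint state
    third = run_once(source, checkpoint)   # backlog is processed again

    counts = (len(first), len(second), len(third))

print(counts)  # (66, 0, 66)
```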

@mdrakiburrahman linked an issue Apr 7, 2026 that may be closed by this pull request
@mdrakiburrahman changed the title from "feat: microbatch based files similar to Spark Structured Streaming" to "feat: file state based incremental processing similar to Spark Structured Streaming" Apr 7, 2026
@mdrakiburrahman changed the title from "feat: file state based incremental processing similar to Spark Structured Streaming" to "feat: checkpoint state based incremental processing in SCOPE similar to Spark Structured Streaming" Apr 7, 2026
@mdrakiburrahman changed the title from "feat: checkpoint state based incremental processing in SCOPE similar to Spark Structured Streaming" to "feat: checkpoint state based incremental processing in SCOPE similar to Spark Stream Processing" Apr 7, 2026
@mdrakiburrahman marked this pull request as ready for review April 7, 2026 04:21
@mdrakiburrahman merged commit f8bed87 into main Apr 7, 2026
2 checks passed
@mdrakiburrahman deleted the dev/mdrrahman/true-microbatch branch April 7, 2026 04:21


Development

Successfully merging this pull request may close these issues.

feat: modifiedTime based streaming and maxFilesPerTrigger
