
Conversation


@Itz-Agasta Itz-Agasta commented Oct 2, 2025

Description

This PR implements stat persistence in the state object, enabling accurate progress tracking and time estimation when resuming syncs. Previously, when a sync was interrupted and resumed, all progress metrics (total records, synced count, estimated time) were lost, leading to poor user experience and observability.

Closes #110

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Changes

1. Added New State Tracking (types/state.go)

I added four functions that save and load these statistics:

  • SetTotalRecordCount() - Saves the total number of records
  • GetTotalRecordCount() - Loads the total number of records
  • SetSyncedRecordCount() - Saves how many records are done
  • GetSyncedRecordCount() - Loads how many records are done
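For reference, a minimal sketch of what the setter/getter pair could look like. The `StreamState` shape and key names here are illustrative stand-ins, not the exact olake types:

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative key names; the real constants live in types/state.go.
const (
	TotalRecordCountKey  = "total_record_count"
	SyncedRecordCountKey = "synced_record_count"
)

// StreamState is a stand-in for the per-stream state object, which keeps
// its key/value pairs in a sync.Map.
type StreamState struct {
	State sync.Map
}

// SetTotalRecordCount saves the total number of records for the stream.
func (s *StreamState) SetTotalRecordCount(count int64) {
	s.State.Store(TotalRecordCountKey, count)
}

// GetTotalRecordCount loads the saved total, returning 0 if nothing is set.
func (s *StreamState) GetTotalRecordCount() int64 {
	if v, ok := s.State.Load(TotalRecordCountKey); ok {
		if n, ok := v.(int64); ok {
			return n
		}
	}
	return 0
}

// SetSyncedRecordCount and GetSyncedRecordCount follow the same pattern,
// using SyncedRecordCountKey.

func main() {
	s := &StreamState{}
	s.SetTotalRecordCount(377888)
	fmt.Println(s.GetTotalRecordCount()) // 377888
}
```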

2. Updated Database Drivers

I updated the MongoDB, MySQL, and PostgreSQL drivers so each:

  • Calculates the total records at the start
  • Saves this number to the state file
  • Updates the "synced so far" number after each batch completes

3. Updated Resume Logic

Now when someone resumes a sync, the system:

  • Checks if it's a resumed sync (by looking for existing chunks in state)
  • Loads the saved statistics
  • Restores them so progress tracking works correctly
  • Continues from where it left off with accurate numbers
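The resume steps above can be sketched as follows. The `State` type, its fields, and `HasChunks` are hypothetical stand-ins; only the Get* method names mirror the ones added in types/state.go:

```go
package main

import "fmt"

// State is a stand-in for the sync state: chunks holds the pending chunks
// recorded in the state file, stats holds the persisted counters.
type State struct {
	chunks map[string]bool
	stats  map[string]int64
}

// HasChunks reports whether the state file already contains chunks,
// which is how a resumed sync is detected.
func (s *State) HasChunks() bool { return len(s.chunks) > 0 }

func (s *State) GetTotalRecordCount() int64  { return s.stats["total_record_count"] }
func (s *State) GetSyncedRecordCount() int64 { return s.stats["synced_record_count"] }

// restoreProgress mirrors the steps above: detect a resumed sync, load the
// saved counters, and hand them back so progress tracking can be seeded.
func restoreProgress(s *State) (total, synced int64, resumed bool) {
	if !s.HasChunks() { // fresh sync: nothing to restore
		return 0, 0, false
	}
	return s.GetTotalRecordCount(), s.GetSyncedRecordCount(), true
}

func main() {
	st := &State{
		chunks: map[string]bool{"chunk-3": true},
		stats:  map[string]int64{"total_record_count": 10000, "synced_record_count": 8000},
	}
	total, synced, resumed := restoreProgress(st)
	fmt.Println(resumed, total, synced) // true 10000 8000
}
```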

How Has This Been Tested?

# Build the entire project
go build ./...
# Success


CLAassistant commented Oct 2, 2025

CLA assistant check
All committers have signed the CLA.

@Itz-Agasta
Author

Hi @vaibhav-datazip, hope you are having a great day.

I'm a bit unsure how to test the e2e resume behaviour locally (I can run the types unit tests, but I don't yet have a clear integration test procedure that simulates a driver + destination pause/resume). Could you advise me on this?

@Itz-Agasta
Author

Also, I haven't implemented the change for the Oracle driver yet, because it had a TODO for counting rows/finding chunks and I wanted to confirm the expected approach before touching Oracle-specific SQL.

@vaibhav-datazip
Collaborator

Hi @Itz-Agasta,
You can run the e2e test using any of your databases with the Iceberg local-test or Parquet destination and a debugging environment. Once the sync has started you can see the stats being updated; from the debugger you can stop the sync and then run it again, this time with state, since the chunks that haven't been synced are still saved there.

Also, for this you need a good amount of rows in your table so that the sync takes a few minutes to finish.

And just a reminder: please raise your PR with the staging branch as the base.

@Itz-Agasta Itz-Agasta changed the base branch from master to staging October 4, 2025 17:47
@Itz-Agasta
Author

Itz-Agasta commented Oct 4, 2025

And just a reminder: please raise your PR with the staging branch as the base.

Sorry, my bad... I've updated it. (I should have rebased it though 😢)

You can run the e2e test using any of your databases with the Iceberg local-test or Parquet destination and a debugging environment. Once the sync has started you can see the stats being updated; from the debugger you can stop the sync and then run it again, this time with state, since the chunks that haven't been synced are still saved there.
Also, for this you need a good amount of rows in your table so that the sync takes a few minutes to finish.

Cool, I will let you know after testing it.

And what about the Oracle driver? Do you think I'm heading in the right direction?

@vaibhav-datazip
Collaborator

No problem, @Itz-Agasta
Do let me know once your changes are done, I’ll start the review then.
As for Oracle stats not being saved, that issue is already being worked on by someone else, so you can add your fix for the other drivers for now.

@vaibhav-datazip
Collaborator

vaibhav-datazip commented Oct 6, 2025

Hi @Itz-Agasta,
Just checking in: when can I expect this PR to be ready for review from your side?

@Itz-Agasta
Author

Hi @Itz-Agasta, just checking in: when can I expect this PR to be ready for review from your side?

By tomorrow; I have implemented it and just have to perform an e2e test and check for any unexpected behavior.

@vaibhav-datazip vaibhav-datazip added the hacktoberfest Issues open for Hacktoberfest contributors label Oct 7, 2025
@Itz-Agasta
Author

Itz-Agasta commented Oct 7, 2025

Hi @Itz-Agasta, you can run the e2e test using any of your databases with the Iceberg local-test or Parquet destination and a debugging environment. Once the sync has started you can see the stats being updated; from the debugger you can stop the sync and then run it again, this time with state, since the chunks that haven't been synced are still saved there.

2025-10-07.19-31-55.mp4

Hi @vaibhav-datazip, I have tried the e2e test as you told me. I followed the docs at https://olake.io/docs/community/setting-up-a-dev-env and, as you can see:

  • The sync starts normally and updates stats as expected.

  • I paused the sync using the debugger after ~8k records.

  • On resuming with the saved state, the sync continued from ~8k and updated stats correctly.

Can you please check it out now? Thanks!

@vaibhav-datazip
Collaborator

Hi @Itz-Agasta, maybe my explanation was not that clear.

  • By stopping the sync I mean stopping it completely, not pausing it (the leftmost red square).
  • By doing this we will replicate a scenario where the sync stops abruptly and the user runs it again with the state file that gets generated.
  • Also, you can replace the file mentioned under "click to view .vscode/launch.json" on this page. It has all the drivers and modes preset, so you can directly select and run those modes if your files are present.

@Itz-Agasta
Author

  • By stopping the sync I mean stopping it completely, not pausing it (the leftmost red square).

Okay, I see... I get it now. I will test it by tomorrow and let you know.

@Itz-Agasta
Author

@vaibhav-datazip, one more question: if I stopped the debugger, how am I supposed to restart the sync using the state file? Are there any commands I can run with the --state flag?

I mean

  1. I start the debugger.
  2. I stop the debugger completely.
  3. Let's say the state file shows ~8k rows synced.
  4. Now, using that state file, how am I supposed to continue the sync? At this point, if I start the debugger again, it will probably start from 0.
     Are there any commands I can use, passing the state file as a flag or argument?

@vaibhav-datazip
Collaborator

vaibhav-datazip commented Oct 7, 2025

@Itz-Agasta
The state file is generated when the chunks are created or the bookmark is noted (in the case of CDC or incremental sync).

  • This process takes place before the chunk iteration is done (the process where records are actually written to the destination).
  • So your state file has chunk data; put a breakpoint in the chunk iterator function and your program will stop, then go to your state file and check it. You will be able to see the chunks' min and max values.
  • Once you stop the sync you will be able to start it again with the leftover chunks, and the --state flag can be used for this.

To further help yourself out, you can use the debugger file mentioned earlier, or you can go through this doc as well, which mentions all the flags and commands that can be used in OLake.

If you still have some doubt, please feel free to ask.

@Itz-Agasta
Author

Itz-Agasta commented Oct 8, 2025

Before

2025-10-08.20-04-39.mp4

Now

2025-10-08.20-08-59.mp4

@vaibhav-datazip

Hi @vaibhav-datazip , you can check this out now.

One question though: in both the previous and current cases, when I start the sync with --state, the synced records in stats.json (right side) reset to 0. Shouldn't it continue from where it left off? For example, if the last sync reached 8k, shouldn't it resume from 8k instead of restarting? Or am I missing something?

@vaibhav-datazip
Collaborator

Records are written to Parquet files in chunks, so if possible, for testing, you can decrease the chunk size to 1000 and try it out.

Currently the batch size is set such that the generated Parquet file is 256 MB; you can find this value in constants.go. If you are unable to set the batch size to 1k, try decreasing this size to 1 MB and you will see some changes in the stats as well.

@Itz-Agasta
Author

Currently the batch size is set such that the generated Parquet file is 256 MB; you can find this value in constants.go. If you are unable to set the batch size to 1k, try decreasing this size to 1 MB and you will see some changes in the stats as well.

Yep, it's resuming from where it left off now.

2025-10-09.12-53-53.mp4

But why, with --state, is it exceeding the total record count? :(

@vaibhav-datazip
Collaborator

So, the total-records-synced stat in the stats file represents records in batches of 10k. These are yet to be committed; once they are committed, the user can see them in their destination.

But if the whole chunk (the parent unit of a batch) fails to sync before being committed, the whole chunk will be synced again. That's why you are seeing the synced count exceed the total number of records.

An easy way to solve this: don't update the state file based on the total records synced from stats; instead, update it only when a whole chunk gets committed. That way we will have correct numbers when the sync is resumed.

@vaibhav-datazip
Collaborator

@Itz-Agasta, can you check my previous comment and make the necessary changes?

@Itz-Agasta
Author

@Itz-Agasta there are some merge conflicts, can you resolve them?

Yep, I have resolved them; you can review it now :)

@Itz-Agasta
Author

@vaibhav-datazip, I have refactored the PR as you asked. I think it's good to go now; please check it.

@Itz-Agasta
Author

@vaibhav-datazip done, you can check it now.

@vaibhav-datazip
Collaborator

@Itz-Agasta I tried testing this using Postgres: I ran the sync after changing the chunk size and then stopped it. When trying to re-run the sync with state, I am unable to see the estimated remaining time. I have attached a video for your reference.

Screen.Recording.2025-10-28.at.4.06.10.PM.mov

Also, instead of using two variables (total records and synced records), you can use just a remaining-records variable, because that is what is used to compute the remaining time.

@Itz-Agasta
Author

Also, instead of using two variables (total records and synced records), you can use just a remaining-records variable, because that is what is used to compute the remaining time.

OK, you mean instead of storing total_record_count and synced_record_count, we will just store remaining_record_count and update it as we go. Meaning:

  • Fresh sync: remaining = total
  • After a chunk completes: remaining -= chunk_records
  • Resume: just load remaining and add it to the pool

Right?

@Itz-Agasta
Author

Itz-Agasta commented Oct 29, 2025

Hi @vaibhav-datazip, I implemented your suggestion. So, instead of tracking total_record_count and synced_record_count separately, I switched to tracking just remaining_record_count.

I added three methods:

  • SetRemainingRecordCount() - saves the count when starting a fresh sync
  • GetRemainingRecordCount() - loads it when resuming
  • DecrementRemainingRecordCount() - reduces it after each chunk completes
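In sketch form, assuming the same sync.Map-backed stream state used elsewhere in types/state.go (the key name and type shape here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

const RemainingRecordCountKey = "remaining_record_count" // illustrative key name

// StreamState is a stand-in for the per-stream state object.
type StreamState struct {
	State sync.Map
}

// SetRemainingRecordCount saves the count when starting a fresh sync.
func (s *StreamState) SetRemainingRecordCount(n int64) {
	s.State.Store(RemainingRecordCountKey, n)
}

// GetRemainingRecordCount loads the count when resuming (0 if unset).
func (s *StreamState) GetRemainingRecordCount() int64 {
	if v, ok := s.State.Load(RemainingRecordCountKey); ok {
		if n, ok := v.(int64); ok {
			return n
		}
	}
	return 0
}

// DecrementRemainingRecordCount reduces the count after a chunk completes,
// clamping at zero so the estimate never goes negative.
func (s *StreamState) DecrementRemainingRecordCount(n int64) {
	remaining := s.GetRemainingRecordCount() - n
	if remaining < 0 {
		remaining = 0
	}
	s.State.Store(RemainingRecordCountKey, remaining)
}

func main() {
	s := &StreamState{}
	s.SetRemainingRecordCount(10000)
	s.DecrementRemainingRecordCount(1000) // one chunk of 1k committed
	fmt.Println(s.GetRemainingRecordCount()) // 9000
}
```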

Here is the final result:

2025-10-30.01-41-03.mp4

You can test it now ...

[Review comment on the diff at: connector.SetupState(state)]

Author

@Itz-Agasta Itz-Agasta Oct 29, 2025


After implementing the previous changes, I tested it and... still got "Not Determined" :(

So I started debugging. I added log statements everywhere to trace what was happening. That's when I discovered the timing issue in sync.go (probably):

The stats logger was starting BEFORE the remaining records were loaded from state.

I guess:

  1. Logger starts -> checks pool stats -> finds TotalRecordsToSync = 0 -> shows "Not Determined"
  2. Then backfill loads state and adds records to pool stats
  3. But logger already decided there's no data

That's why I added pre-load logic in sync.go that runs BEFORE starting the logger. It iterates through all streams (FullLoad, CDC, Incremental), loads their remaining counts from state, and adds them to pool stats.
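That ordering fix, reduced to a sketch (all the type and function names here are stand-ins for the pool/logger pieces in sync.go, not the real ones):

```go
package main

import "fmt"

// poolStats is a stand-in for the shared pool statistics the logger samples.
type poolStats struct {
	TotalRecordsToSync int64
}

// stream is a stand-in for a configured stream with a saved remaining count.
type stream struct {
	remaining int64
}

// preloadRemaining mirrors the pre-load step: walk every stream and add its
// saved remaining count to the pool BEFORE the stats logger starts sampling.
func preloadRemaining(pool *poolStats, streams []stream) {
	for _, s := range streams {
		pool.TotalRecordsToSync += s.remaining
	}
}

// estimate is what the logger effectively does: with a zero total it can
// only report "Not Determined".
func estimate(pool *poolStats) string {
	if pool.TotalRecordsToSync == 0 {
		return "Not Determined"
	}
	return "computable"
}

func main() {
	pool := &poolStats{}
	fmt.Println(estimate(pool)) // logger sampled too early: "Not Determined"

	preloadRemaining(pool, []stream{{remaining: 377888}})
	fmt.Println(estimate(pool)) // after pre-load: "computable"
}
```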

[Review comment on the diff lines:]

if countFloat64, ok := count.(float64); ok {
    return int64(countFloat64)
}
Author

@Itz-Agasta Itz-Agasta Oct 29, 2025


This one took me a while to figure out. I added more logging and noticed something weird: even though the state file had "remaining_record_count": 377888, when I read it back, it was coming out as 0!

I traced through the code and found the issue in GetRemainingRecordCount():

if count, loaded := s.Streams[index].State.Load(RemainingRecordCountKey); loaded {
    if countInt64, ok := count.(int64); ok {
        return countInt64  // This was failing!
    }
}
return 0

It turns out Go's JSON unmarshaling converts ALL numbers to float64 by default (when decoding into interface{}), not their original types. So when we saved remaining_record_count as int64, it came back from JSON as float64. The type assertion count.(int64) was failing silently, and we were returning 0.
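This encoding/json behavior is easy to reproduce: any JSON number decoded into an interface{} comes back as float64. A small demo, with a `toInt64` helper showing the shape of the fix (the helper name is mine, not olake's):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toInt64 accepts both int64 (values set in-process) and float64 (values
// read back from JSON), so the lookup no longer fails silently.
func toInt64(v interface{}) int64 {
	switch n := v.(type) {
	case int64:
		return n
	case float64:
		return int64(n)
	}
	return 0
}

func main() {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(`{"remaining_record_count": 377888}`), &m); err != nil {
		panic(err)
	}
	v := m["remaining_record_count"]

	_, isInt64 := v.(int64)
	fmt.Println(isInt64)    // false: JSON numbers decode as float64
	fmt.Println(toInt64(v)) // 377888
}
```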

@vaibhav-datazip
Collaborator

@Itz-Agasta, I tried testing it again. When I resumed the sync after stopping it, the remaining records count didn't decrease, as you can see in the attached video.
Can you do proper testing on your side before the next review request?

Screen.Recording.2025-10-30.at.4.11.31.PM.mov

@Itz-Agasta
Author

Itz-Agasta commented Nov 1, 2025

@Itz-Agasta, I tried testing it again. When I resumed the sync after stopping it, the remaining records count didn't decrease, as you can see in the attached video. Can you do proper testing on your side before the next review request?
Screen.Recording.2025-10-30.at.4.11.31.PM.mov

Ahhhh, it was the same JSON type-mismatch bug, but in DecrementRemainingRecordCount(). When the state loads from JSON, remaining_record_count becomes float64, and the old code checked only for int64, so the type assertion failed silently and never decremented the count. I fixed it; you can check now.

@Itz-Agasta
Author

@vaibhav-datazip please check this e2e test... I hope it's perfect now.

2025-11-01.19-58-59.mp4

@vaibhav-datazip
Collaborator

@Itz-Agasta , will be testing and reviewing soon



Successfully merging this pull request may close these issues.

improvement: enabling stats on resumed sync

5 participants