feat: Enable Stats on Resumed Sync #558
base: staging
Conversation
Hi @vaibhav-datazip, hope you are having a great day. I'm a bit unsure how to test the e2e resume behaviour locally (I can run the …
Also, I haven't implemented the change for the Oracle driver yet, because it has a TODO for counting rows/finding chunks and I wanted to confirm the expected approach before touching Oracle-specific SQL.
Hi @Itz-Agasta, also, for this you need a good number of rows in your table so that the sync takes a few minutes to finish. And just a reminder: please raise your PR with the staging branch as base.
Sorry, my bad... I've updated it. (I should have rebased it though 😢)
Cool, I will let you know after testing it. And what about the Oracle driver... do you think I'm heading in the right direction?
No problem, @Itz-Agasta
Hi @Itz-Agasta,
By tomorrow. I have implemented it and just have to perform an e2e test and check for any unexpected behaviour.
2025-10-07.19-31-55.mp4
Hi @vaibhav-datazip, I have tried the e2e test as you told me. I followed the docs https://olake.io/docs/community/setting-up-a-dev-env and, you can see:
Can you please check it out now? Thanks
Hi @Itz-Agasta, maybe my explanation was not that clear.
Okay, I see... I get it now. I will test it by tomorrow and let you know.
@vaibhav-datazip one more question: if I stop the debugger, how am I supposed to restart the sync using the state file? Are there any commands I can run with the --state flag? I mean …
@Itz-Agasta
To help yourself further, you can use the debugger file mentioned earlier, or you can go through this doc as well, which lists all the flags and commands that can be used in OLake. If you still have any doubts, please feel free to ask.
Before: 2025-10-08.20-04-39.mp4
Now: 2025-10-08.20-08-59.mp4
Hi @vaibhav-datazip, you can check this out now. One question though: in both the previous and current cases, when I start the sync with --state, the …
Records are written to Parquet files in chunks, so if possible, for testing, you can decrease the chunk size to 1000 and try it out. Currently the batch size is set so that each generated Parquet file is about 256 MB; you can find this value in constants.go. If you are unable to set the batch size to 1k, try decreasing this size to 1 MB and you will see some changes in the stats as well.
Yep, it's resuming from where it left off now.
2025-10-09.12-53-53.mp4
But why, with --state, is it exceeding the total record limit? :(
Description
This PR implements stat persistence in the state object, enabling accurate progress tracking and time estimation when resuming syncs. Previously, when a sync was interrupted and resumed, all progress metrics (total records, synced count, estimated time) were lost, leading to poor user experience and observability.
Closes #110
Type of change
Changes
1. Added New State Tracking (types/state.go)
I added 4 functions that let it save and load these statistics:
SetTotalRecordCount() - Saves the total number of records
GetTotalRecordCount() - Loads the total number of records
SetSyncedRecordCount() - Saves how many records are done
GetSyncedRecordCount() - Loads how many records are done
2. Updated Database Drivers
Updated the Mongo, MySQL, and PostgreSQL drivers so that each one:
3. Updated Resume Logic
Now when someone resumes a sync, the system:
How Has This Been Tested?