improvement: enabling stats on resumed sync #110

@github-actions

Description:

Currently, stats are not available on resumed syncs because the state object does not persist stat-related information for each stream. As a result, we lose visibility into sync progress and estimated time after a restart.

Problem:

  • On resuming a sync, metrics like total records to process and progress percentage are not available
  • This impacts user experience and observability
  • Estimations (e.g., time remaining) cannot be calculated accurately without the initial stats

Proposed Solution:

  • Update the state schema to include stat information for each stream (e.g., total record count, synced count)
  • Store necessary values in state so that they can be reloaded on resume
  • Add a function that calls pool.AddRecordsToSync(totalCount) during resume, based on the saved stats
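The proposal above could be sketched roughly as follows. Everything except the method name `AddRecordsToSync` is an assumption made for illustration: the `State` and `StreamStats` types, the JSON field names, the `recordPool` interface, and the `int64` parameter type are hypothetical and may not match the real state struct or pool.

```go
package main

import "encoding/json"

// StreamStats holds the per-stream counters to persist (hypothetical shape).
type StreamStats struct {
	TotalCount  int64 `json:"total_count"`
	SyncedCount int64 `json:"synced_count"`
}

// State is a hypothetical persisted state: stream name -> stats.
type State struct {
	Streams map[string]StreamStats `json:"streams"`
}

// recordPool stands in for the real pool. Only the method name comes from
// the issue; the parameter type is assumed.
type recordPool interface {
	AddRecordsToSync(totalCount int64)
}

// countingPool is a small stand-in implementation used to try the sketch.
type countingPool struct{ total int64 }

func (p *countingPool) AddRecordsToSync(totalCount int64) { p.total += totalCount }

// SaveStats serializes the state so it can be written alongside the
// existing sync state.
func SaveStats(st State) ([]byte, error) {
	return json.Marshal(st)
}

// RestoreStats decodes the saved state and re-registers each stream's total
// with the pool, so estimates are available again after a restart.
func RestoreStats(raw []byte, pool recordPool) (State, error) {
	var st State
	if err := json.Unmarshal(raw, &st); err != nil {
		return State{}, err
	}
	for _, s := range st.Streams {
		pool.AddRecordsToSync(s.TotalCount)
	}
	return st, nil
}
```

In this sketch only the total is handed back to the pool, as the issue suggests; how the already-synced count is restored depends on how the pool tracks progress internally.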

Action Items:

  • Extend state struct to include relevant stats per stream
  • Modify sync/resume flow to read/write stats to state
  • Add tests to validate stat persistence across sync sessions

// TODO: to get estimated time need to update pool.AddRecordsToSync(totalCount) (can be done by storing some vars in state)
rawChunkArray := chunks.Array()
// convert to primitive.ObjectID
for _, chunk := range rawChunkArray {
	premitiveMinID, _ := primitive.ObjectIDFromHex(chunk.Min.(string))
	premitiveMaxID, _ := primitive.ObjectIDFromHex(chunk.Max.(string))

Labels: good first issue, hacktoberfest, help wanted
