
[source-mongodb-v2] Saved Resume Token Time Regression Resulting in Missing Data #69102

@RScicomp

Description


Connector Name

source-mongodb-v2

Connector Version

2.0.4

At what step did the error happen?

None

Relevant information

MongoDB CDC Resume Token Temporal Regression Causing Data Loss

Summary

In our Airbyte deployment, the MongoDB source connector is experiencing a critical issue: saved resume tokens have timestamps earlier than the initial resume tokens, which results in missing data during CDC synchronization. What could be causing this?

Environment

  • Airbyte Version: 1.7.1 self hosted community
  • Connector: source-mongodb-v2 (version 2.0.4)
  • MongoDB Version: not specified

Problem Description

Issue 1: Complete Data Loss During Sync Window

Time Period: a 9-hour window during peak operating hours

Evidence:

  • MongoDB Oplog: 1,623+ operations recorded for affected collection
  • Target Database: 0 records synced for the same period
  • Specific Missing Records: Documents inserted during the sync window are completely absent from the target
  • In another connection, the MongoDB source connector repeatedly ingested exactly 101 records per sync for two weeks straight.
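One way to pin down a gap like this is to bucket operations by hour on both sides and look for hours where the source saw activity but the target received nothing. This is a minimal sketch that assumes you have already exported the oplog ts values for the affected namespace and the target's record timestamps as timezone-aware Python datetimes; find_missing_windows and hour_bucket are hypothetical helpers, not part of Airbyte or PyMongo.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def find_missing_windows(source_times, target_times):
    """Return [start, end) windows of consecutive hours in which the source
    recorded operations but the target received no records."""
    source_counts = Counter(hour_bucket(t) for t in source_times)
    target_counts = Counter(hour_bucket(t) for t in target_times)
    # Hours with source activity and zero target records, oldest first.
    gap_hours = sorted(h for h in source_counts if target_counts[h] == 0)

    windows = []  # list of (first_hour, last_hour) pairs
    for hour in gap_hours:
        if windows and hour - windows[-1][1] == timedelta(hours=1):
            windows[-1] = (windows[-1][0], hour)  # extend the current window
        else:
            windows.append((hour, hour))  # start a new window
    # Report each window with an exclusive end boundary.
    return [(first, last + timedelta(hours=1)) for first, last in windows]


if __name__ == "__main__":
    base = datetime(2025, 10, 20, tzinfo=timezone.utc)
    # Synthetic data: source activity 06:00-18:00, target only at 06, 07 and 17.
    source = [base + timedelta(hours=h, minutes=m) for h in range(6, 18) for m in (5, 35)]
    target = [base + timedelta(hours=h, minutes=10) for h in (6, 7, 17)]
    for start, end in find_missing_windows(source, target):
        print(start.isoformat(), "->", end.isoformat())
```

An hour with no source operations at all will split a window in two; relax the merge rule if that matters for your data.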

Issue 2: Resume Token Temporal Regression

Observed Behavior:

  • Initial resume token timestamp: Recent date/time
  • Saved resume token timestamp: Date ~5 days earlier
  • Result: Saved token regresses to a significantly earlier timestamp
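To confirm a regression without relying on log wording, the cluster time can be read out of the token itself. This sketch assumes the token's _data field is the hex string format used by recent MongoDB releases and that it begins with the KeyString Timestamp type byte 0x82, followed by 4 bytes of seconds and 4 bytes of increment. That layout is internal and undocumented, so treat the decoded values as a diagnostic aid only; the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple


def token_cluster_time(data_hex: str) -> Tuple[int, int]:
    """Return (seconds, increment) decoded from a resume token's _data hex string.

    ASSUMPTION: relies on MongoDB's internal, undocumented KeyString layout,
    where the token starts with type byte 0x82 (Timestamp) followed by a
    big-endian 4-byte seconds field and a 4-byte increment field.
    """
    if len(data_hex) < 18 or data_hex[:2].lower() != "82":
        raise ValueError("resume token does not start with a Timestamp field")
    seconds = int(data_hex[2:10], 16)
    increment = int(data_hex[10:18], 16)
    return seconds, increment


def token_datetime(data_hex: str) -> datetime:
    """Cluster time of the token as a UTC datetime (increment dropped)."""
    seconds, _ = token_cluster_time(data_hex)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_regression(initial_hex: str, saved_hex: str) -> bool:
    """True if the saved token points to an earlier oplog position than the initial one."""
    # Tuples compare seconds first, then increment, matching oplog order.
    return token_cluster_time(saved_hex) < token_cluster_time(initial_hex)
```

Comparing (seconds, increment) tuples rather than datetimes keeps the ordering correct for events that share the same second.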

Issue 3: Consistent Batch Size Pattern

Pattern: Since mid-October, certain collection syncs consistently return exactly 101 documents

  • This matches MongoDB's default first batch size
  • When we looked at the logs, we saw that the resume token was either:
    ◦ progressing by mere seconds, or
    ◦ stuck in the past: the emitted resume tokens dated from Oct 23. This suddenly self-healed without any intervention, and now it is stuck on Oct 28. As of this writing, it is Oct 30.
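The stall pattern described above can be flagged mechanically from sync history. This sketch assumes a hypothetical list of (records_emitted, token_seconds) pairs, one per sync and oldest first, assembled from your own logs; the thresholds min_run and max_advance_s are illustrative guesses, not connector settings.

```python
DEFAULT_FIRST_BATCH = 101  # MongoDB's default first-batch size for cursors


def find_stalled_runs(history, min_run=3, max_advance_s=60):
    """Find runs of syncs that look stuck on the first batch.

    history: hypothetical list of (records_emitted, token_seconds) pairs, one
    per sync, oldest first. A sync counts as stalled when it emitted exactly
    DEFAULT_FIRST_BATCH records and the resume token advanced by at most
    max_advance_s seconds (a negative advance, i.e. a regression, also counts).
    Returns (first_index, last_index) pairs for runs of at least min_run syncs.
    """
    runs, start = [], None
    for i in range(1, len(history)):
        records, token_s = history[i]
        advance = token_s - history[i - 1][1]
        stalled = records == DEFAULT_FIRST_BATCH and advance <= max_advance_s
        if stalled and start is None:
            start = i  # a new run begins
        elif not stalled and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that lasts until the final sync.
    if start is not None and len(history) - start >= min_run:
        runs.append((start, len(history) - 1))
    return runs
```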

Expected Behavior

  1. Resume tokens should progress chronologically forward
  2. All oplog operations should be captured and synced
  3. Batch operations should complete with getMore calls when needed
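Expectation 3 bears on the 101-record pattern: one possible explanation is a reader that stops after the initial batch instead of issuing getMore, which would only ever see the first 101 documents. The toy cursor below is a hypothetical stand-in for a server cursor and does not model tailable change streams (where an empty batch does not mean the stream has ended); it only illustrates the difference between reading the first batch and draining the cursor. Real PyMongo cursors issue getMore automatically as you iterate them.

```python
from typing import List


class FakeCursor:
    """Hypothetical stand-in for a server cursor: the initial batch holds at
    most first_batch documents; each get_more() returns up to batch_size more."""

    def __init__(self, docs, first_batch=101, batch_size=1000):
        self._docs = list(docs)
        self._pos = 0
        self.first_batch = first_batch
        self.batch_size = batch_size

    def _take(self, n: int) -> List:
        batch = self._docs[self._pos:self._pos + n]
        self._pos += len(batch)
        return batch

    def initial(self) -> List:
        return self._take(self.first_batch)

    def get_more(self) -> List:
        return self._take(self.batch_size)


def read_first_batch_only(cursor: FakeCursor) -> List:
    """Buggy pattern: never asks the server for more."""
    return cursor.initial()


def drain(cursor: FakeCursor) -> List:
    """Correct pattern: keep calling get_more until the cursor is exhausted."""
    out = cursor.initial()
    while True:
        batch = cursor.get_more()
        if not batch:
            break
        out.extend(batch)
    return out
```

With 1,623 synthetic documents, read_first_batch_only returns 101 of them while drain returns all 1,623.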

Actual Behavior

  1. Resume tokens regress to earlier timestamps
  2. Data operations are missed entirely during certain time windows
  3. Syncs appear to "self-heal" but then revert to problematic behavior

Impact

  • Data Loss: Critical business data not reaching the data warehouse
  • Data Integrity: Inconsistent state between source and target

Workaround

  • Full re-sync required for affected collections
  • Manual monitoring of sync success rates
  • Validation queries comparing source vs target counts
  • Increased sync frequency to minimize data loss windows
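Count comparisons can hide problems: a collection that re-ingests the same 101 records on every sync may inflate target counts (for example, in an append-only destination) while newer documents are still missing. Comparing _id sets catches that case. This is a minimal sketch assuming both sides' _id values have been exported as strings; reconcile_ids is a hypothetical helper.

```python
def reconcile_ids(source_ids, target_ids):
    """Compare exported _id values from source and target (as strings)."""
    source, target = set(source_ids), set(target_ids)
    return {
        "missing_in_target": sorted(source - target),     # never arrived
        "unexpected_in_target": sorted(target - source),  # deleted at source or stray
        "source_count": len(source),
        "target_count": len(target),  # distinct ids, so duplicates do not inflate it
    }
```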

Additional Context

  • Issue appears to be intermittent with some "self-healing" behavior observed
  • Problem affects multiple MongoDB collections
  • Resume token regression suggests potential race condition or state management issue
  • The consistent batch size pattern suggests cursor/batch handling problems
  • Issue started appearing around mid-October 2025
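If the regression does come from state management, one defensive pattern is to refuse to persist a checkpoint that is older than the one already saved, and to keep the rejected value for investigation. The class below is a hypothetical sketch of that idea using (seconds, increment) cluster-time tuples; it is not how source-mongodb-v2 manages its state, and rejecting a checkpoint only contains the symptom rather than fixing the underlying cause.

```python
from typing import List, Optional, Tuple

Position = Tuple[int, int]  # (seconds, increment) cluster time


class MonotonicCheckpoint:
    """Hypothetical checkpoint store that never moves the saved position backwards."""

    def __init__(self, initial: Optional[Position] = None):
        self.position = initial
        self.rejected: List[Position] = []  # regressions kept for investigation

    def save(self, candidate: Position) -> bool:
        """Persist candidate unless it is older than the current position."""
        if self.position is not None and candidate < self.position:
            self.rejected.append(candidate)
            return False
        self.position = candidate
        return True
```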

Priority: Critical - Data Loss Issue
Labels: bug, mongodb, cdc, data-loss, resume-token, production

Relevant log output

Contribute

  • Yes, I want to contribute
