Open
Labels
area/connectors, autoteam, community, needs-triage, team/extensibility, team/use, type/bug
Description
Connector Name
source-mongodb-v2
Connector Version
2.0.4
What step the error happened?
None
Relevant information
MongoDB CDC Resume Token Temporal Regression Causing Data Loss
Summary
In our Airbyte deployment, the MongoDB source connector has a critical issue: saved resume tokens carry timestamps earlier than the initial resume tokens, and data goes missing during CDC synchronization. What could cause this?
Environment
- Airbyte Version: 1.7.1 self hosted community
- Connector: source-mongodb-v2
- MongoDB Version: 2.0.4
Problem Description
Issue 1: Complete Data Loss During Sync Window
Time Period: 9-hour window during peak operation hours
Evidence:
- MongoDB Oplog: 1,623+ operations recorded for affected collection
- Target Database: 0 records synced for the same period
- Specific Missing Records: Documents inserted during the sync window are completely absent from target
- In another connection, the MongoDB source connector repeatedly ingested exactly 101 records on each sync for 2 weeks straight.
Issue 2: Resume Token Temporal Regression
Observed Behavior:
- Initial resume token timestamp: Recent date/time
- Saved resume token timestamp: Date ~5 days earlier
- Result: Saved token regresses to significantly earlier timestamp
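To check for this regression directly from the connector logs, the two tokens can be decoded and compared. The following is a minimal sketch, assuming the commonly observed change stream token layout in which the `_data` hex string starts with the type byte `82` followed by a 4-byte seconds value and a 4-byte increment. That layout is an internal detail of MongoDB and not a stable API, and the function names here are hypothetical.

```python
from datetime import datetime, timezone


def resume_token_time(data_hex: str) -> tuple[datetime, int]:
    """Decode (cluster time in UTC, increment) from a resume token's _data hex string.

    Assumption: the token begins with the byte 0x82 followed by a big-endian
    64-bit timestamp (high 32 bits = seconds, low 32 bits = increment).
    """
    if len(data_hex) < 18 or not data_hex.lower().startswith("82"):
        raise ValueError("token does not start with a timestamp field")
    seconds = int(data_hex[2:10], 16)
    increment = int(data_hex[10:18], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), increment


def has_regressed(initial_hex: str, saved_hex: str) -> bool:
    """True when the saved token points to an earlier cluster time than the initial one."""
    return resume_token_time(saved_hex) < resume_token_time(initial_hex)
```

Running `has_regressed` on the initial and saved `_data` values from one sync would confirm whether the saved state really moved backwards, independent of how the log lines format the timestamps.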
Issue 3: Consistent Batch Size Pattern
Pattern: Since mid-October, certain collection syncs consistently return exactly 101 documents
- This matches MongoDB's default first batch size
- When we looked at the logs, we saw that the resume token was either:
  - progressing by mere seconds, or
  - stuck in the past: the emitted resume tokens were from Oct 23. This suddenly self-healed without any intervention, and the token is now stuck on Oct 28. It is Oct 30 at the time of writing.
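The pattern above can be spotted automatically from per-sync record counts. This is a rough sketch, assuming the counts have already been collected into a list (for example from the sync history). The function names and the `min_run` threshold are hypothetical choices, not part of Airbyte.

```python
DEFAULT_FIRST_BATCH = 101  # MongoDB's default first batch size for a cursor


def longest_run_of(counts: list[int], value: int = DEFAULT_FIRST_BATCH) -> int:
    """Length of the longest streak of consecutive syncs that emitted exactly `value` records."""
    longest = current = 0
    for count in counts:
        current = current + 1 if count == value else 0
        longest = max(longest, current)
    return longest


def looks_stuck_on_first_batch(counts: list[int], min_run: int = 3) -> bool:
    """Flag a connection whose syncs repeatedly stop at exactly one first batch."""
    return longest_run_of(counts) >= min_run
```

A single sync with 101 records can be coincidence, so the check only fires on a streak. The threshold of three is arbitrary and can be tuned to the sync frequency.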
Expected Behavior
- Resume tokens should progress chronologically forward
- All oplog operations should be captured and synced
- Batch operations should complete with getMore calls when needed
Actual Behavior
- Resume tokens regress to earlier timestamps
- Data operations are missed entirely during certain time windows
- Syncs appear to "self-heal" but then revert to problematic behavior
Impact
- Data Loss: Critical business data not reaching the data warehouse
- Data Integrity: Inconsistent state between source and target
Workaround
- Full re-sync required for affected collections
- Manual monitoring of sync success rates
- Validation queries comparing source vs target counts
- Increased sync frequency to minimize data loss windows
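The count validation in the workaround can be sketched as a small comparison step. It assumes the per-collection counts for the same time window were gathered beforehand, for example with pymongo's `count_documents` on the source and a `COUNT(*)` query on the target. The function name and the dictionary shape are hypothetical.

```python
def count_mismatches(
    source_counts: dict[str, int], target_counts: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {collection: (source_count, target_count)} for every collection that differs.

    A collection missing from the target is treated as having 0 records there.
    """
    mismatches = {}
    for name, source_count in source_counts.items():
        target_count = target_counts.get(name, 0)
        if source_count != target_count:
            mismatches[name] = (source_count, target_count)
    return mismatches
```

An empty result means the window matches. Any entry identifies a collection that is a candidate for a full re-sync.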
Additional Context
- Issue appears to be intermittent with some "self-healing" behavior observed
- Problem affects multiple MongoDB collections
- Resume token regression suggests potential race condition or state management issue
- The consistent batch size pattern suggests cursor/batch handling problems
- Issue started appearing around mid-October 2025
Priority: Critical - Data Loss Issue
Labels: bug, mongodb, cdc, data-loss, resume-token, production
Relevant log output
Contribute
- Yes, I want to contribute