[source-mongodb-v2] Saved Resume Token Time Regression Resulting in Missing Data

### Connector Name

source-mongodb-v2

### Connector Version

2.0.4

### What step the error happened?

None

### Relevant information

# MongoDB CDC Resume Token Temporal Regression Causing Data Loss

## Summary
In our Airbyte deployment, the MongoDB source connector is saving resume tokens whose timestamps are earlier than the initial resume tokens, which results in missing data during CDC synchronization. What could be causing this?

## Environment
- **Airbyte Version**: 1.7.1 (self-hosted, Community edition)
- **Connector**: source-mongodb-v2
- **MongoDB Version**: 2.0.4

## Problem Description

### Issue 1: Complete Data Loss During Sync Window
**Time Period**: 9-hour window during peak operation hours

**Evidence**:
- **MongoDB Oplog**: 1,623+ operations recorded for affected collection
- **Target Database**: 0 records synced for the same period
- **Specific Missing Records**: Documents inserted during the sync window are completely absent from target
- **Related symptom**: In another connection, the MongoDB source connector ingested exactly 101 records on every sync for 2 weeks straight.

### Issue 2: Resume Token Temporal Regression
**Observed Behavior**: 
- Initial resume token timestamp: `Recent date/time`
- Saved resume token timestamp: `Date ~5 days earlier`
- **Result**: Saved token regresses to significantly earlier timestamp
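To compare the two tokens, the cluster time can be read directly out of the token's `_data` hex string. The following is a minimal sketch, assuming the common hex layout in which the string starts with the type byte `82`, followed by 4 bytes of epoch seconds and 4 bytes of increment. The function name `resume_token_time` and the sample tokens are hypothetical and were built for illustration, not taken from our logs.

```python
from datetime import datetime, timezone
from typing import Tuple


def resume_token_time(data_hex: str) -> Tuple[datetime, int]:
    """Return (cluster time in UTC, increment) from a resume token `_data` hex string.

    Assumption: the string starts with the type byte "82", followed by
    8 hex digits of epoch seconds and 8 hex digits of increment.
    """
    if len(data_hex) < 18 or not data_hex.startswith("82"):
        raise ValueError("unexpected resume token layout")
    seconds = int(data_hex[2:10], 16)
    increment = int(data_hex[10:18], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), increment


# Hypothetical tokens: same layout, timestamps five days apart.
# The trailing "00" stands in for the rest of the token, which is ignored here.
initial_token = "82" + f"{1761609600:08x}" + f"{1:08x}" + "00"
saved_token = "82" + f"{1761609600 - 5 * 86400:08x}" + f"{1:08x}" + "00"

initial_time, _ = resume_token_time(initial_token)
saved_time, _ = resume_token_time(saved_token)
print("saved token is older than initial token:", saved_time < initial_time)
```

Decoding both the initial and the saved token from the sync logs this way gives concrete timestamps to attach to the report instead of approximate dates.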

### Issue 3: Consistent Batch Size Pattern
**Pattern**: Since mid-October, certain collection syncs consistently return exactly 101 documents.
- This matches MongoDB's default first batch size.
- In the logs, the resume token was either:
  - progressing by mere seconds between syncs, or
  - stuck at an old date: the emitted resume tokens were from Oct 23. This suddenly self-healed without any intervention, and the token is now stuck on Oct 28. Today, at the time of writing, is Oct 30.


## Expected Behavior
1. Resume tokens should progress chronologically forward
2. All oplog operations should be captured and synced
3. Batch operations should complete with `getMore` calls when needed
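The first expectation can be checked mechanically against the checkpoint history. This is a minimal sketch, assuming the saved token time of each sync has already been extracted from the logs. The function name `find_regressions` and the sync labels are hypothetical; the dates only mirror the Oct 23 / Oct 28 pattern described above.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def find_regressions(
    checkpoints: List[Tuple[str, datetime]],
) -> List[Tuple[str, datetime, datetime]]:
    """Return (sync_id, high_water_mark, saved_time) for each checkpoint
    whose saved time is earlier than the latest time seen so far."""
    regressions = []
    high_water = None
    for sync_id, saved_time in checkpoints:
        if high_water is not None and saved_time < high_water:
            regressions.append((sync_id, high_water, saved_time))
        else:
            high_water = saved_time  # only ever move the mark forward
    return regressions


# Hypothetical checkpoint history, ordered by sync start.
history = [
    ("sync-1", datetime(2025, 10, 28, tzinfo=timezone.utc)),
    ("sync-2", datetime(2025, 10, 23, tzinfo=timezone.utc)),  # moved backwards
    ("sync-3", datetime(2025, 10, 28, tzinfo=timezone.utc)),
]
for sync_id, expected_at_least, actual in find_regressions(history):
    print(f"{sync_id}: saved {actual:%Y-%m-%d}, expected >= {expected_at_least:%Y-%m-%d}")
```

Comparing against a high-water mark rather than the previous checkpoint keeps a single regression from hiding later checkpoints that are still behind.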

## Actual Behavior
1. Resume tokens regress to earlier timestamps
2. Data operations are missed entirely during certain time windows
3. Syncs appear to "self-heal" but then revert to problematic behavior

## Impact
- **Data Loss**: Critical business data not reaching the data warehouse
- **Data Integrity**: Inconsistent state between source and target


## Workaround
- Full re-sync required for affected collections
- Manual monitoring of sync success rates
- Validation queries comparing source vs target counts
- Increased sync frequency to minimize data loss windows
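For the validation queries, one option is to count source documents created inside the affected window and compare that number with the target. This is a rough sketch, assuming the collection uses default `ObjectId` values for `_id`, whose first 4 bytes are the creation time in epoch seconds. The helper names and the window boundaries are hypothetical, and the filter is written in MongoDB Extended JSON form; with a driver such as pymongo the `$oid` values would need to be converted to `ObjectId` instances first.

```python
from datetime import datetime, timezone


def objectid_floor_hex(moment: datetime) -> str:
    """Smallest 24-character ObjectId hex string whose embedded timestamp is `moment`."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # 4-byte big-endian epoch seconds, then 8 zero bytes for the remainder.
    return f"{int(moment.timestamp()):08x}" + "0" * 16


def window_filter(start: datetime, end: datetime) -> dict:
    """Extended JSON filter matching documents whose _id was generated in [start, end)."""
    return {
        "_id": {
            "$gte": {"$oid": objectid_floor_hex(start)},
            "$lt": {"$oid": objectid_floor_hex(end)},
        }
    }


# Hypothetical 9-hour window; replace with the real boundaries of the gap.
start = datetime(2025, 10, 28, 9, 0, tzinfo=timezone.utc)
end = datetime(2025, 10, 28, 18, 0, tzinfo=timezone.utc)
print(window_filter(start, end))
```

This only counts inserts, since updates and deletes do not change `_id`, so it gives a lower bound on the number of operations that should have reached the target.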

## Additional Context
- Issue appears to be intermittent with some "self-healing" behavior observed
- Problem affects multiple MongoDB collections
- Resume token regression suggests potential race condition or state management issue
- The consistent batch size pattern suggests cursor/batch handling problems
- Issue started appearing around mid-October 2025


---

**Priority**: Critical - Data Loss Issue
**Labels**: bug, mongodb, cdc, data-loss, resume-token, production

### Relevant log output

```shell

```

### Contribute

- [ ] Yes, I want to contribute

[source-mongodb-v2] Saved Resume Token Time Regression Resulting in Missing Data #69102
