Description
Describe the bug
This bug applies only when versioning is not enabled and the stream is rewound to an earlier point. Versioning is recommended, but it is still possible to ingest without versions today.
If versioning is not used in pull-based ingestion and the streaming source pointer is rewound, the latest available message for a document can be skipped.
- Assume multiple versions of a document are present in the stream, and all of them are persisted without versions.
- If the consumer is explicitly rewound to reprocess all the messages from the previous step, messages determined to be duplicates (previously processed) are skipped.
- The issue is that the poller is only aware of the latest offset for a given document. As a result, older messages are reprocessed while the latest message for that document is skipped.
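The sequence above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual poller code: `ingest`, `latest_offsets`, and the tuple layout are hypothetical names, and the index is modeled as a dict holding only the last write per document.

```python
# Simplified model of the skip-on-rewind behaviour; NOT OpenSearch code.
# Each stream message is (offset, doc_id, payload). All names are hypothetical.
stream = [
    (0, "doc-1", "v1"),
    (1, "doc-1", "v2"),
    (2, "doc-1", "v3"),
]


def ingest(messages, index, persisted_offsets):
    """Apply messages in order, skipping offsets recorded as already persisted."""
    for offset, doc_id, payload in messages:
        if offset in persisted_offsets:
            continue  # treated as a duplicate and skipped
        index[doc_id] = (offset, payload)  # unversioned write: last write wins


def latest_offsets(index):
    """Only the latest offset per document is recoverable from the index."""
    return {offset for offset, _ in index.values()}


index = {}
ingest(stream, index, persisted_offsets=set())
first_pass = dict(index)  # doc-1 holds the payload from offset 2

# Rewind to offset 0. The poller only knows the latest offset per document,
# so offsets 0 and 1 are reprocessed while offset 2 is skipped as a duplicate.
ingest(stream, index, persisted_offsets=latest_offsets(index))
after_rewind = dict(index)  # doc-1 now holds the stale payload from offset 1
```

In this model the document ends up with the content of the second-to-last message, which matches the symptom described in this issue.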
More details will follow.
The fix: Remove the persisted pointer concept and rely on versioning to ensure a consistent view of documents on rewind. Pull-based ingestion will provide an at-least-once processing guarantee when versioning is not used.
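A rough sketch of how the proposed behaviour could look, again as a simplified model rather than the real implementation. `ingest_without_pointer`, the `use_versioning` flag, and the message layout are assumptions made for illustration; the version check stands in for whatever versioning mechanism the index is configured with.

```python
# Simplified model of the proposed fix; NOT OpenSearch code.
# Each message is (offset, doc_id, version, payload). All names are hypothetical.
stream = [
    (0, "doc-1", 1, "v1"),
    (1, "doc-1", 2, "v2"),
    (2, "doc-1", 3, "v3"),
]


def ingest_without_pointer(messages, index, use_versioning):
    """Replay every message; no offset-based duplicate detection.

    Returns the list of payloads actually written, in order.
    """
    writes = []
    for _offset, doc_id, version, payload in messages:
        if use_versioning:
            current = index.get(doc_id)
            # Assumed rule: drop writes that do not carry a newer version.
            if current is not None and current[0] >= version:
                continue
        index[doc_id] = (version, payload)
        writes.append(payload)
    return writes


# With versioning: a rewind replays everything, but stale writes are rejected.
versioned = {"doc-1": (3, "v3")}
versioned_writes = ingest_without_pointer(stream, versioned, use_versioning=True)

# Without versioning: every message is applied again (at-least-once), so the
# document passes through older states but converges on the latest message.
unversioned = {"doc-1": (3, "v3")}
unversioned_writes = ingest_without_pointer(stream, unversioned, use_versioning=False)
```

In both cases the final state reflects the last message in the stream. The difference is that the unversioned replay briefly exposes older content, which is the trade-off of at-least-once processing.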
Related component
Indexing
To Reproduce
Create a pull-based index without versioning. Publish multiple updates for a document. Rewind to an early offset and ensure no new versions of the documents are published after the rewind. The latest available message known to the shard is skipped.
Expected behavior
The index should reflect the latest version of a document seen in the stream, without skipping valid messages, even if versioning is not used.
Additional Details
Plugins
ingestion-kafka, ingestion-kinesis