Delta Snapshot Reindex with RFS
RFC Proposal
What / Why
We propose Delta Snapshot Reindex (DSR), an enhancement to Reindex‑from‑Snapshot (RFS) that applies only the changes between two snapshots to a target cluster, rather than full reingestion. This reduces unnecessary I/O, network, and compute overhead when updating historical data already backfilled via RFS or Snapshot/Restore (see OpenSearch Migration Assistant docs: https://opensearch.org/docs/latest/migration-assistant/).
Who Wants This?
- Operators maintaining standby clusters who need periodic, incremental updates.
- Data-lake integrators managing cold or archived OpenSearch clusters, who periodically snapshot data for long-term storage and need to apply only incremental changes (deltas) to downstream systems or refreshed clusters, avoiding full reprocessing.
- Migration operators optimizing backfill resource costs while reducing lag between source and target clusters.
Problem Statement
Updating a target index currently requires deleting it and re‑ingesting all documents to guarantee consistency. This:
- Wastes network/CPU on unchanged docs.
- Causes write amplification on the target cluster.
- Results in a high RTO for the migration unless it is paired with another solution such as Capture and Replay.
Proposed Solution
- Inputs:
  - Old snapshot name (already applied to target).
  - New snapshot name (later/earlier point‑in‑time).
- Per‑shard, per‑segment diff:
  - Parse Lucene segment files from both snapshots.
  - Compare segment names and live‑docs bitsets.
- Delta determination:
  | Segment Scenario | Action |
  |---|---|
  | In old but not in new | Delete all live docs from that segment. |
  | In new but not in old | Add all live docs from that segment. |
  | In both snapshots | Compare bitsets: Old ∖ New → delete; New ∖ Old → add. |
- Apply deltas:
  - First execute all deletes, then all adds/updates.
- Merge‑optimization (Future enhancement):
  - Track a hashmap of `_id` → hash(`_source`); if a delete and an add target the same `_id` with identical hash, skip both.
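The delta-determination rules can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RFS implementation: it assumes each shard of each snapshot has already been parsed into a mapping from segment name to the set of live, segment-local doc numbers (a stand-in for the Lucene live‑docs bitset). The names `Delta` and `compute_shard_delta` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    # Each entry is a (segment_name, doc_number) pair.
    deletes: list = field(default_factory=list)
    adds: list = field(default_factory=list)


def compute_shard_delta(old_segments, new_segments):
    """Diff one shard between two snapshots.

    Both arguments map segment name -> set of live doc numbers.
    This input shape is an assumption made for illustration.
    """
    delta = Delta()
    for name in sorted(old_segments.keys() | new_segments.keys()):
        # A segment missing from a snapshot is treated as having no live docs,
        # which covers the "in old but not in new" and "in new but not in old" rows.
        old_live = old_segments.get(name, set())
        new_live = new_segments.get(name, set())
        delta.deletes.extend((name, doc) for doc in sorted(old_live - new_live))
        delta.adds.extend((name, doc) for doc in sorted(new_live - old_live))
    return delta


old = {"_0": {0, 1, 2}, "_1": {0, 1}}
new = {"_0": {0, 2}, "_2": {0, 1}}
delta = compute_shard_delta(old, new)
print(delta.deletes)  # [('_0', 1), ('_1', 0), ('_1', 1)]
print(delta.adds)     # [('_2', 0), ('_2', 1)]
```

Treating a missing segment as an empty live set collapses the three table rows into one set difference per segment. Resolving each doc number to its `_id` and `_source`, and sending deletes before adds to the target, are left out of this sketch.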
Limitations & Drawbacks
- Segment merges can trigger full‑segment deletes/adds unless optimized via the hashmap.
- `_source:false` indices are not compatible (same as RFS).
- Plugin classes: custom plugins may need to be on the DSR application classpath.
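To make the segment-merge limitation concrete, here is a hedged sketch of the hash-based cancellation described as a future enhancement. It assumes deletes and adds have already been resolved to `(_id, _source)` pairs, that each `_id` appears at most once per list, and that `_source` is available as a parsed dict. The helper names `source_hash` and `cancel_unchanged` are hypothetical, and hashing canonical JSON is one possible choice rather than a decided design.

```python
import hashlib
import json


def source_hash(source):
    # Canonical JSON so that key order does not change the hash (an assumption;
    # hashing the raw stored bytes would be stricter).
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cancel_unchanged(deletes, adds):
    """Drop delete/add pairs that target the same _id with identical content.

    deletes and adds are lists of (_id, _source) pairs.
    """
    deleted = {doc_id: source_hash(src) for doc_id, src in deletes}
    unchanged = {
        doc_id for doc_id, src in adds
        if deleted.get(doc_id) == source_hash(src)
    }
    kept_deletes = [(i, s) for i, s in deletes if i not in unchanged]
    kept_adds = [(i, s) for i, s in adds if i not in unchanged]
    return kept_deletes, kept_adds


# Two old segments were merged into one new segment: every doc shows up as
# both a delete and an add, but only "b" changed and only "c" was removed.
deletes = [("a", {"v": 1}), ("b", {"v": 1}), ("c", {"v": 1})]
adds = [("a", {"v": 1}), ("b", {"v": 2}), ("d", {"v": 1})]
kept_deletes, kept_adds = cancel_unchanged(deletes, adds)
print([i for i, _ in kept_deletes])  # ['b', 'c']
print([i for i, _ in kept_adds])     # ['b', 'd']
```

In this example the merge would otherwise cause three deletes and three adds; after cancellation only the changed, removed, and new documents remain.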
Alternatives Considered
- Log‑based CDC: Invasive change‑stream setup. Limited compatibility across older versions.
- Reindex REST API: Burdens source cluster; no delete detection.
- Incremental Snapshot/Restore: Unsupported beyond one version.
- Native Delta Reindex in Snapshot/Restore API: Embedding delta logic directly into the Snapshot/Restore mechanism (rather than building around Reindex-from-Snapshot) could offer a cleaner, lower-latency path for applying snapshot diffs. This would allow:
  - Repository-level segment diffing and blob reuse.
  - Elimination of reindex overhead and custom coordination logic.
  - Tighter integration with existing DR workflows.

  Drawbacks: Requires deeper changes to core restore flows and compatibility handling across OpenSearch/Lucene versions. Slower to implement and validate across all repository types.
Feedback Requested
- Suggestions for efficiently processing segment merges.
- Special data types needing extra care (e.g. nested docs, binary fields, geo‑shapes, custom analyzers).