[RFC] Delta Snapshot Reindex with RFS #1517

@AndreKurait

RFC Proposal


What / Why

We propose Delta Snapshot Reindex (DSR), an enhancement to Reindex‑from‑Snapshot (RFS) that applies only the changes between two snapshots to a target cluster instead of re‑ingesting every document. This avoids unnecessary I/O, network, and compute overhead when updating historical data that was already backfilled via RFS or Snapshot/Restore (see the OpenSearch Migration Assistant docs: https://opensearch.org/docs/latest/migration-assistant/).


Who Wants This?

  • Operators maintaining standby clusters who need periodic, incremental updates.
  • Data-lake integrators managing cold or archived OpenSearch clusters, who periodically snapshot data for long-term storage and need to apply only the incremental changes (deltas) to downstream systems or refreshed clusters, avoiding full reprocessing.
  • Migration operators who want to lower backfill resource costs while reducing lag between source and target clusters.

Problem Statement

Updating a target index currently requires deleting it and re‑ingesting all documents to guarantee consistency. This:

  1. Wastes network/CPU on unchanged docs.
  2. Causes write amplification on the target cluster.
  3. Leads to a high recovery time objective (RTO) for migrations that do not also use another solution such as Capture and Replay.

Proposed Solution

  1. Inputs:
    • Old snapshot name (already applied to target).
    • New snapshot name (a later or earlier point in time).
  2. Per‑shard, per‑segment diff:
    • Parse Lucene segment files from both snapshots.
    • Compare segment names and live‑docs bitsets.
  3. Delta determination:
    | Segment scenario      | Action                                                |
    |-----------------------|-------------------------------------------------------|
    | In old but not in new | Delete all live docs from that segment.               |
    | In new but not in old | Add all live docs from that segment.                  |
    | In both snapshots     | Compare bitsets: Old ∖ New → delete; New ∖ Old → add. |
  4. Apply deltas:
    • First execute all deletes, then all adds/updates.
  5. Merge optimization (future enhancement):
    • Track a hashmap of _id → hash(_source); if a delete and add target the same _id with identical hash, skip both.
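
The delta determination and apply order above can be sketched roughly as follows. This is a minimal illustration, not RFS code: `Segment`, `ShardDelta`, `compute_shard_delta`, `apply_delta`, and the two callbacks are hypothetical names, live-docs bitsets are modeled as plain sets of segment-local doc numbers, and the sketch assumes that a segment name present in both snapshots refers to the same immutable segment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Hypothetical view of one Lucene segment in one shard of a snapshot."""
    name: str             # segment name, e.g. "_0"
    live_docs: frozenset  # segment-local doc numbers whose live-docs bit is set


@dataclass
class ShardDelta:
    deletes: list = field(default_factory=list)  # (segment, doc) pairs read from the OLD snapshot
    adds: list = field(default_factory=list)     # (segment, doc) pairs read from the NEW snapshot


def compute_shard_delta(old_segments, new_segments):
    """Apply the three table rows to one shard.

    A segment missing from a snapshot is treated as having no live docs, so
    all three rows reduce to the same two set differences.
    Assumption: equal segment names mean the same underlying segment.
    """
    old = {s.name: s.live_docs for s in old_segments}
    new = {s.name: s.live_docs for s in new_segments}
    delta = ShardDelta()
    for name in sorted(old.keys() | new.keys()):
        old_live = old.get(name, frozenset())
        new_live = new.get(name, frozenset())
        delta.deletes.extend((name, doc) for doc in sorted(old_live - new_live))
        delta.adds.extend((name, doc) for doc in sorted(new_live - old_live))
    return delta


def apply_delta(delta, delete_doc, index_doc):
    """Run every delete before any add, as the proposal specifies.

    delete_doc and index_doc are hypothetical callbacks that would resolve a
    (segment, doc) pair to its _id and _source and call the target cluster.
    """
    for segment, doc in delta.deletes:
        delete_doc(segment, doc)
    for segment, doc in delta.adds:
        index_doc(segment, doc)


# Example: "_0" was merged away, "_1" lost doc 1, "_2" is new.
old_snapshot = [Segment("_0", frozenset({0, 1, 2})), Segment("_1", frozenset({0, 1}))]
new_snapshot = [Segment("_1", frozenset({0})), Segment("_2", frozenset({0, 1}))]
example = compute_shard_delta(old_snapshot, new_snapshot)
print("deletes:", example.deletes)
print("adds:", example.adds)
```

Running deletes first presumably matters because an updated document shows up as a delete of its old copy plus an add of its new copy under the same _id; if the add ran first, the later delete would remove the new version.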

Limitations & Drawbacks

  • Segment merges can trigger full‑segment deletes/adds unless optimized via the hashmap.
  • Indices with _source disabled (_source:false) are not compatible (same as RFS).
  • Plugin classes: custom plugins may need to be on the DSR application classpath.
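
To illustrate the first limitation: when Lucene merges segments, the source segments disappear and a new segment holding the same documents appears, so the per-segment rules delete and re-add documents that did not change. The _id → hash(_source) idea from the proposed solution could cancel those pairs. The sketch below is one possible shape, not a design decision: `source_hash` and `cancel_unchanged` are hypothetical names, hashing canonical JSON is an assumption (hashing the raw stored _source bytes would be stricter), and routing is ignored.

```python
import hashlib
import json


def source_hash(source):
    """Hash a _source document. Canonical JSON is an assumption of this sketch."""
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cancel_unchanged(deletes, adds):
    """Drop delete/add pairs that target the same _id with identical content.

    deletes and adds map _id -> _source, taken from the old and new snapshot
    respectively. Returns (ids_to_delete, docs_to_index).
    """
    deleted_hashes = {doc_id: source_hash(src) for doc_id, src in deletes.items()}
    ids_to_delete = set(deletes)
    docs_to_index = {}
    for doc_id, src in adds.items():
        if deleted_hashes.get(doc_id) == source_hash(src):
            ids_to_delete.discard(doc_id)  # same _id, same hash: skip both operations
        else:
            docs_to_index[doc_id] = src    # new or genuinely changed document
    return ids_to_delete, docs_to_index


# Example: "a" only moved between segments, "b" changed, "c" was removed, "d" is new.
pending_deletes = {"a": {"x": 1}, "b": {"x": 2}, "c": {"x": 3}}
pending_adds = {"a": {"x": 1}, "b": {"x": 20}, "d": {"x": 4}}
ids_to_delete, docs_to_index = cancel_unchanged(pending_deletes, pending_adds)
print(sorted(ids_to_delete), sorted(docs_to_index))
```

Holding every pending _id in memory is the obvious cost of this approach; a real implementation would likely need to bound or spill the map for large shards.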

Alternatives Considered

  1. Log‑based CDC: Invasive change‑stream setup. Limited compatibility across older versions.

  2. Reindex REST API: Burdens source cluster; no delete detection.

  3. Incremental Snapshot/Restore: Unsupported beyond one major version.

  4. Native Delta Reindex in Snapshot/Restore API:
    Embedding delta logic directly into the Snapshot/Restore mechanism (rather than building around Reindex-from-Snapshot) could offer a cleaner, lower-latency path for applying snapshot diffs. This would allow:

    • Repository-level segment diffing and blob reuse.
    • Elimination of reindex overhead and custom coordination logic.
    • Tighter integration with existing DR workflows.

    Drawbacks: Requires deeper changes to core restore flows and compatibility handling across OpenSearch/Lucene versions. Slower to implement and validate across all repository types.


Feedback Requested

  • Suggestions for efficiently processing segment merges.
  • Special data types needing extra care (e.g. nested docs, binary fields, geo‑shapes, custom analyzers).
