Elasticsearch Snapshots? #435

@matan129

Is your feature request related to a problem? Please describe.
Nope. The use case is new, but related to this project: I have an Elasticsearch cluster with large indices that are snapshotted to S3, and I was wondering whether I could leverage luceneRDD to load the data directly from S3.
Currently, Spark queries Elasticsearch heavily, which puts a lot of strain on the cluster. Usually I just need a full dump of the data anyway, so I don't need sophisticated ES query capabilities when dumping the data from ES to Spark.

Describe the solution you'd like
Ideally? sparkRDD.fromEs(<es_connection>). Jokes aside, Elasticsearch snapshots are basically saved as "dumb dumps" of the Lucene index of every shard in the Elasticsearch index. I thought we might be able to parse these files with luceneRDD.
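
To make the "one Lucene dump per shard" idea concrete, here is a rough Python sketch of how the objects in a snapshot repository could be grouped by shard, so that each shard becomes one unit of work (for example, one Spark partition). It rests on assumptions I have not verified against a real repository: that keys look like `indices/<index_id>/<shard_number>/<blob_name>` and that data blobs are the ones whose names start with `__`. The function name `group_blobs_by_shard` and the example keys are hypothetical. It also does not open anything as a Lucene index; as far as I know the blob names do not match the original Lucene file names, so the shard-level snapshot metadata would still be needed to map them back.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_blobs_by_shard(keys):
    """Group repository object keys by (index_id, shard_number).

    Hypothetical helper: it only parses key strings, e.g. the result of
    an S3 listing, and never touches S3 or Lucene itself.
    """
    shards = defaultdict(list)
    for key in keys:
        parts = PurePosixPath(key).parts
        # Assumed layout: indices/<index_id>/<shard_number>/<blob_name>
        if len(parts) != 4 or parts[0] != "indices" or not parts[2].isdigit():
            continue
        index_id, shard, blob = parts[1], int(parts[2]), parts[3]
        # Assumption: data blobs start with "__"; everything else is metadata.
        if blob.startswith("__"):
            shards[(index_id, shard)].append(key)
    return {shard: sorted(blobs) for shard, blobs in shards.items()}


# Made-up keys, only to show the shape of the input and output.
example_keys = [
    "index-5",
    "indices/abc123/0/__f1",
    "indices/abc123/0/__f2",
    "indices/abc123/0/snap-xyz.dat",
    "indices/abc123/1/__f3",
    "indices/abc123/meta-xyz.dat",
]
shard_blobs = group_blobs_by_shard(example_keys)
```

Each entry of `shard_blobs` would then be the set of files a single task has to download and reassemble before a Lucene reader could be pointed at it.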

Describe alternatives you've considered
N/A
