Elasticsearch Snapshots? #435
Description
Is your feature request related to a problem? Please describe.
Nope. The use case is new, but kind of related to this project: I have an Elasticsearch cluster with large indices that are being snapshotted to S3. I was wondering if I could somehow leverage luceneRDD to load the data directly from S3.
Currently, I have Spark query Elasticsearch heavily, which puts a lot of strain on the cluster. Usually I just need a full dump of the data anyway, so I don't need sophisticated ES query capabilities when dumping the data from ES to Spark.
Describe the solution you'd like
Ideally? sparkRDD.fromEs(<es_connection>). Jokes aside, Elasticsearch snapshots are basically saved as "dumb dumps" of the Lucene index of every shard in the Elasticsearch index. I thought we might be able to parse these files with luceneRDD.
Describe alternatives you've considered
N/A