
Conversation

gregschohn
Collaborator

Description

WIP - I've refactored snapshot handling throughout shard setup and document retrieval to make it easier to add new Lucene document sources.

I'm also in the process of adding new parameters -

  1. There are parameters for the coordination cluster, of the same type as those used for the source and target, so that the target cluster doesn't need to double as the coordination cluster. A wholly independent cluster can be set up just for coordination. There's also a patch on the argo branch to support flexible index names, so one could imagine running many deployments against the same coordination index.
  2. snapshotLoader* is in place, though only partially implemented, to allow alternate ways to get shard info and create document streams. I intend to load the specified class via ServiceLoader, similar to how Transformations work today.
  3. initialSubdivision hasn't been implemented yet, but I think it would be pretty easy and very useful, especially when users want to reshard an index. Rather than using a single worker at a time for a source shard, this would pull the shard down to find the number of documents, then split the work into that many pieces so that the migration could complete faster by that factor (since target shards wouldn't sit idle). This should be easy to implement by adding an end document index so that a shard can be split and processed in parallel. Detecting the initial work item is also easy, since it would have -1 for its end index.

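To make the subdivision idea above concrete, here is a minimal sketch of how a work item with an end-index sentinel could be split once the document count is known. All names here (WorkItem, subdivide, isInitial) are illustrative assumptions, not code from this PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed initialSubdivision behavior.
public class ShardSubdivision {
    /** A work item covering docs [startDoc, endDoc); endDoc == -1 marks the
     *  initial, not-yet-subdivided item for a shard. */
    public record WorkItem(String index, int shard, int startDoc, int endDoc) {
        public boolean isInitial() { return endDoc == -1; }
    }

    /** Split the initial work item into roughly equal document ranges once
     *  the shard's total document count is known. */
    public static List<WorkItem> subdivide(WorkItem initial, int totalDocs, int subdivisions) {
        if (!initial.isInitial()) {
            throw new IllegalArgumentException("work item was already subdivided");
        }
        List<WorkItem> items = new ArrayList<>();
        int base = totalDocs / subdivisions;
        int remainder = totalDocs % subdivisions;
        int start = 0;
        for (int i = 0; i < subdivisions; i++) {
            // Spread the remainder across the first few ranges.
            int size = base + (i < remainder ? 1 : 0);
            items.add(new WorkItem(initial.index(), initial.shard(), start, start + size));
            start += size;
        }
        return items;
    }
}
```

With this shape, independent workers can each claim one sub-range of a single source shard instead of serializing on it.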
Everything else was just shuffling old code around into constructors and the newly created ElasticsearchSnapshotDocumentRepository, which wraps up the S3-centric objects and metadata that RFS had been using. The EncryptedShardBucketCollectionDocumentRepository also has stub code that won't do anything. Since we don't have a spec for reading encrypted files yet, that will be hived off and provided via a hot-loaded jar at some point in the future.
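The ServiceLoader-based lookup mentioned for snapshotLoader (and eventually for the hot-loaded encrypted-files jar) could look roughly like this; the interface and registry names are hypothetical, chosen only to illustrate the Transformation-style plugin pattern:

```java
import java.util.ServiceLoader;

// Illustrative-only sketch of loading an alternate snapshot loader via
// java.util.ServiceLoader, analogous to how Transformations are discovered.
public class SnapshotLoaderRegistry {
    /** Service interface an external jar would implement and register in
     *  META-INF/services. */
    public interface SnapshotDocumentSource {
        String name();
    }

    /** Find a registered implementation by its declared name. */
    public static SnapshotDocumentSource load(String name) {
        for (SnapshotDocumentSource source : ServiceLoader.load(SnapshotDocumentSource.class)) {
            if (source.name().equals(name)) {
                return source;
            }
        }
        throw new IllegalArgumentException("No snapshot loader named " + name);
    }
}
```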

Issues Resolved

Testing

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Greg Schohn <[email protected]>
@peternied (Member) left a comment

High level, this looks good, thanks for cleaning up some of the snapshot reading interface.

@Parameter(required = true,
@Parameter(required = false,
names = {"--snapshotLoader"},
description = "Hot-loaded class that should be used to download the lucene directory of a shard")

'Hot loading' is a bit of a misnomer; it terrified me a little when I first looked at the code, since it implies a class loader, so I'm glad to see that isn't the case. IMO this is a good candidate for future expansion as needed.


@Parameter(required = false,
names = { "--initial-subdivision", "--initialSubdivision" },
converter = VersionConverter.class,

That doesn't seem correct

import org.opensearch.migrations.cluster.ClusterProviderRegistry;
import org.opensearch.migrations.cluster.ClusterSnapshotReader;

public class ElasticsearchSnapshotDocumentRepository extends LuceneBasedDocumentRepository {

Nit: We seem to be using SearchCluster for ES or OS

Suggested change
public class ElasticsearchSnapshotDocumentRepository extends LuceneBasedDocumentRepository {
public class SearchClusterSnapshotDocumentRepository extends LuceneBasedDocumentRepository {


@Override
public LuceneIndexReader getReader(String index, int shard) {
return new IndexReader9(downloadManager.downloadToLocalPath(index, shard), false, null);

Before merging lets rationalize this with the version matrix used in LuceneIndexReader

}

@Getter
public static class CoordinatorArgs implements ConnectionContext.IParams {

This looks identical to the existing target args, what is the use case for this over the target?

…o a process-oriented one.

In this case, a command and some config parameters are passed to Process.start().
As the code stands, that command will be invoked with different conventions and different arguments, including additional per-operation arguments (i.e. indexName, or indexName and shard), to fulfill the four different operations that LuceneBasedDocumentRepository needs.
I haven't built out a program that fulfills this contract yet, so an end-to-end test through this path is still pending.
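As a rough sketch of that contract, the repository side might assemble a per-operation argument list and launch the configured command with ProcessBuilder. The class and method names below are assumptions for illustration, not the PR's actual code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a process-oriented document repository: one
// configured command, invoked with different per-operation arguments
// (e.g. an index name, or an index name plus a shard number).
public class ExternalCommandRunner {
    /** Build the full argument list for one operation. */
    public static List<String> buildCommand(String command, String operation, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(command);
        cmd.add(operation);
        for (String a : args) {
            cmd.add(a);
        }
        return cmd;
    }

    /** Launch the command and capture the first line of its output,
     *  merging stderr into stdout for simpler diagnostics. */
    public static String runAndReadFirstLine(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line;
        }
    }
}
```

The four repository operations would then differ only in the operation name and trailing arguments handed to buildCommand.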

Signed-off-by: Greg Schohn <[email protected]>
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ManuallyManagedLuceneShardRepository extends LuceneBasedDocumentRepository {

I'm not sure what to make of this class; it seems like an independent tool that belongs in its own top-level directory. What do you think of adding an explanation / data-flow diagram of how and when this will be used in conjunction with the Metadata/RFS workflow?


@Override
public List<Index> getIndicesInSnapshot(String snapshotName) {
public List<Index> getIndicesInSnapshot() {

Really appreciate this refactor. For your consideration: break out the 'fix' for the snapshot name into its own PR so we could fast-track merging that change and isolate it from the other work you're doing.

@peternied (Member)

@gregschohn This has been open for more than a month, need any help with next steps or should this be closed out?

@AndreKurait (Member)

I connected with @gregschohn, we aligned to close out this PR and focus on a different way to cut this in https://opensearch.atlassian.net/browse/MIGRATIONS-2564
