
Conversation

gregschohn
Collaborator

Description

WIP - I've refactored snapshot handling throughout shard setup and document retrieval to make it easier to add new Lucene document sources.

I'm also in the process of adding new parameters -

  1. There are parameters for the coordination cluster, of the same type as those used for the source and target, so that the target cluster doesn't need to double as the coordination cluster. A wholly independent cluster can be set up just for coordination. There's also a patch on the argo branch to support flexible index names, so one could imagine running many deployments against the same coordination index.
  2. snapshotLoader* is in place, though only partially implemented, to allow alternate ways to get shard info and create document streams. I intend to load the specified class via ServiceLoader, similar to how Transformations work today.
  3. initialSubdivision hasn't been implemented yet, but I think it would be pretty easy and very useful, especially when users want to reshard an index. Rather than using a single worker at a time for a source shard, this would pull the shard down to find the number of documents, then split the work into that many pieces so that the migration could complete faster by that factor (since target shards wouldn't sit idle). This should be easy to implement by adding an end document index so that a shard can be split and processed in parallel. Detecting the initial work item is also easy, since it would have -1 for its end index.

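To make the subdivision idea above concrete, here is a minimal sketch of how a work item with an end-index sentinel could be split once the document count is known. All names here (WorkItem, subdivide, isInitial) are illustrative assumptions, not code from this PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed initialSubdivision behavior.
public class ShardSubdivision {
    /** A work item covering docs [startDoc, endDoc); endDoc == -1 marks the
     *  initial, not-yet-subdivided item for a shard. */
    public record WorkItem(String index, int shard, int startDoc, int endDoc) {
        public boolean isInitial() { return endDoc == -1; }
    }

    /** Split the initial work item into roughly equal document ranges once
     *  the shard's total document count is known. */
    public static List<WorkItem> subdivide(WorkItem initial, int totalDocs, int subdivisions) {
        if (!initial.isInitial()) {
            throw new IllegalArgumentException("work item was already subdivided");
        }
        List<WorkItem> items = new ArrayList<>();
        int base = totalDocs / subdivisions;
        int remainder = totalDocs % subdivisions;
        int start = 0;
        for (int i = 0; i < subdivisions; i++) {
            // Spread the remainder across the first few ranges.
            int size = base + (i < remainder ? 1 : 0);
            items.add(new WorkItem(initial.index(), initial.shard(), start, start + size));
            start += size;
        }
        return items;
    }
}
```

With this shape, independent workers can each claim one sub-range of a single source shard instead of serializing on it.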
Everything else was just shuffling old code around into constructors and the newly created ElasticsearchSnapshotDocumentRepository, which wraps up the S3-centric objects and metadata that RFS had been using. The EncryptedShardBucketCollectionDocumentRepository also has stub code that won't do anything. Since we don't have a spec for reading encrypted files yet, that will be hived off and provided via a hot-loaded jar at some point in the future.
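The ServiceLoader-based lookup mentioned for snapshotLoader (and eventually for the hot-loaded encrypted-files jar) could look roughly like this; the interface and registry names are hypothetical, chosen only to illustrate the Transformation-style plugin pattern:

```java
import java.util.ServiceLoader;

// Illustrative-only sketch of loading an alternate snapshot loader via
// java.util.ServiceLoader, analogous to how Transformations are discovered.
public class SnapshotLoaderRegistry {
    /** Service interface an external jar would implement and register in
     *  META-INF/services. */
    public interface SnapshotDocumentSource {
        String name();
    }

    /** Find a registered implementation by its declared name. */
    public static SnapshotDocumentSource load(String name) {
        for (SnapshotDocumentSource source : ServiceLoader.load(SnapshotDocumentSource.class)) {
            if (source.name().equals(name)) {
                return source;
            }
        }
        throw new IllegalArgumentException("No snapshot loader named " + name);
    }
}
```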

Issues Resolved

Testing

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Greg Schohn <[email protected]>
@peternied (Member) left a comment

High level, this looks good, thanks for cleaning up some of the snapshot reading interface.

@Parameter(required = true,
@Parameter(required = false,
names = {"--snapshotLoader"},
description = "Hot-loaded class that should be used to download the lucene directory of a shard")

'Hot loading' is a bit of a misnomer; it terrified me a little when I first looked at the code, since it implies a class loader, so I'm glad to see that isn't the case. IMO this is a good candidate for future expansion as needed.


@Parameter(required = false,
names = { "--initial-subdivision", "--initialSubdivision" },
converter = VersionConverter.class,

That doesn't seem correct

import org.opensearch.migrations.cluster.ClusterProviderRegistry;
import org.opensearch.migrations.cluster.ClusterSnapshotReader;

public class ElasticsearchSnapshotDocumentRepository extends LuceneBasedDocumentRepository {

Nit: We seem to be using SearchCluster for ES or OS

Suggested change
public class ElasticsearchSnapshotDocumentRepository extends LuceneBasedDocumentRepository {
public class SearchClusterSnapshotDocumentRepository extends LuceneBasedDocumentRepository {


@Override
public LuceneIndexReader getReader(String index, int shard) {
return new IndexReader9(downloadManager.downloadToLocalPath(index, shard), false, null);

Before merging lets rationalize this with the version matrix used in LuceneIndexReader

}

@Getter
public static class CoordinatorArgs implements ConnectionContext.IParams {

This looks identical to the existing target args, what is the use case for this over the target?

…o a process-oriented one.

In this case, a command and some config parameters are passed to Process.start().
As the code stands, that command will be invoked with different conventions and different arguments, including additional per-operation arguments (i.e. indexName, or indexName and shard), to fulfill the four different operations that LuceneBasedDocumentRepository needs.
I haven't built out a program that fulfills this contract yet, so an end-to-end test through this path is still pending.
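As a rough sketch of that contract, the repository side might assemble a per-operation argument list and launch the configured command with ProcessBuilder. The class and method names below are assumptions for illustration, not the PR's actual code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a process-oriented document repository: one
// configured command, invoked with different per-operation arguments
// (e.g. an index name, or an index name plus a shard number).
public class ExternalCommandRunner {
    /** Build the full argument list for one operation. */
    public static List<String> buildCommand(String command, String operation, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(command);
        cmd.add(operation);
        for (String a : args) {
            cmd.add(a);
        }
        return cmd;
    }

    /** Launch the command and capture the first line of its output,
     *  merging stderr into stdout for simpler diagnostics. */
    public static String runAndReadFirstLine(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line;
        }
    }
}
```

The four repository operations would then differ only in the operation name and trailing arguments handed to buildCommand.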

Signed-off-by: Greg Schohn <[email protected]>
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ManuallyManagedLuceneShardRepository extends LuceneBasedDocumentRepository {

I'm not sure what to make of this class; it seems like an independent tool that belongs in its own top-level directory. What do you think of adding an explanation / data-flow diagram of how and when this will be used in conjunction with the Metadata/RFS workflow?


@Override
public List<Index> getIndicesInSnapshot(String snapshotName) {
public List<Index> getIndicesInSnapshot() {

Really appreciate this refactor. For your consideration: break out the 'fix' for the snapshot name into its own PR so we could fast-track merging that change and isolate it from the other work you're doing.

@peternied (Member)

@gregschohn This has been open for more than a month, need any help with next steps or should this be closed out?

@AndreKurait (Member)

I connected with @gregschohn, we aligned to close out this PR and focus on a different way to cut this in https://opensearch.atlassian.net/browse/MIGRATIONS-2564
