Skip to content

Support reading multiple regions with single fetch #118496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

Tim-Brooks
Copy link
Contributor

This commit adds support for filling multiple regions in blob cache with
a single fetch from the object store.

@Tim-Brooks Tim-Brooks added >non-issue :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. v9.0.0 labels Dec 11, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Dec 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Dec 11, 2024
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, I wonder about the case where we have regions that are already filled?

regionRange,
regionsListener.acquire()
);
if (regionGaps.isEmpty() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should break it up if we hit a region that needs no data read? We could in principle risk having everything but the first and last region of a file and thus we could download an excess amount of extra data.

We can also build this into the shared stream handler.

Or forego it for now. I'd be ok with that too, but wanted to get views on it.

@@ -1082,7 +1205,29 @@ void populateAndRead(
}
}

List<Runnable> gapFillingTasks(
int gapOffset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not the region offset more than the gap offset?

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment.

) {
ArrayList<Runnable> gapFillingTasks = new ArrayList<>();
for (Tuple<CacheFileRegion<KeyType>, RegionGaps> gapsToFetch : gaps) {
int offset = Math.toIntExact(gapsToFetch.v2().regionOffset());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unsafe, I think we need this offset to be a long and this has to pass through all the way to relativePos in SourceInputStreamFactory.create(relativePos, ...)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants