
HADOOP-17250 ABFS short reads can be merged with readahead. #2307

Open
wants to merge 2 commits into base: trunk

Conversation

mukund-thakur
Contributor

@mukund-thakur mukund-thakur commented Sep 15, 2020

Introducing an fs.azure.readahead.range parameter which can be set by the user.
Data will be populated in the buffer for random reads as well, which leads to fewer
remote calls.
This patch also changes the seek implementation to perform a lazy seek. The actual
seek is done when a read is initiated and the data is not present in the buffer;
otherwise data is returned from the buffer, thus reducing the number of remote calls.
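The lazy-seek idea can be sketched roughly as follows. This is a simplified illustration, not the actual AbfsInputStream code: the class, the mock remote read, and the `remoteCalls` counter are invented for the sketch, while field names like `fCursor` echo the patch.

```java
// Rough sketch of lazy seek (hypothetical names; not the real AbfsInputStream).
// seek() only records the target position; the remote call happens in read(),
// and only when the requested range is not already in the local buffer.
class LazySeekStream {
  private final byte[] remoteData;   // stands in for the remote blob
  private final int bufferSize;
  private byte[] buffer = new byte[0];
  private long fCursor = 0;          // remote offset the buffer starts at
  private long nextReadPos = 0;      // recorded by seek(), applied lazily
  int remoteCalls = 0;               // exposed for the demonstration only

  LazySeekStream(byte[] remoteData, int bufferSize) {
    this.remoteData = remoteData;
    this.bufferSize = bufferSize;
  }

  void seek(long pos) {
    nextReadPos = pos;               // lazy: no remote call here
  }

  int read(byte[] b) {
    if (nextReadPos < fCursor || nextReadPos >= fCursor + buffer.length) {
      // Buffer miss: perform the actual seek and one (mock) remote GET.
      fCursor = nextReadPos;
      int len = (int) Math.min(bufferSize, remoteData.length - fCursor);
      buffer = new byte[len];
      System.arraycopy(remoteData, (int) fCursor, buffer, 0, len);
      remoteCalls++;
    }
    int off = (int) (nextReadPos - fCursor);
    int toCopy = Math.min(b.length, buffer.length - off);
    System.arraycopy(buffer, off, b, 0, toCopy);
    nextReadPos += toCopy;
    return toCopy;
  }
}
```

A seek followed by a read within the already-buffered range is then served locally, with no second remote call.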

@mukund-thakur
Contributor Author

CC @steveloughran @snvijaya

Contributor

@snvijaya snvijaya left a comment


The lazy seek change is a useful one: it helps avoid remote calls in the case of random reads, and avoids a System.arraycopy to check whether the data has already been read ahead in the sequential case.

Just a few points to be addressed in this PR:

  1. The PR description contains the default PR message. Please remove the unwanted text.
  2. Yetus has reported checkstyle issues; please check.
  3. Please post test results with HNS and non-HNS accounts configured.

Regarding the random-read change to readahead: how much of a performance benefit did the change bring for the Hive job?

// Enabling read ahead for random reads as well to reduce number of remote calls.
int lengthWithReadAhead = Math.min(b.length + readAheadRange, bufferSize);
LOG.debug("Random read with read ahead size of {}", lengthWithReadAhead);
bytesRead = readInternal(fCursor, buffer, 0, lengthWithReadAhead, true);
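The Math.min clamp in the quoted snippet means a random read never requests more than one buffer's worth of data. A small standalone illustration (the 4 MB buffer size matches the ABFS default; the 64 KB readahead value is just an example):

```java
// Demonstrates how the readahead length is clamped to the buffer size.
// Values below are illustrative, not taken from the patch.
public class ReadAheadClamp {
  static int lengthWithReadAhead(int requested, int readAheadRange, int bufferSize) {
    // Same shape as the snippet under review: add readahead, clamp to buffer.
    return Math.min(requested + readAheadRange, bufferSize);
  }

  public static void main(String[] args) {
    int bufferSize = 4 * 1024 * 1024;   // 4 MB buffer
    int readAheadRange = 64 * 1024;     // 64 KB readahead, hypothetical
    // Small read: the readahead is added in full (8 KB + 64 KB = 72 KB).
    System.out.println(lengthWithReadAhead(8 * 1024, readAheadRange, bufferSize));
    // Read at buffer size: clamped, never exceeds 4 MB.
    System.out.println(lengthWithReadAhead(bufferSize, readAheadRange, bufferSize));
  }
}
```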
Contributor


As with Parquet and ORC, we have seen read patterns move from sequential to random and vice versa. That being the case, would it not be better to always read ahead to bufferSize? Providing options to read fewer bytes, like 64 KB, can actually lead to more IOPS. From our meeting yesterday, one thing we all agreed on was that lower IOPS is better, and it is better to read a larger size than a smaller one.
So let's remove the config for readAheadRange and instead always read ahead to whatever is configured for bufferSize.

Contributor

@steveloughran steveloughran Sep 17, 2020


Based on the S3A experience (which didn't always read into a buffer, BTW), the "penalty" of having a large readahead range is that there is more data to drain when you want to cancel the read (i.e. a seek out of range).
That code does the draining in the active thread. If that were to be done in a background thread, the penalty of a larger readahead would be less, as you would only see a delay from the draining if there were no free HTTPS connections in the pool. Setting up a new HTTPS connection is expensive though. If there were no free HTTPS connections in the pool, you would be better off draining the stream in the active thread. Maybe.

(Disclaimer: all my claims about cost of HTTPS are based on S3 +Java7/8, and S3 is very slow to set up a connection. If the ADLS Gen2 store is faster to negotiate then it becomes a lot more justifiable to drain in a separate thread)
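The background-drain idea being discussed could look something like the sketch below. This is purely illustrative (neither ABFS nor S3A necessarily does it this way), and the trade-off in the comments is exactly the one raised above: draining off-thread only wins if a pooled connection is available, since a fresh TLS handshake may cost more than draining inline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: drain an abandoned HTTP response stream in a background thread so
// the caller's seek() returns immediately. Whether this wins depends on the
// connection pool: with no free connection, a new HTTPS setup may cost more
// than simply draining in the active thread.
public class BackgroundDrain {
  static final ExecutorService DRAINER = Executors.newCachedThreadPool();

  static void drainAsync(InputStream abandoned) {
    DRAINER.submit(() -> {
      byte[] scratch = new byte[64 * 1024];
      try (InputStream in = abandoned) {
        while (in.read(scratch) >= 0) {
          // discard remaining bytes so the connection can go back to the pool
        }
      } catch (IOException ignored) {
        // closing on error simply gives up the connection
      }
    });
  }
}
```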

@steveloughran
Contributor

Sneha,
What are the likely times to

  1. negotiate a new HTTPS connection
  2. read 4 MB in a single ranged GET request
  3. read less than 4 MB in a single ranged GET request, e.g. 2 MB

If there's a fixed latency for the GET irrespective of size, then small reads are very inefficient per byte, reading the whole buffer would be justifiable.

Also: which makes for the simplest code to write, review and maintain? Let's not ignore that little detail, especially given my experience of shipping a broken implementation of this in S3AInputStream.
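The fixed-latency argument can be made concrete with back-of-the-envelope numbers. All figures below are invented for illustration, not measurements of ADLS Gen2; the real answer is whatever measuring points 1-3 would show.

```java
// Illustrative per-request cost of a ranged GET with a fixed setup latency.
// The latency and bandwidth values are made up for the example.
public class GetCostSketch {
  // time (ms) for one GET: fixed latency plus transfer time at 'mbPerSec'
  static double getMillis(double fixedLatencyMs, double sizeMB, double mbPerSec) {
    return fixedLatencyMs + (sizeMB / mbPerSec) * 1000.0;
  }

  public static void main(String[] args) {
    double latency = 50;     // hypothetical fixed latency per GET, ms
    double bandwidth = 100;  // hypothetical transfer rate, MB/s
    // Reading 4 MB as one GET vs. sixty-four 64 KB GETs for the same data:
    double oneBig = getMillis(latency, 4.0, bandwidth);
    double manySmall = 64 * getMillis(latency, 4.0 / 64, bandwidth);
    System.out.printf("4 MB in one GET: %.0f ms; in 64 small GETs: %.0f ms%n",
        oneBig, manySmall);
  }
}
```

With these invented numbers the single large GET is dominated by transfer time, while the many small GETs are dominated by the repeated fixed latency, which is exactly why a fixed per-GET cost makes small reads inefficient per byte.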

@mukund-thakur
Contributor Author

This is the output of a performance benchmark done with this patch along with some Hive tuning.
NOTE: Results may differ now.
[Screenshot: performance benchmark results, 2020-09-17]

@snvijaya
Contributor

> Sneha,
> What are the likely times to
>
>   1. negotiate a new HTTPS connection
>   2. read 4 MB in a single ranged GET request
>   3. read less than 4 MB in a single ranged GET request, e.g. 2 MB
>
> If there's a fixed latency for the GET irrespective of size, then small reads are very inefficient per byte, reading the whole buffer would be justifiable.
>
> Also: which makes for the simplest code to write, review and maintain? Let's not ignore that little detail, especially given my experience of shipping a broken implementation of this in S3AInputStream.

Hi Steve, I take your inputs and agree that observations on the points above can validate a better config setting for readAheadRange. Let me see if I can measure points 1-3; please give me a couple of days to get back.

@mukund-thakur
Contributor Author

Tested using HNS account in us-east-1. All good.

Tested using NON-HNS account in us-east-1. Seeing this failure
[ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsFalse:96->unsetAndAssert:107 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[tru]e> but was:<[fals]e>
[ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsTrue:86->unsetAndAssert:107 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[tru]e> but was:<[fals]e>
[ERROR] ITestGetNameSpaceEnabled.testXNSAccount:67->Assert.assertTrue:41->Assert.fail:88 Expecting getIsNamespaceEnabled() return true

I think tests should be skipped rather than failing if fs.azure.test.namespace.enabled is not set.

@steveloughran steveloughran changed the title HADOOP-17250 Lot of short reads can be merged with readahead. HADOOP-17250 ABFS short reads can be merged with readahead. Sep 18, 2020
@steveloughran
Contributor

BTW, #2168 is calling out for reviewers. It defines a standard option for setting seek policy, and another for file length (you can skip the HEAD check then). And it sets distcp and other download operations (including YARN) to always do sequential reads.

For ABFS, that tells the stream that one big GET, with as much prefetch as you can do, is going to be best.

@steveloughran steveloughran added enhancement fs/azure changes related to azure; submitter must declare test endpoint labels Sep 24, 2020
@steveloughran
Contributor

@mukund-thakur : can you bring this up to date?
