HADOOP-17250 ABFS short reads can be merged with readahead. #2307
base: trunk
Conversation
Introducing the fs.azure.readahead.range parameter, which can be set by the user. Data will be populated in the buffer for random reads as well, which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek: the actual seek is done when a read is initiated and the data is not present in the buffer; otherwise data is returned from the buffer, thus reducing the number of remote calls.
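A minimal, illustrative sketch of the lazy-seek pattern described above (field names loosely follow the quoted diff; this is not the real AbfsInputStream, and the simulated in-memory "remote" store is purely an assumption for the example):

// Illustrative only: seek() records the position, and the remote call is
// deferred until read() actually needs bytes that are not already buffered.
class LazySeekSketch {
  private final byte[] remoteFile;     // stands in for the remote blob
  private byte[] buffer = new byte[0]; // data currently cached locally
  private long bufferStart = 0;        // file offset of buffer[0]
  private long fCursor = 0;            // position requested by the caller
  private int remoteCalls = 0;         // number of simulated GETs

  LazySeekSketch(byte[] remoteFile) {
    this.remoteFile = remoteFile;
  }

  // Lazy seek: remember the position, issue no remote call.
  void seek(long pos) {
    fCursor = pos;
  }

  // The real work happens here: serve from the buffer when possible,
  // otherwise fetch from the "remote" store starting at fCursor.
  int read(byte[] b, int off, int len) {
    if (fCursor >= remoteFile.length) {
      return -1; // end of file
    }
    long bufferEnd = bufferStart + buffer.length;
    if (fCursor < bufferStart || fCursor >= bufferEnd) {
      fetchRemote(fCursor, Math.max(len, 64 * 1024)); // read ahead a little
      bufferEnd = bufferStart + buffer.length;
    }
    int toCopy = (int) Math.min(len, bufferEnd - fCursor);
    System.arraycopy(buffer, (int) (fCursor - bufferStart), b, off, toCopy);
    fCursor += toCopy;
    return toCopy;
  }

  private void fetchRemote(long pos, int length) {
    remoteCalls++; // in ABFS this would be the readInternal() remote GET
    int end = (int) Math.min(remoteFile.length, pos + length);
    buffer = java.util.Arrays.copyOfRange(remoteFile, (int) pos, end);
    bufferStart = pos;
  }
}

The point of the sketch is that a back-and-forth seek within already-buffered data never touches the store; only a read outside the buffered range triggers a new remote call.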
The lazy seek change is a useful one: it helps avoid remote calls in the case of random reads, and avoids a System.arraycopy spent checking whether the data has already been read ahead in the sequential case.
Just a few points to be addressed in this PR:
- The PR description has the default message for PR. Please remove unwanted text.
- Yetus has reported checkstyle issues, please check.
- Please post test results with HNS and non-HNS accounts configured.
Regarding the random-read change to readahead, how much of a performance benefit did the change bring for the Hive job?
// Enabling read ahead for random reads as well to reduce number of remote calls.
int lengthWithReadAhead = Math.min(b.length + readAheadRange, bufferSize);
LOG.debug("Random read with read ahead size of {}", lengthWithReadAhead);
bytesRead = readInternal(fCursor, buffer, 0, lengthWithReadAhead, true);
With Parquet and ORC we have seen read patterns move from sequential to random and vice versa. That being the case, would it not be better to always read ahead to bufferSize? Providing options to read fewer bytes, such as 64 KB, can actually lead to more IOPs. From our meeting yesterday too, one thing we all agreed on was that the lower the IOPs the better, and that it is better to read more than a smaller size.
So let's remove the config for readAheadRange and instead always read ahead to what is configured for bufferSize.
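To make the IOP argument concrete with purely illustrative numbers (not measurements from this PR): with a 4 MB bufferSize, scanning 4 MB of column data can be satisfied by a single GET, whereas a 64 KB readahead range could need up to 4 MB / 64 KB = 64 GETs for the same bytes if the requests do not line up with previously buffered data.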
Based on the S3A experience (which didn't always read into a buffer, BTW), the "penalty" of having a large readahead range is that there is more data to drain when you want to cancel the read (i.e. a seek out of range).
That code does the draining in the active thread. If that were to be done in a background thread, the penalty of a larger readahead would be less, as you would only see a delay from the draining if there were no free HTTPS connections in the pool. Setting up a new HTTPS connection is expensive though. If there were no free HTTPS connections in the pool, you would be better off draining the stream in the active thread. Maybe.
(Disclaimer: all my claims about cost of HTTPS are based on S3 +Java7/8, and S3 is very slow to set up a connection. If the ADLS Gen2 store is faster to negotiate then it becomes a lot more justifiable to drain in a separate thread)
Sneha,
If there's a fixed latency for the GET irrespective of size, then small reads are very inefficient per byte, and reading the whole buffer would be justifiable. Also: which makes for the simplest code to write, review and maintain? Let's not ignore that little detail, especially given my experience of shipping a broken implementation of this in S3AInputStream.
Hi Steve, I take your inputs and agree that the observations above could validate a better config setting for readaheadrange. Let me try to see if I can measure points 1-3. Please give me a couple of days to get back.
Tested using an HNS account in us-east-1: all good. Tested using a non-HNS account in us-east-1 and am seeing this failure. I think tests should be skipped rather than failing if fs.azure.test.namespace.enabled is not set.
BTW, #2168 is calling out for reviewers. This defines a standard option for setting seek policy, and another for file length (you can skip the HEAD check then, you see). And it sets distcp and other download operations (including YARN) to always do sequential reads. For ABFS, that tells the stream that one big GET with as much prefetch as you can do is going to be best.
@mukund-thakur: can you bring this up to date?
Introducing the fs.azure.readahead.range parameter, which can be set by the user.
Data will be populated in the buffer for random reads as well, which leads to fewer
remote calls.
This patch also changes the seek implementation to perform a lazy seek. The actual
seek is done when a read is initiated and the data is not present in the buffer; otherwise
data is returned from the buffer, thus reducing the number of remote calls.
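As a hedged illustration of how a client might set the new option through the standard Hadoop Configuration API (the key comes from this patch; the 64 KB value is an arbitrary example, not a documented default):

import org.apache.hadoop.conf.Configuration;

public class ReadAheadRangeExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // fs.azure.readahead.range is the key introduced by this patch;
    // 64 KB is only an example value, not the patch's default.
    conf.setInt("fs.azure.readahead.range", 64 * 1024);
    System.out.println("fs.azure.readahead.range = "
        + conf.getInt("fs.azure.readahead.range", -1));
  }
}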