
HADOOP-17250 ABFS short reads can be merged with readahead. #2307

Open
wants to merge 2 commits into base: trunk

Conversation

mukund-thakur
Contributor

@mukund-thakur mukund-thakur commented Sep 15, 2020

Introducing an fs.azure.readahead.range parameter which can be set by the user.
Data will be populated in the buffer for random reads as well, which leads to fewer
remote calls.
This patch also changes the seek implementation to perform a lazy seek. The actual
seek is done when a read is initiated and the data is not present in the buffer;
otherwise data is returned from the buffer, thus reducing the number of remote calls.
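The lazy-seek idea can be sketched roughly as follows. This is a simplified illustration, not the actual AbfsInputStream code: the class, the mock remote read, and the `remoteCalls` counter are invented for the sketch, while field names like `fCursor` echo the patch.

```java
// Rough sketch of lazy seek (hypothetical names; not the real AbfsInputStream).
// seek() only records the target position; the remote call happens in read(),
// and only when the requested range is not already in the local buffer.
class LazySeekStream {
  private final byte[] remoteData;   // stands in for the remote blob
  private final int bufferSize;
  private byte[] buffer = new byte[0];
  private long fCursor = 0;          // remote offset the buffer starts at
  private long nextReadPos = 0;      // recorded by seek(), applied lazily
  int remoteCalls = 0;               // exposed for the demonstration only

  LazySeekStream(byte[] remoteData, int bufferSize) {
    this.remoteData = remoteData;
    this.bufferSize = bufferSize;
  }

  void seek(long pos) {
    nextReadPos = pos;               // lazy: no remote call here
  }

  int read(byte[] b) {
    if (nextReadPos < fCursor || nextReadPos >= fCursor + buffer.length) {
      // Buffer miss: perform the actual seek and one (mock) remote GET.
      fCursor = nextReadPos;
      int len = (int) Math.min(bufferSize, remoteData.length - fCursor);
      buffer = new byte[len];
      System.arraycopy(remoteData, (int) fCursor, buffer, 0, len);
      remoteCalls++;
    }
    int off = (int) (nextReadPos - fCursor);
    int toCopy = Math.min(b.length, buffer.length - off);
    System.arraycopy(buffer, off, b, 0, toCopy);
    nextReadPos += toCopy;
    return toCopy;
  }
}
```

A seek followed by a read within the already-buffered range is then served locally, with no second remote call.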

@mukund-thakur
Contributor Author

CC @steveloughran @snvijaya

Contributor

@snvijaya snvijaya left a comment


The lazy seek change is a useful one: it helps avoid remote calls in the case of random reads, and avoids a System.arraycopy to check whether the data has already been read ahead in the sequential case.

Just a few points to be addressed in this PR:

  1. The PR description contains the default PR message. Please remove the unwanted text.
  2. Yetus has reported checkstyle issues; please check.
  3. Please post test results with HNS and non-HNS accounts configured.

Regarding the random-read change to readahead: how much of a performance benefit did the change bring for the Hive job?

// Enabling read ahead for random reads as well to reduce number of remote calls.
int lengthWithReadAhead = Math.min(b.length + readAheadRange, bufferSize);
LOG.debug("Random read with read ahead size of {}", lengthWithReadAhead);
bytesRead = readInternal(fCursor, buffer, 0, lengthWithReadAhead, true);
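The Math.min clamp in the quoted snippet means a random read never requests more than one buffer's worth of data. A small standalone illustration (the 4 MB buffer size matches the ABFS default; the 64 KB readahead value is just an example):

```java
// Demonstrates how the readahead length is clamped to the buffer size.
// Values below are illustrative, not taken from the patch.
public class ReadAheadClamp {
  static int lengthWithReadAhead(int requested, int readAheadRange, int bufferSize) {
    // Same shape as the snippet under review: add readahead, clamp to buffer.
    return Math.min(requested + readAheadRange, bufferSize);
  }

  public static void main(String[] args) {
    int bufferSize = 4 * 1024 * 1024;   // 4 MB buffer
    int readAheadRange = 64 * 1024;     // 64 KB readahead, hypothetical
    // Small read: the readahead is added in full (8 KB + 64 KB = 72 KB).
    System.out.println(lengthWithReadAhead(8 * 1024, readAheadRange, bufferSize));
    // Read at buffer size: clamped, never exceeds 4 MB.
    System.out.println(lengthWithReadAhead(bufferSize, readAheadRange, bufferSize));
  }
}
```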
Contributor


As with Parquet and ORC, we have seen read patterns move from sequential to random and vice versa. That being the case, would it not be better to always read ahead to bufferSize? Providing options to read fewer bytes, like 64 KB, can actually lead to more IOPS. From our meeting yesterday, one thing we all agreed on was that lower IOPS is better, and it is better to read a larger size than a smaller one.
So let's remove the config for readAheadRange and instead always read ahead to whatever is configured for bufferSize.

Contributor

@steveloughran steveloughran Sep 17, 2020


Based on the S3A experience (which didn't always read into a buffer, BTW), the "penalty" of having a large readahead range is that there is more data to drain when you want to cancel the read (i.e. a seek out of range).
That code does the draining in the active thread. If that were to be done in a background thread, the penalty of a larger readahead would be less, as you would only see a delay from the draining if there were no free HTTPS connections in the pool. Setting up a new HTTPS connection is expensive though. If there were no free HTTPS connections in the pool, you would be better off draining the stream in the active thread. Maybe.

(Disclaimer: all my claims about cost of HTTPS are based on S3 +Java7/8, and S3 is very slow to set up a connection. If the ADLS Gen2 store is faster to negotiate then it becomes a lot more justifiable to drain in a separate thread)
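The background-drain idea being discussed could look something like the sketch below. This is purely illustrative (neither ABFS nor S3A necessarily does it this way), and the trade-off in the comments is exactly the one raised above: draining off-thread only wins if a pooled connection is available, since a fresh TLS handshake may cost more than draining inline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: drain an abandoned HTTP response stream in a background thread so
// the caller's seek() returns immediately. Whether this wins depends on the
// connection pool: with no free connection, a new HTTPS setup may cost more
// than simply draining in the active thread.
public class BackgroundDrain {
  static final ExecutorService DRAINER = Executors.newCachedThreadPool();

  static void drainAsync(InputStream abandoned) {
    DRAINER.submit(() -> {
      byte[] scratch = new byte[64 * 1024];
      try (InputStream in = abandoned) {
        while (in.read(scratch) >= 0) {
          // discard remaining bytes so the connection can go back to the pool
        }
      } catch (IOException ignored) {
        // closing on error simply gives up the connection
      }
    });
  }
}
```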

@steveloughran
Contributor

Sneha,
What are the likely times to

  1. negotiate a new HTTPS connection
  2. read 4 MB in a single ranged GET request
  3. read less than 4 MB in a single ranged GET request, e.g. 2 MB

If there's a fixed latency for the GET irrespective of size, then small reads are very inefficient per byte, reading the whole buffer would be justifiable.

Also: which makes for the simplest code to write, review and maintain? Let's not ignore that little detail, especially given my experience of shipping a broken implementation of this in S3AInputStream.
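The fixed-latency argument can be made concrete with back-of-the-envelope numbers. All figures below are invented for illustration, not measurements of ADLS Gen2; the real answer is whatever measuring points 1-3 would show.

```java
// Illustrative per-request cost of a ranged GET with a fixed setup latency.
// The latency and bandwidth values are made up for the example.
public class GetCostSketch {
  // time (ms) for one GET: fixed latency plus transfer time at 'mbPerSec'
  static double getMillis(double fixedLatencyMs, double sizeMB, double mbPerSec) {
    return fixedLatencyMs + (sizeMB / mbPerSec) * 1000.0;
  }

  public static void main(String[] args) {
    double latency = 50;     // hypothetical fixed latency per GET, ms
    double bandwidth = 100;  // hypothetical transfer rate, MB/s
    // Reading 4 MB as one GET vs. sixty-four 64 KB GETs for the same data:
    double oneBig = getMillis(latency, 4.0, bandwidth);
    double manySmall = 64 * getMillis(latency, 4.0 / 64, bandwidth);
    System.out.printf("4 MB in one GET: %.0f ms; in 64 small GETs: %.0f ms%n",
        oneBig, manySmall);
  }
}
```

With these invented numbers the single large GET is dominated by transfer time, while the many small GETs are dominated by the repeated fixed latency, which is exactly why a fixed per-GET cost makes small reads inefficient per byte.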

@mukund-thakur
Contributor Author

This is the output of a performance benchmark done with this patch along with some Hive tuning.
NOTE: Results may differ now.
[Screenshot: performance benchmark results, 2020-09-17]

@snvijaya
Contributor

> Sneha,
> What are the likely times to
>
>   1. negotiate a new HTTPS connection
>   2. read 4 MB in a single ranged GET request
>   3. read less than 4 MB in a single ranged GET request, e.g. 2 MB
>
> If there's a fixed latency for the GET irrespective of size, then small reads are very inefficient per byte, reading the whole buffer would be justifiable.
>
> Also: which makes for the simplest code to write, review and maintain? Let's not ignore that little detail, especially given my experience of shipping a broken implementation of this in S3AInputStream.

Hi Steve, I take your inputs and agree that observations on the points above can validate a better config setting for readAheadRange. Let me see if I can measure points 1-3; please give me a couple of days to get back.

@mukund-thakur
Contributor Author

Tested using HNS account in us-east-1. All good.

Tested using NON-HNS account in us-east-1. Seeing this failure
[ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsFalse:96->unsetAndAssert:107 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[tru]e> but was:<[fals]e>
[ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsTrue:86->unsetAndAssert:107 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[tru]e> but was:<[fals]e>
[ERROR] ITestGetNameSpaceEnabled.testXNSAccount:67->Assert.assertTrue:41->Assert.fail:88 Expecting getIsNamespaceEnabled() return true

I think tests should be skipped rather than failing if fs.azure.test.namespace.enabled is not set.

@steveloughran steveloughran changed the title HADOOP-17250 Lot of short reads can be merged with readahead. HADOOP-17250 ABFS short reads can be merged with readahead. Sep 18, 2020
@steveloughran
Contributor

BTW, #2168 is calling out for reviewers. It defines a standard option for setting seek policy, and another for file length (you can skip the HEAD check then). And it sets distcp and other download operations (including YARN) to always do sequential reads.

For ABFS, that tells the stream that one big GET, with as much prefetch as you can do, is going to be best.

@steveloughran steveloughran added enhancement fs/azure changes related to azure; submitter must declare test endpoint labels Sep 24, 2020
@steveloughran
Contributor

@mukund-thakur : can you bring this up to date?
