HADOOP-16848. Refactoring: initial layering #1839

Draft
wants to merge 2 commits into trunk from s3/HADOOP-16848-refactoring-initial-layers
Conversation

@steveloughran (Contributor) commented on Feb 9, 2020

A Refactoring of the S3A code structure

Goals: layer better for isolation, maintenance, testability, extra features

  • split the public Hadoop FS API (and a few other entry points) from the operations used to implement them
  • add a factory for building AWS requests so that common options (encryption, requester pays) can be consistently added
  • add a RawS3A layer where no S3Guard interaction is found
  • a Store Layer for S3Guard-aware interaction

The latest version has the RawS3A and request factory design. The store layer, less so.
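
As a rough illustration of the request-factory goal (second bullet above), here is a minimal sketch. RequestFactory, RequestFactoryImpl and prepare() are invented names for the sketch, not the classes in this PR, and the setRequesterPays() call assumes the v1 SDK's GetObjectRequest flag.

```java
import com.amazonaws.AmazonWebServiceRequest;
import com.amazonaws.services.s3.model.GetObjectRequest;

/** One place to build every AWS request, with the common options applied. */
interface RequestFactory {
  GetObjectRequest newGetObjectRequest(String key);
}

class RequestFactoryImpl implements RequestFactory {
  private final String bucket;
  private final boolean requesterPays;

  RequestFactoryImpl(String bucket, boolean requesterPays) {
    this.bucket = bucket;
    this.requesterPays = requesterPays;
  }

  @Override
  public GetObjectRequest newGetObjectRequest(String key) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    // requester pays is applied here; encryption settings would be
    // applied the same way, so every request is built consistently.
    request.setRequesterPays(requesterPays);
    return prepare(request);
  }

  /** Common hook applied to every request the factory builds. */
  private <T extends AmazonWebServiceRequest> T prepare(T request) {
    return request;
  }
}
```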

Given we won't need S3Guard forever, that split between store and raw S3 could potentially be simplified. Do we need two layers? I'd say yes, for now.

But looking at the RawS3A code, I think all the code in there doing retry, exception translation, etc. is really store-level work.

Proposed

RawS3A is really the object operations, including retries and exception translation. Other than the request factory and retry invoker, it is self-contained.

S3AStore is the FS model, but as an internal API rather than the Hadoop FS API. Its arguments are generally Paths and FS API operations.
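
As a rough sketch (not the actual interfaces in this PR) of how the two layers differ in vocabulary: RawS3A speaks bucket keys and object operations, while S3AStore speaks Paths and filesystem operations.

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/** Object-level operations; retries and exception translation live here. */
interface RawS3A {
  InputStream getObject(String key) throws IOException;
  void deleteObject(String key) throws IOException;
}

/** The filesystem model, as an internal API; arguments are Paths. */
interface S3AStore {
  FileStatus getFileStatus(Path path) throws IOException;
  boolean delete(Path path, boolean recursive) throws IOException;
}
```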

This relates to the idea of a context plugin

  • FileSystem entry points should all add their own context entry/exit points; put differently, everything in S3AStore should already be within a context.
  • the request factory (or RawS3A) is where the context is handed the AWS request before each API call, so it can annotate the request with telemetry.

Complications

  • need to pass the duration factory of the active request down to whichever code is doing the retries.

Maybe every RawS3A API call takes a RequestContext carrying this and anything else needed?
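
A sketch of what such a RequestContext might carry, with invented names; the real type would hold the duration tracking plus whatever the telemetry/context plugin needs.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical per-request context handed to every RawS3A call. */
final class RequestContext {
  private final Instant start = Instant.now();
  private int attempts;

  /** Called by whichever code is doing the retries, once per attempt. */
  void noteAttempt() {
    attempts++;
  }

  /** How long the active request has been running so far. */
  Duration elapsed() {
    return Duration.between(start, Instant.now());
  }

  int attempts() {
    return attempts;
  }
}
```

A RawS3A signature would then look something like `InputStream getObject(String key, RequestContext context)`.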

@steveloughran added the labels fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint) and work in progress (PRs still Work in Progress; reviews not expected but still welcome) on Feb 9, 2020
@steveloughran
Contributor Author

FYI @bgaborg @sidseth

Not quite ready for review, but you get the idea: split up the S3AFileSystem class into layers, each with an interface and implementation, with async chained init from the S3A FS, which creates them, knows about DelegationTokens, and knows how to bind its operations to the store.

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from e068532 to 94637de on February 19, 2020 11:18
@apache deleted a comment from hadoop-yetus Feb 19, 2020
@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 94637de to c82e823 on March 2, 2020 18:56
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 7s Maven dependency ordering for branch
+1 💚 mvninstall 18m 54s trunk passed
+1 💚 compile 17m 5s trunk passed
+1 💚 checkstyle 2m 36s trunk passed
+1 💚 mvnsite 2m 19s trunk passed
+1 💚 shadedclient 20m 9s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 44s trunk passed
+0 🆗 spotbugs 1m 12s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 16s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 28s hadoop-aws in the patch failed.
-1 ❌ compile 15m 22s root in the patch failed.
-1 ❌ javac 15m 22s root in the patch failed.
-0 ⚠️ checkstyle 2m 41s root: The patch generated 41 new + 31 unchanged - 0 fixed = 72 total (was 31)
-1 ❌ mvnsite 0m 48s hadoop-aws in the patch failed.
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 6s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 43s the patch passed
-1 ❌ findbugs 0m 46s hadoop-aws in the patch failed.
_ Other Tests _
-1 ❌ unit 9m 30s hadoop-common in the patch passed.
-1 ❌ unit 0m 48s hadoop-aws in the patch failed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
119m 52s
Reason Tests
Failed junit tests hadoop.fs.shell.TestCopy
Subsystem Report/Notes
Docker Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/Dockerfile
GITHUB PR #1839
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux ee0a173a422e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c0d0842
Default Java 1.8.0_242
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/diff-checkstyle-root.txt
mvnsite https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-findbugs-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/testReport/
Max. process+thread count 2517 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 0dc715f to 77c2d47 on September 23, 2020 12:38
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@steveloughran
Contributor Author

mockito test failure in async store launch

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.255 s - in org.apache.hadoop.fs.s3a.TestDataBlocks
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.401 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.TestAuthoritativePath
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 1.236 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestListing
[ERROR] testProvidedFileStatusIteratorEnd(org.apache.hadoop.fs.s3a.TestListing)  Time elapsed: 1.063 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.fs.impl.FutureIOSupport.awaitFutureQuietly(FutureIOSupport.java:250)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.awaitStoreLive(S3AFileSystem.java:4828)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.store(S3AFileSystem.java:4838)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:420)
	at org.apache.hadoop.fs.s3a.AbstractS3AMockTest.setup(AbstractS3AMockTest.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)

Viewing this PR as a PoC, I think the async startup code needlessly complicates life. Skipping bucket existence probes is enough to make for a fast launch.

Proposed:

  1. Pull out the request factory into its own PR
  2. Add support for requester pays
  3. Unit tests to verify that requester pays and encryption options are passed in everywhere (a rough sketch of such a test follows)
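
A rough sketch of what point 3 could look like, reusing the hypothetical factory sketched above; it assumes the v1 SDK's GetObjectRequest exposes isRequesterPays(), and it is not a test from this PR.

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class TestRequestFactoryOptions {

  @Test
  public void testRequesterPaysAppliedToGetRequests() {
    RequestFactory factory = new RequestFactoryImpl("example-bucket", true);
    // a fuller test would cover every request type the factory can
    // build, plus the encryption settings, not just this one flag.
    assertTrue(factory.newGetObjectRequest("key").isRequesterPays());
  }
}
```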

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 77c2d47 to ba49cb7 on February 1, 2021 14:55
@steveloughran marked this pull request as draft on February 1, 2021 15:29
@steveloughran requested a review from bgaborg on February 1, 2021 15:29
@steveloughran
Contributor Author

Not tested.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 34s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 shadedclient 14m 15s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 7s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 5s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 35s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 30s the patch passed
-0 ⚠️ checkstyle 0m 22s /diff-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 37 new + 42 unchanged - 0 fixed = 79 total (was 42)
+1 💚 mvnsite 0m 33s the patch passed
-1 ❌ whitespace 0m 0s /whitespace-eol.txt The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 shadedclient 12m 38s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
-1 ❌ javadoc 0m 27s /diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 7 new + 88 unchanged - 0 fixed = 95 total (was 88)
-1 ❌ findbugs 1m 8s /new-findbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
-1 ❌ unit 1m 58s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
73m 42s
Reason Tests
FindBugs module:hadoop-tools/hadoop-aws
Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AFileSystem.signerManager; locked 66% of time. Unsynchronized access at S3AFileSystem.java:[line 4864]
Failed junit tests hadoop.fs.s3a.TestS3ABlockOutputStream
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/artifact/out/Dockerfile
GITHUB PR #1839
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 562af3dc1838 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 115623a
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/testReport/
Max. process+thread count 666 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

First PoC of my planned layout model of the S3A FS.

* There's a raw layer and a guarded layer
* which are instantiated in sequence, in a separate executor, from S3AFileSystem.initialize

And whose accessors block until completed or rethrow failures.

The layers are being handed in all their dependencies from FS.initialize()
and we currently block until started.
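
Condensed, with invented names, the pattern reads roughly like this (reusing the hypothetical S3AStore interface sketched earlier; the real initialize() wiring is more involved):

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;

class LayeredStartup {
  private volatile S3AStore store;
  private CompletableFuture<Void> startup;

  /** Kick off layer creation on an executor; called from initialize(). */
  void startLayers(Executor executor) {
    startup = CompletableFuture.runAsync(() -> {
      // raw layer first, then the guarded store, in sequence
      store = createStore();
    }, executor);
  }

  /** Accessor blocks until startup completes, rethrowing any failure. */
  S3AStore store() throws IOException {
    try {
      startup.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted waiting for the store");
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
    return store;
  }

  private S3AStore createStore() {
    throw new UnsupportedOperationException("placeholder for layer wiring");
  }
}
```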

What I plan to do, in a future iteration, is:

* each layer extracts its own settings from the config and stores them locally
  (list version, upload size etc)
* have each layer instantiate its internal classes (AWS S3 client, transfer manager) internally
* Also async create: metastore, DT binding
* And all startup actions (check bucket, init multipart, ...)

Then
* move ops to the layers, raw* -> rawStore; inner -> S3AStore
* move WriteOperationHelper, SelectBinding, etc, to all work against S3AStore rather than FS.

S3AStore will become where most of the code moves to; S3AFileSystem becomes more about init and binding to the Hadoop FS API.
RawS3A will be the accessor through which all AWS client access goes.

Not going to change: the accessors on S3AFileSystem. Not just the tests use them; some external code (cloudstore) needs them to get at the low-level S3A internals.

Change-Id: I998c0d61cce2ee7fd0be804bf21da6b68fd69a6f

HADOOP-16583 refactoring RequestFactory and RawS3A

This moves most of the S3 client interaction into RawS3AImpl,
mainly just by moving the methods from S3AFileSystem.

One key finding was that we can put all the code which creates
request classes for the AWS SDK into its own factory,
ensuring everything is set up consistently and keeping what
is mainly housekeeping out of the way of everything else.

We can do that rework immediately, as it doesn't require
the rest of the layering.

Also, the rawS3A() accessor no longer raises IOEs; it was
used in too many places where that wasn't allowed.
I sort of expected that.
There's an awaitQuietly() (todo: change name) method
which returns all failures as RTEs rather than IOEs.

For the S3A FS API entry points we still need to raise
the normal startup exceptions, so we will need an extra block/validate there.
After that, maybe, we can change the logic in these new getters to just
raise an exception if their refs are null, rather than block.
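
A sketch, under assumed names, of what an awaitQuietly()-style helper could look like: checked IOEs coming out of the future are rethrown unchecked, so the accessor can be called from code which is not allowed to throw IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class FutureSupport {
  private FutureSupport() {
  }

  /** Await a future, converting checked failures into unchecked ones. */
  static <T> T awaitQuietly(CompletableFuture<T> future) {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("interrupted awaiting future", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        // checked IOEs become UncheckedIOException so callers which
        // cannot declare "throws IOException" still see the failure.
        throw new UncheckedIOException((IOException) cause);
      }
      throw new RuntimeException(cause);
    }
  }
}
```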

Change-Id: I3d15a46dd3034fc5d34dbaf01aaddba462d63a9d

HADOOP-16583. refactoring -fix regression

I'd pulled an import which wasn't used, but a recent change reinstated
the need for it.

Change-Id: I12b0725a14a6e7bae1ecf93fcbd4d05e1eacb03e

HADOOP-16848. Building against trunk

Change-Id: Id1af0763998dd285bb2476298cdb94edf809b67c
* RequestFactory adds factory construction and the full set of operations.
* Moves all S3 IO into the RawS3A class, which takes the factory.
* The S3 select API call goes into the store.
* Remove the attempt at async init. Too complex.

Change-Id: I36bb2884ebe0d7183f99f25860f79e3796a112fb
@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from ba49cb7 to 15f9adc on April 26, 2021 12:03