HADOOP-16848. Refactoring: initial layering #1839

Draft
wants to merge 2 commits into trunk from s3/HADOOP-16848-refactoring-initial-layers
Conversation

@steveloughran (Contributor) commented on Feb 9, 2020

A Refactoring of the S3A code structure

Goals: layer better for isolation, maintenance, testability, extra features

  • split the public Hadoop FS API (and a few other entry points) from the operations used to implement them
  • add a factory for building AWS requests so that common options (encryption, requester pays) can be consistently added
  • add a RawS3A layer where no S3Guard interaction is found
  • a Store Layer for S3Guard-aware interaction

The latest version has the RawS3A and request factory design. The store layer, less so.
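
As a rough illustration of the request-factory goal (second bullet above), here is a minimal sketch. RequestFactory, RequestFactoryImpl and prepare() are invented names for the sketch, not the classes in this PR, and the setRequesterPays() call assumes the v1 SDK's GetObjectRequest flag.

```java
import com.amazonaws.AmazonWebServiceRequest;
import com.amazonaws.services.s3.model.GetObjectRequest;

/** One place to build every AWS request, with the common options applied. */
interface RequestFactory {
  GetObjectRequest newGetObjectRequest(String key);
}

class RequestFactoryImpl implements RequestFactory {
  private final String bucket;
  private final boolean requesterPays;

  RequestFactoryImpl(String bucket, boolean requesterPays) {
    this.bucket = bucket;
    this.requesterPays = requesterPays;
  }

  @Override
  public GetObjectRequest newGetObjectRequest(String key) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    // requester pays is applied here; encryption settings would be
    // applied the same way, so every request is built consistently.
    request.setRequesterPays(requesterPays);
    return prepare(request);
  }

  /** Common hook applied to every request the factory builds. */
  private <T extends AmazonWebServiceRequest> T prepare(T request) {
    return request;
  }
}
```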

Given we won't need S3Guard forever, that split between store and raw S3 could potentially be simplified. Do we need two layers? I'd say yes, for now.

But looking at the RawS3A code, I think all the code in there doing retry, exception translation, etc. is really store-level work.

Proposed

RawS3A is really the object operations, including retries and exception translation. Other than the request factory and retry invoker, it is self-contained.

S3AStore is the FS model, but as an internal API rather than the Hadoop FS API. Its arguments are generally Paths and FS API operations.
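
As a rough sketch (not the actual interfaces in this PR) of how the two layers differ in vocabulary: RawS3A speaks bucket keys and object operations, while S3AStore speaks Paths and filesystem operations.

```java
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/** Object-level operations; retries and exception translation live here. */
interface RawS3A {
  InputStream getObject(String key) throws IOException;
  void deleteObject(String key) throws IOException;
}

/** The filesystem model, as an internal API; arguments are Paths. */
interface S3AStore {
  FileStatus getFileStatus(Path path) throws IOException;
  boolean delete(Path path, boolean recursive) throws IOException;
}
```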

This relates to the idea of a context plugin

  • FileSystem entry points should all add their own context entry/exit points; put differently, everything in S3AStore should already be within a context.
  • the request factory (or RawS3A) is where the context is handed the AWS request before each API call, so it can annotate the request with telemetry.

Complications

  • need to pass the duration factory of the active request down to whichever code is doing the retries.

Maybe every RawS3A API call takes a RequestContext carrying this and anything else needed?
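
A sketch of what such a RequestContext might carry, with invented names; the real type would hold the duration tracking plus whatever the telemetry/context plugin needs.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical per-request context handed to every RawS3A call. */
final class RequestContext {
  private final Instant start = Instant.now();
  private int attempts;

  /** Called by whichever code is doing the retries, once per attempt. */
  void noteAttempt() {
    attempts++;
  }

  /** How long the active request has been running so far. */
  Duration elapsed() {
    return Duration.between(start, Instant.now());
  }

  int attempts() {
    return attempts;
  }
}
```

A RawS3A signature would then look something like `InputStream getObject(String key, RequestContext context)`.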

@steveloughran added the labels fs/s3 (changes related to hadoop-aws; submitter must declare test endpoint) and work in progress (PRs still Work in Progress; reviews not expected but still welcome) on Feb 9, 2020
@steveloughran
Contributor Author

FYI @bgaborg @sidseth

Not quite ready for review, but you get the idea: split up the S3AFileSystem class into layers, each with an interface and implementation, with async chained init from the S3A FS, which creates them, knows about DelegationTokens, and knows how to bind its operations to the store.

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from e068532 to 94637de on February 19, 2020 11:18
@apache deleted a comment from hadoop-yetus Feb 19, 2020
@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 94637de to c82e823 on March 2, 2020 18:56
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 1m 7s Maven dependency ordering for branch
+1 💚 mvninstall 18m 54s trunk passed
+1 💚 compile 17m 5s trunk passed
+1 💚 checkstyle 2m 36s trunk passed
+1 💚 mvnsite 2m 19s trunk passed
+1 💚 shadedclient 20m 9s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 44s trunk passed
+0 🆗 spotbugs 1m 12s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 16s trunk passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 28s hadoop-aws in the patch failed.
-1 ❌ compile 15m 22s root in the patch failed.
-1 ❌ javac 15m 22s root in the patch failed.
-0 ⚠️ checkstyle 2m 41s root: The patch generated 41 new + 31 unchanged - 0 fixed = 72 total (was 31)
-1 ❌ mvnsite 0m 48s hadoop-aws in the patch failed.
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 6s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 43s the patch passed
-1 ❌ findbugs 0m 46s hadoop-aws in the patch failed.
_ Other Tests _
-1 ❌ unit 9m 30s hadoop-common in the patch passed.
-1 ❌ unit 0m 48s hadoop-aws in the patch failed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
119m 52s
Reason Tests
Failed junit tests hadoop.fs.shell.TestCopy
Subsystem Report/Notes
Docker Client=19.03.6 Server=19.03.6 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/Dockerfile
GITHUB PR #1839
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux ee0a173a422e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / c0d0842
Default Java 1.8.0_242
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-compile-root.txt
checkstyle https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/diff-checkstyle-root.txt
mvnsite https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-findbugs-hadoop-tools_hadoop-aws.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/testReport/
Max. process+thread count 2517 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1839/4/console
versions git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 0dc715f to 77c2d47 on September 23, 2020 12:38
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@apache deleted a comment from hadoop-yetus Sep 24, 2020
@steveloughran
Contributor Author

mockito test failure in async store launch

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.255 s - in org.apache.hadoop.fs.s3a.TestDataBlocks
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.401 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.TestAuthoritativePath
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 1.236 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestListing
[ERROR] testProvidedFileStatusIteratorEnd(org.apache.hadoop.fs.s3a.TestListing)  Time elapsed: 1.063 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.fs.impl.FutureIOSupport.awaitFutureQuietly(FutureIOSupport.java:250)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.awaitStoreLive(S3AFileSystem.java:4828)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.store(S3AFileSystem.java:4838)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:420)
	at org.apache.hadoop.fs.s3a.AbstractS3AMockTest.setup(AbstractS3AMockTest.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)

Viewing this PR as a PoC, I think the async startup code needlessly complicates life. Skipping bucket existence probes is enough to make for a fast launch.

Proposed:

  1. Pull out the request factory into its own PR
  2. Add support for requester pays
  3. Unit tests to verify that requester pays and encryption options are passed in everywhere (a rough sketch of such a test follows)
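
A rough sketch of what point 3 could look like, reusing the hypothetical factory sketched above; it assumes the v1 SDK's GetObjectRequest exposes isRequesterPays(), and it is not a test from this PR.

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class TestRequestFactoryOptions {

  @Test
  public void testRequesterPaysAppliedToGetRequests() {
    RequestFactory factory = new RequestFactoryImpl("example-bucket", true);
    // a fuller test would cover every request type the factory can
    // build, plus the encryption settings, not just this one flag.
    assertTrue(factory.newGetObjectRequest("key").isRequesterPays());
  }
}
```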

@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from 77c2d47 to ba49cb7 on February 1, 2021 14:55
@steveloughran marked this pull request as draft on February 1, 2021 15:29
@steveloughran requested a review from bgaborg on February 1, 2021 15:29
@steveloughran
Contributor Author

Not tested.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 34s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 shadedclient 14m 15s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 7s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 5s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 35s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 30s the patch passed
-0 ⚠️ checkstyle 0m 22s /diff-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 37 new + 42 unchanged - 0 fixed = 79 total (was 42)
+1 💚 mvnsite 0m 33s the patch passed
-1 ❌ whitespace 0m 0s /whitespace-eol.txt The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 shadedclient 12m 38s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
-1 ❌ javadoc 0m 27s /diff-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 7 new + 88 unchanged - 0 fixed = 95 total (was 88)
-1 ❌ findbugs 1m 8s /new-findbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
-1 ❌ unit 1m 58s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
73m 42s
Reason Tests
FindBugs module:hadoop-tools/hadoop-aws
Inconsistent synchronization of org.apache.hadoop.fs.s3a.S3AFileSystem.signerManager; locked 66% of time. Unsynchronized access at S3AFileSystem.java:[line 4864]
Failed junit tests hadoop.fs.s3a.TestS3ABlockOutputStream
hadoop.fs.s3a.s3guard.TestObjectChangeDetectionAttributes
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/artifact/out/Dockerfile
GITHUB PR #1839
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 562af3dc1838 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 115623a
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/testReport/
Max. process+thread count 666 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1839/1/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

First PoC of my planned layout model of the S3A FS.

* There's a raw layer and a guarded layer
* which are instantiated in sequence, in a separate executor, from S3AFileSystem.initialize

And whose accessors block until completed or rethrow failures.

The layers are being handed in all their dependencies from FS.initialize()
and we currently block until started.
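
Condensed, with invented names, the pattern reads roughly like this (reusing the hypothetical S3AStore interface sketched earlier; the real initialize() wiring is more involved):

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;

class LayeredStartup {
  private volatile S3AStore store;
  private CompletableFuture<Void> startup;

  /** Kick off layer creation on an executor; called from initialize(). */
  void startLayers(Executor executor) {
    startup = CompletableFuture.runAsync(() -> {
      // raw layer first, then the guarded store, in sequence
      store = createStore();
    }, executor);
  }

  /** Accessor blocks until startup completes, rethrowing any failure. */
  S3AStore store() throws IOException {
    try {
      startup.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted waiting for the store");
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
    return store;
  }

  private S3AStore createStore() {
    throw new UnsupportedOperationException("placeholder for layer wiring");
  }
}
```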

What I plan to do, in a future iteration, is:

* each layer extracts its own settings from the config and stores them locally
  (list version, upload size etc)
* have each layer instantiate its internal classes (AWS S3 client, transfer manager) internally
* Also async create: metastore, DT binding
* And all startup actions (check bucket, init multipart, ...)

Then
* move ops to the layers, raw* -> rawStore; inner -> S3AStore
* move WriteOperationHelper, SelectBinding, etc, to all work against S3AStore rather than FS.

S3AStore will become where most of the code moves to; S3AFileSystem becomes more about init and binding to the Hadoop FS API.
RawS3A will be the accessor through which all AWS client access goes.

Not going to change: the accessors on S3AFileSystem. Not just the tests use them; some external code (cloudstore) needs them to get at the low-level S3A internals.

Change-Id: I998c0d61cce2ee7fd0be804bf21da6b68fd69a6f

HADOOP-16583 refactoring RequestFactory and RawS3A

This moves most of the S3 client interaction into RawS3AImpl,
mainly just by moving the methods from S3AFileSystem.

One key finding was that we can put all the code which creates
request classes for the AWS SDK into its own factory,
ensuring everything is set up consistently and keeping what
is mainly housekeeping out of the way of everything else.

We can do that rework immediately, as it doesn't require
the rest of the layering.

Also, the rawS3A() accessor no longer raises IOEs; it was
used in too many places where that wasn't allowed.
I sort of expected that.
There's an awaitQuietly() (todo: change name) method
which returns all failures as RTEs rather than IOEs.

For the S3A FS API entry points we still need to raise
the normal startup exceptions, so we will need an extra block/validate there.
After that, maybe, we can change the logic in these new getters to just
raise an exception if their refs are null, rather than block.
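
A sketch, under assumed names, of what an awaitQuietly()-style helper could look like: checked IOEs coming out of the future are rethrown unchecked, so the accessor can be called from code which is not allowed to throw IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class FutureSupport {
  private FutureSupport() {
  }

  /** Await a future, converting checked failures into unchecked ones. */
  static <T> T awaitQuietly(CompletableFuture<T> future) {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("interrupted awaiting future", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        // checked IOEs become UncheckedIOException so callers which
        // cannot declare "throws IOException" still see the failure.
        throw new UncheckedIOException((IOException) cause);
      }
      throw new RuntimeException(cause);
    }
  }
}
```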

Change-Id: I3d15a46dd3034fc5d34dbaf01aaddba462d63a9d

HADOOP-16583. refactoring -fix regression

I'd pulled an import which wasn't used, but a recent change reinstated
the need for it.

Change-Id: I12b0725a14a6e7bae1ecf93fcbd4d05e1eacb03e

HADOOP-16848. Building against trunk

Change-Id: Id1af0763998dd285bb2476298cdb94edf809b67c
* RequestFactory adds factory construction and the full set of operations.
* Moves all S3 IO into the RawS3A class, which takes the factory.
* The S3 select API call goes into the store.
* Remove the attempt at async init. Too complex.

Change-Id: I36bb2884ebe0d7183f99f25860f79e3796a112fb
@steveloughran force-pushed the s3/HADOOP-16848-refactoring-initial-layers branch from ba49cb7 to 15f9adc on April 26, 2021 12:03