HADOOP-19236. Integration of Volcano Engine TOS in Hadoop. #7194
base: trunk
💔 -1 overall
This message was automatically generated.
3538027
to
f15b69d
Compare
💔 -1 overall
This message was automatically generated.
import static org.apache.hadoop.fs.XAttrSetFlag.CREATE;
import static org.apache.hadoop.fs.XAttrSetFlag.REPLACE;

public class RawFileSystem extends FileSystem {
Do we still need RawFileSystem? Maybe we can just name it TosFileSystem directly, right?
The hadoop-tos module is derived from VolcanoEngine EMR's FileSystem connector project. We should keep the classes the same as much as possible, so that new features in the commercial version can be easily ported to hadoop-tos.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.tosfs.object.Constants;

public class RawFileStatus extends FileStatus {
Similar to the following comment, maybe we can use TosFileStatus directly?
package org.apache.hadoop.fs.tosfs.conf;

public class ArgumentKey {
ArgumentKey? The name is a bit unclear.
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileStore implements ObjectStorage {
Do we still need this FileStore (which was designed for testing the abstracted ObjectStorage)?
I recommend keeping the FileStore, so that all the unit tests can run independently of TOS. Currently the hadoop-tos module still depends on TOS to run most test cases. In the future, my plan is to run them against both FileStore and TOS, so that we can test without TOS.
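To illustrate the idea of a local stand-in for an object store, here is a minimal sketch. It is not the FileStore or ObjectStorage code from this PR: the interface SimpleObjectStorage, the class LocalFileStore, and their methods are hypothetical names invented for this example, and the real ObjectStorage abstraction is assumed to be considerably richer (listing, multipart upload, and so on).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, heavily simplified stand-in for an object storage abstraction.
// This is NOT the ObjectStorage interface from the hadoop-tos module.
interface SimpleObjectStorage {
  void put(String key, byte[] data) throws IOException;

  byte[] get(String key) throws IOException;

  boolean delete(String key) throws IOException;
}

// Hypothetical local-directory implementation: each object key maps to a file
// under a root directory, so tests need no remote service.
final class LocalFileStore implements SimpleObjectStorage {
  private final Path root;

  LocalFileStore(Path root) {
    this.root = root.toAbsolutePath().normalize();
  }

  // Map a key to a path under the root and reject keys that escape it.
  private Path resolve(String key) {
    Path p = root.resolve(key).normalize();
    if (!p.startsWith(root)) {
      throw new IllegalArgumentException("key escapes root: " + key);
    }
    return p;
  }

  @Override
  public void put(String key, byte[] data) throws IOException {
    Path p = resolve(key);
    Files.createDirectories(p.getParent());
    Files.write(p, data);
  }

  @Override
  public byte[] get(String key) throws IOException {
    return Files.readAllBytes(resolve(key));
  }

  @Override
  public boolean delete(String key) throws IOException {
    return Files.deleteIfExists(resolve(key));
  }
}

final class FileStoreSketchDemo {
  public static void main(String[] args) throws IOException {
    SimpleObjectStorage store = new LocalFileStore(Files.createTempDirectory("filestore-sketch"));
    store.put("dir/hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(store.get("dir/hello.txt"), StandardCharsets.UTF_8));
    store.delete("dir/hello.txt");
  }
}
```

A test written against the interface can then be pointed at either the local implementation or a real service-backed one, which is the kind of switch described in the comment above.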
f15b69d
to
b68e862
Compare
💔 -1 overall
This message was automatically generated.
The 'apply patch to trunk' error is caused by downloading from a bad URL. It should download 'https://github.com/apache/hadoop/pull/7194.patch' as input.patch, but what it actually downloads is '#7194', which is the HTML content of this page. I'm still trying to figure out the cause. Does anybody know the reason? Thanks for any clues.
I found the cause: this patch is too large (24741 lines) and exceeds the GitHub API's maximum number of lines (20000). Since it's convenient for reviewers to have an overview of the whole module, I'll keep this PR for review.
…ntation. Contributed by: ZhengHu, SunXin, XianyinXin, Rascal Wu, FangBo, Yuanzhihuan.
b68e862
to
91867ed
Compare
Description of PR
Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS is the object storage service of Volcano Engine. A common pattern is to store data in TOS and run Hadoop/Spark/Flink applications that access it. But Hadoop has no native support for TOS, so it is not easy for users to build their big data systems on TOS.
This work aims to integrate TOS with Hadoop to help users run their applications on TOS. Users only need some simple configuration, and then their applications can read and write TOS without any code change. This work is similar to the existing Hadoop support for AWS S3, Azure Blob, Aliyun OSS, Tencent COS and HuaweiCloud Object Storage.
Please see the issue for more details. https://issues.apache.org/jira/browse/HADOOP-19236
How was this patch tested?
Unit tests need to connect to the TOS service. Set the 6 environment variables below to run the unit tests.
Then cd to hadoop project root directory, and run the test command below.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?