-
-
Notifications
You must be signed in to change notification settings - Fork 294
Add documentation and s3proxy testing for ROS3 VFD #5560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Add documentation and s3proxy testing for ROS3 VFD #5560
Conversation
Adds workflow to build ROS3 VFD and optionally build aws-c-s3 library from source or use package managers Adds testing of ROS3 VFD with s3proxy and docker Adds new H5Pset_fapl_ros3_endpoint()/H5Pget_fapl_ros3_endpoint() API functions to set/get an alternative endpoint URL to use when opening files with the ROS3 VFD Cleans up warnings in tools and tests related to ROS3 VFD structure size Co-authored-by: Larry Knox <[email protected]> Co-authored-by: Allen Byrne <[email protected]>
This PR is the followup to #5548 and replaces and largely copies the remaining features needed from #5397, but with several changes and additions made to work with the aws-c-s3 library. The ROS3 is built on the 3 major platforms for PRs, but only tested on Ubuntu as testing currently relies on docker and the aws cli, which are preinstalled on the Ubuntu runners. It should be fairly straightforward to get testing running on Windows and Mac as well, but this is a good start. For PRs, aws-c-s3 is installed from vcpkg on Windows and homebrew on Mac, but Ubuntu doesn't have a package yet, so it gets built from source. The VFD is also built and tested on Ubuntu with a source build of aws-c-s3 in the VFD daily build action. It's fairly large due to all the changes that had to be made to work well with #5397, but if need be I can try to break it down into more pieces. The large line change count is just due to the inclusion of the large text file for testing, which I hope we can remove in the future. |
release_docs/INSTALL_S3.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -669,6 +681,14 @@ H5FD__s3comms_s3r_open(const char *url, const H5FD_ros3_fapl_t *fa, const char * | |||
aws_error_str(aws_last_error())); | |||
cond_var_init = true; | |||
|
|||
/* Check if path-style requests should be forced */ | |||
pathstyle_env = getenv(HDF5_ROS3_VFD_FORCE_PATH_STYLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for s3proxy testing, as I couldn't get virtual-hosted-style requests to work correctly with it, but path-style requests work just fine.
@@ -1041,6 +1041,50 @@ | |||
* additional variable settings associated with the Split driver, there is no H5Pget_fapl_split | |||
* function. | |||
* | |||
* \subsubsection subsubsec_file_alternate_drivers_ros3 The ROS3 Driver | |||
* The ROS3 (read-only S3) driver is used to enable read-only access to HDF5 files which are stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a user's guide entry for the VFD, but as the entries for the other VFDs were brief I didn't add too many details here. Most information about the VFD is in its header file currently. But I think we should revise these entries into separate pages or something similar so they can serve as the main location for information about a particular VFD. That or maybe free-form doxygen like this should be put into each VFD's header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The H5?module.h files were convenient (initially) for User Guide entries as they wouldn't be confused/interfere with reference Manuals entries and some UG entries were extensive. Anything should be able to be reworked/relocated as necessary.
test/testfiles/t8.shakespeare.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now these text files have been included as the ros3 and s3comms tests use them. But the ROS3 VFD can't actually read from text files normally, it can only do so indirectly through H5FDopen()
and H5FDread()
, which is what those tests are testing. We should revise those tests in the future to do more meaningful testing.
tools/libtest/h5tools_test_utils.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file mostly just cleaned up stack size warnings from the H5FD_ros3_fapl_ext_t
structure being too large.
}, | ||
"", /* Session/security token */ | ||
}; | ||
static H5FD_ros3_fapl_ext_t *ros3_fa_g = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I revised h5dump
, h5ls
and h5stat
to allocate these structures to fix stack size warnings.
@@ -891,18 +872,6 @@ parse_command_line(int argc, const char *const *argv) | |||
vfd_info_g.type = VFD_BY_NAME; | |||
vfd_info_g.u.name = H5_optarg; | |||
use_custom_vfd_g = true; | |||
|
|||
#ifdef H5_HAVE_ROS3_VFD | |||
if (0 == strcmp(vfd_info_g.u.name, drivernames[ROS3_VFD_IDX])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic like this was moved down in h5dump
, h5ls
and h5stat
to handle overlap between options like -f ros3
and --vfd-name ros3
where the former would set the pointer to the VFD info and the latter wouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they really different options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of the tools code they're pretty much all exactly the same now among tools, with the "old" options being replaced by the --vfd-name
option. The latter was kind of meant for vfd plug-ins, but it evolved to cover builtin library vfds as well, so they really should work the same at this point.
@@ -2802,7 +2808,7 @@ main(int argc, char *argv[]) | |||
data_g = true; | |||
} | |||
else if (!strcmp(argv[argno], "--enable-error-stack")) { | |||
enable_error_stack = 1; | |||
enable_error_stack = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@byrnHDF Since h5ls
doesn't accept an argument for --enable-error-stack
, I just changed the default here to 2 so that file open errors aren't hidden since they're often the most interesting errors. At this point, I kind of think 2 should be the default everywhere so that --enable-error-stack
works and you don't have to specify --enable-error-stack=2
.
description: "Install aws-c-s3 from a package manager ('package') or build from source ('source')" | ||
required: true | ||
type: string | ||
aws_c_s3_tag: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add a default value to optional input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually checking for empty input and letting the checkout action use the default branch if so. Probably a bit more future proof than specifying a default (though let's hope the name of the default branch doesn't change in GH again).
find_program (AWS_CLI_EXECUTABLE aws) | ||
if (NOT AWS_CLI_EXECUTABLE) | ||
set (HDF5_ENABLE_ROS3_VFD_DOCKER_PROXY OFF CACHE BOOL "Use docker for ROS3 VFD S3proxy testing" FORCE) | ||
message (WARNING "AWS cli is required for ROS3 VFD S3proxy testing, but could not be found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a FATAL_ERROR if it is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did the same as the existing logic for the docker option which just turns the option back off. We could make both fatal errors instead though if we want. I just added this because I noticed the test logic expects the aws cli commands to be available as well, but wasn't checking for them. For me it caused a failure with no obvious reason why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original may have been wrong, but the more important decision is what is correct? If the user asked for it and it can't be found, will the build/test still give the intended result or should we stop and force the user to fix it. Given the time for some tests to complete (especially on Windows) I would prefer to know immediately and not waste time.
@@ -573,6 +587,18 @@ New Features | |||
|
|||
Tools: | |||
------ | |||
- Added AWS endpoint command option to allow specifying an alternate endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this mention the ENV variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add a line for that.
use_virtual_style = false; | ||
|
||
/* | ||
* If an alternate endpoint URL was specified, use a path-style |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed this comment in several places - why is this a problem, because I thought the aws library would deal with this? Is it a hdf design decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about what a server will accept, not what the aws library can do since it doesnt care too much about that and cant necessarily tell ahead of time what's supported. The only request format I could get to work between the library and s3proxy was the path-style format, which is why this is needed. Virtual-hosted-style requests seem somewhat specific to s3 (though I haven't checked other cloud providers) and since the initial use case for the alternate endpoint is for testing with s3proxy, this seemed reasonable to do. https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html eventually support for path-style requests will be removed from S3, but that's a future problem. I expect we'll still have to maintain support for path-style requests into the future for compatibility.
manager for the name of this package if it's available. HDF5 installs | ||
this library in the following ways in CI testing: | ||
|
||
Linux (Ubuntu): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On fedora 42 I had to install the "devel" package as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I need to update this line since Ubuntu didn't have a package yet. I don't know how much we want to get into the details for each Linux distribution since it could vary widely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least for linux there always seems to be a devel package and it is needed for CMake to configure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H5FDros3_s3comms.c:2202:74: warning: unused parameter ‘aws_params’ [-Wunused-parameter]
2202 | H5FD__s3comms_format_user_agent_header(const H5FD__s3comms_aws_params_t *aws_params,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix. I turned on the HDF5_ENABLE_WARNINGS_AS_ERRORS=ON
to try and catch new warnings in the ros3 code, but there appear to be several warning types it doesnt treat as errors, which we should revisit.
|
||
if (*value) | ||
if (NULL == (*value = strdup(*value))) | ||
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTALLOC, FAIL, "can't copy string property token"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the failure message refer to endpoint instead of property token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think this implementation was copy pasted from the previous implementation for tokens. Will tidy up the error messages.
* populated. If | ||
* Only secret_key is allowed to be empty (the empty string, ""). | ||
* If all three (or four) strings are empty (""), the default fapl will be | ||
* default. Both aws_region and secret_id values must be both empty or both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Both" at the beginning of the sentence seems redundant; not terribly important, though.
Adds testing of ROS3 VFD with s3proxy and docker
Adds workflow to build ROS3 VFD and optionally build aws-c-s3 library from source or use package
managers where possible
Adds new H5Pset_fapl_ros3_endpoint()/H5Pget_fapl_ros3_endpoint() API functions to set/get an
alternative endpoint URL to use when opening files with the ROS3 VFD
Cleans up warnings in tools and tests related to ROS3 VFD structure size
Fixes #5109