Support S3 object URIs in ROS3 #4600

Open
@ajelenak

Description

Is your feature request related to a problem? Please describe.

S3 objects are more commonly referenced by URIs of the form s3://<bucket_name>/<object_key>. The ROS3 virtual file driver instead requires S3 object URLs of the form https://s3.<aws_region>.amazonaws.com/<bucket_name>/<object_key>. Constructing an object's URL requires extracting the bucket name and object key from its URI. The AWS region comes from the usual sources (config file or environment variable) unless specified by the user application. Bucket names are up to 63 bytes long, while object keys can be up to 1024 bytes long.
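To make the URL construction concrete, here is a minimal sketch of the assembly step. The helper names `resolve_region` and `build_s3_url` are hypothetical and not part of libhdf5. The sketch assumes the region is taken from an explicit argument first and then from the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables; config-file lookup is left out, and the object key is assumed to need no percent-encoding.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pick the AWS region. An explicit value wins, then the
 * AWS_REGION and AWS_DEFAULT_REGION environment variables. Returns NULL if
 * none is set. Config-file lookup is omitted in this sketch. */
static const char *resolve_region(const char *explicit_region)
{
    if (explicit_region != NULL && explicit_region[0] != '\0')
        return explicit_region;

    const char *env = getenv("AWS_REGION");
    if (env != NULL && env[0] != '\0')
        return env;

    env = getenv("AWS_DEFAULT_REGION");
    if (env != NULL && env[0] != '\0')
        return env;

    return NULL;
}

/* Hypothetical helper: assemble
 * https://s3.<region>.amazonaws.com/<bucket>/<key> into out.
 * Returns 0 on success, -1 if no region is known or out is too small. */
static int build_s3_url(const char *region, const char *bucket, const char *key,
                        char *out, size_t out_size)
{
    const char *r = resolve_region(region);
    if (r == NULL)
        return -1;

    int n = snprintf(out, out_size, "https://s3.%s.amazonaws.com/%s/%s",
                     r, bucket, key);
    /* snprintf reports the length it wanted; treat truncation as an error */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Passing NULL as the region makes the sketch fall back to the environment, which mirrors the "unless specified by the user application" rule above.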

Describe the solution you'd like

Be able to use S3 object URIs when working with HDF5 files in S3 cloud stores. Example:

h5ls -r --vfd=ros3 s3://my-bucket/a/b/c/file.h5

Describe alternatives you've considered

Write code to parse the S3 object URI and assemble the object's URL prior to invoking libhdf5 or its tools. This is what h5py currently does.
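As a rough illustration of this workaround, the conversion can be done in one pass, because everything after "s3://" already has the <bucket_name>/<object_key> shape the URL needs. The function name `s3_uri_to_url` is hypothetical, the region is assumed to be supplied by the caller, and no percent-encoding of the key is attempted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite s3://<bucket>/<key> as
 * https://s3.<region>.amazonaws.com/<bucket>/<key>.
 * Returns 0 on success, -1 on a malformed URI or a too-small buffer. */
static int s3_uri_to_url(const char *uri, const char *region,
                         char *out, size_t out_size)
{
    static const char scheme[] = "s3://";
    const size_t scheme_len = sizeof scheme - 1;

    if (strncmp(uri, scheme, scheme_len) != 0)
        return -1;

    const char *bucket = uri + scheme_len;
    const char *slash = strchr(bucket, '/');

    /* Require a non-empty bucket name and a non-empty object key */
    if (slash == NULL || slash == bucket || slash[1] == '\0')
        return -1;

    /* The remainder of the URI is appended unchanged after the host name */
    int n = snprintf(out, out_size, "https://s3.%s.amazonaws.com/%s",
                     region, bucket);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

The resulting string is what would then be handed to libhdf5 or to a tool invoked with --vfd=ros3 in place of the original URI.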

Additional context

Below is how ChatGPT would extract bucket and key names.

#include <stdio.h>
#include <string.h>

// Parse "s3://<bucket>/<key>" into caller-supplied buffers.
// Returns 0 on success, -1 if the URI is malformed or a component does not fit.
int parseS3URI(const char *uri, char *bucket_name, size_t bucket_size,
               char *object_key, size_t key_size) {
    // Check if the URI starts with "s3://"
    if (strncmp(uri, "s3://", 5) != 0) {
        fprintf(stderr, "Invalid S3 URI format. It should start with 's3://'\n");
        return -1;
    }

    // Skip "s3://"
    const char *path_start = uri + 5;

    // Find the '/' delimiter to separate bucket name and object key
    const char *slash_pos = strchr(path_start, '/');
    if (slash_pos == NULL) {
        fprintf(stderr, "Invalid S3 URI format. Missing '/' after bucket name\n");
        return -1;
    }

    // Calculate lengths
    size_t bucket_len = (size_t)(slash_pos - path_start);
    size_t key_len = strlen(slash_pos + 1);

    // Reject empty components and anything that would overflow the buffers
    if (bucket_len == 0 || key_len == 0 ||
        bucket_len >= bucket_size || key_len >= key_size) {
        fprintf(stderr, "Invalid S3 URI. Bucket name or object key is empty or too long\n");
        return -1;
    }

    // Copy bucket name
    memcpy(bucket_name, path_start, bucket_len);
    bucket_name[bucket_len] = '\0';

    // Copy object key
    memcpy(object_key, slash_pos + 1, key_len);
    object_key[key_len] = '\0';

    return 0;
}

int main(void) {
    // Example S3 URI
    const char *s3_uri = "s3://my-test-bucket/my-folder/my-file.h5";

    // Buffers sized for the S3 limits (63 and 1024 bytes) plus a terminator
    char bucket_name[64];
    char object_key[1025];

    // Parse the S3 URI and stop if it is not usable
    if (parseS3URI(s3_uri, bucket_name, sizeof bucket_name,
                   object_key, sizeof object_key) != 0) {
        return 1;
    }

    // Print the parsed components
    printf("Bucket Name: %s\n", bucket_name);
    printf("Object Key: %s\n", object_key);

    return 0;
}

Metadata

Labels

Component - Tools: Command-line tools like h5dump, includes high-level tools
HDFG-internal: Internally coded for use by the HDF Group
Priority - 0. Blocker: This MUST be merged for the release to happen

Projects

Status

In progress
