Description
Motivation
When accessing Iceberg tables stored in S3 buckets owned by a different AWS account, some organizations enforce that cross-account access must go through S3 Access Points rather than allowing direct bucket access. This is a common security policy that provides better access control and auditability for shared data.
Currently, Trino's native S3 filesystem (trino-filesystem-s3) has no way to configure access point mappings. When Trino resolves an S3 path like s3://my-bucket/path/to/data, it passes the bucket name directly to the AWS SDK without any rewriting. This makes it impossible to access cross-account tables in environments where S3 Access Points are required.
This is particularly relevant for modern table formats (Iceberg, Delta Lake, and Hudi) because their metadata stores the explicit bucket name in file paths. Clients then attempt to read from that bucket directly, which fails when the bucket policy only permits access through an access point.
The following scenario illustrates the problem in an AWS environment (shown here with Glue Catalog + Spark and Athena + Trino, but the same issue applies to any S3-backed query engine):
- Setup: Account A needs to read/write an Iceberg table located in Account B's S3 bucket. Account B provides a cross-account S3 Access Point and alias for this purpose.
- Spark: Can be configured to map the bucket name to the access point alias. S3 paths are rewritten transparently, and queries succeed.
- Trino: Has no equivalent configuration. It reads the raw bucket name from table metadata and issues S3 requests against it directly, which fails when the bucket policy only allows access via access points.
Proposed solution
Add configuration properties to map S3 bucket names to Access Point aliases (or ARNs). When Trino encounters an S3 path, it rewrites the bucket name to the configured access point before making the request to S3.
Proposed configuration format
In the catalog properties file (e.g., iceberg.properties):
```
s3.access-point.my-bucket-name=my-access-point-alias-s3alias
s3.access-point.another-bucket=arn:aws:s3:us-east-1:123456789012:accesspoint/my-access-point
```

The pattern is `s3.access-point.<bucket-name>=<access-point-alias-or-arn>`.
When Trino builds an S3 request for s3://my-bucket-name/some/key, it would substitute my-bucket-name with my-access-point-alias-s3alias in the SDK request, so S3 routes the request through the access point.
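To make the intended behavior concrete, here is a minimal sketch of that substitution at the point where SDK requests are built. All names here (`AccessPointResolver`, `accessPointMappings`) are illustrative, not existing Trino code:

```java
import java.util.Map;

import software.amazon.awssdk.services.s3.model.GetObjectRequest;

// Illustrative sketch only; not existing Trino code.
final class AccessPointResolver
{
    private final Map<String, String> accessPointMappings; // bucket -> access point alias or ARN

    AccessPointResolver(Map<String, String> accessPointMappings)
    {
        this.accessPointMappings = Map.copyOf(accessPointMappings);
    }

    // Returns the configured access point for the bucket, or the bucket
    // itself when no mapping exists, so unmapped buckets behave as today.
    String resolveBucket(String bucket)
    {
        return accessPointMappings.getOrDefault(bucket, bucket);
    }

    // The AWS SDK v2 accepts an access point alias or ARN in the bucket
    // field of a request, so the substitution is transparent to S3 callers.
    GetObjectRequest getObjectRequest(String bucket, String key)
    {
        return GetObjectRequest.builder()
                .bucket(resolveBucket(bucket))
                .key(key)
                .build();
    }
}
```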
Where changes would be needed
Based on the current codebase, the implementation would touch:
- `S3FileSystemConfig`: add a `Map<String, String>` property for bucket-to-access-point mappings (a parsing sketch follows this list)
- `S3Location` or a new translation layer: rewrite `bucket()` to the configured access point alias before it reaches the AWS SDK request builders in `S3FileSystem`, `S3InputFile`, `S3OutputFile`, etc.
- `S3SecurityMapping` (optionally): extend the security mapping JSON to also support per-mapping access point configuration
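As a rough illustration of the configuration side, the per-bucket properties could be collected into a map along these lines (a hypothetical helper, not the actual binding mechanism Trino's config classes use):

```java
import java.util.HashMap;
import java.util.Map;

final class AccessPointConfig
{
    // Hypothetical: collects "s3.access-point.<bucket>" entries from raw
    // catalog properties into a bucket -> access point (alias or ARN) map.
    static Map<String, String> parseAccessPointMappings(Map<String, String> properties)
    {
        String prefix = "s3.access-point.";
        Map<String, String> mappings = new HashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                mappings.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return Map.copyOf(mappings);
    }
}
```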
Example usage
```
# catalog/iceberg.properties
connector.name=iceberg
iceberg.catalog.type=glue
fs.native-s3.enabled=true
s3.region=us-east-1

# Cross-account bucket accessed via access point
s3.access-point.shared-data-bucket=shared-data-ap-s3alias
```
Prior art
Apache Spark supports this through Hadoop's S3A filesystem via the per-bucket `fs.s3a.bucket.<bucket>.accesspoint.arn` setting (see the Hadoop S3A documentation).
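For comparison, the equivalent per-bucket setting on the Spark side looks roughly like this (the ARN value is a placeholder):

```
# spark-defaults.conf (Hadoop configs carry the spark.hadoop. prefix)
spark.hadoop.fs.s3a.bucket.shared-data-bucket.accesspoint.arn=arn:aws:s3:us-east-1:123456789012:accesspoint/shared-data-ap
```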
Willingness to contribute
I would be happy to work on a PR for this. I'm relatively new to Java and to the Trino codebase, so it may take me some time to get the implementation right. Any guidance from maintainers on the preferred approach would be appreciated.
