Iceberg using JDBC catalog cannot use AWS assume role to access S3 #24571

@BaudoinWR

Describe the bug

When the s3.iam_role_arn option (introduced in #23775) is set while creating an Iceberg source with a JDBC catalog, the initial Java loadTable call does not use the assume-role functionality.
This causes a failure if the S3 bucket that stores the table metadata is meant to be accessed through an assumed role, which is the expectation when the s3.iam_role_arn property is used.

Error message/log

software.amazon.awssdk.services.s3.model.S3Exception: User: <arn of the Risingwave Server> is not authorized to perform: s3:GetObject on resource: "<s3_bucketName>/<metadata_json_location>" because no resource-based policy allows the s3:GetObject action (Service: S3, Status Code: 403, Request ID: xxx, Extended Request ID: xxx) 

at software.amazon.awssdk.services.s3.model.S3Exception$BuilderImpl.build(S3Exception.java:113)
at software.amazon.awssdk.services.s3.model.S3Exception$BuilderImpl.build(S3Exception.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:66)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:60)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:60)
at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:6416)
at org.apache.iceberg.aws.s3.S3InputStream.openStream(S3InputStream.java:240)
at org.apache.iceberg.aws.s3.S3InputStream.openStream(S3InputStream.java:225)
at org.apache.iceberg.aws.s3.S3InputStream.positionStream(S3InputStream.java:221)
at org.apache.iceberg.aws.s3.S3InputStream.read(S3InputStream.java:143)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:547)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:137)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:266)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1874)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1273)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3924)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:291)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:284)
at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:180)
at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:199)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:199)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:167)
at org.apache.iceberg.jdbc.JdbcTableOperations.doRefresh(JdbcTableOperations.java:100)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:88)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:71)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:49)
at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:328)
at com.risingwave.connector.catalog.JniCatalogWrapper.loadTable(JniCatalogWrapper.java:53

To Reproduce

The setup requires a PostgreSQL database that has been set up with the Iceberg metadata schemas.

CREATE SOURCE test_bwr_3 WITH (                        
  connector = 'iceberg',
  catalog.type = 'jdbc',        
  catalog.uri = 'jdbc:postgresql://localhost:5432/postgres',
  catalog.jdbc.user = 'user',                     
  catalog.jdbc.password = 'password',     
  warehouse.path = 's3://my_bucket',
  s3.region = 'us-east-1',
  s3.endpoint = 'https://s3.amazonaws.com',
  s3.iam_role_arn = 'arn:aws:iam::xxx:role/my_role',
  table.name = 'table',
  catalog.name = 'catalog',
  database.name = 'database',
  enable_config_load = 'true'
);

Expected behavior

The expected behavior is that, when reading metadata that is registered in the JDBC catalog and stored in the S3 bucket, the Java process uses the configured assumed role.
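To illustrate what "using the configured assumed role" could look like on the Java side, here is a minimal sketch that maps the source's WITH options onto the catalog properties that Iceberg's AWS module documents for assume-role access (`client.factory`, `client.assume-role.arn`, `client.assume-role.region`). The class `AssumeRoleCatalogProps` and the helper `toCatalogProperties` are hypothetical names for illustration only; whether RisingWave's JNI catalog wrapper builds its properties this way is an assumption, not something confirmed in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not RisingWave code: shows one possible translation of
// the source options into Iceberg catalog properties.
class AssumeRoleCatalogProps {
    // Property keys documented by the Iceberg AWS integration.
    static final String CLIENT_FACTORY = "client.factory";
    static final String ASSUME_ROLE_ARN = "client.assume-role.arn";
    static final String ASSUME_ROLE_REGION = "client.assume-role.region";

    static Map<String, String> toCatalogProperties(Map<String, String> withOptions) {
        Map<String, String> props = new HashMap<>();
        props.put("uri", withOptions.get("catalog.uri"));
        props.put("warehouse", withOptions.get("warehouse.path"));

        String roleArn = withOptions.get("s3.iam_role_arn");
        if (roleArn != null && !roleArn.isEmpty()) {
            // Only switch the client factory when a role is configured, so that
            // sources without s3.iam_role_arn keep their default credentials.
            props.put(CLIENT_FACTORY, "org.apache.iceberg.aws.AssumeRoleAwsClientFactory");
            props.put(ASSUME_ROLE_ARN, roleArn);
            props.put(ASSUME_ROLE_REGION, withOptions.get("s3.region"));
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> with = new HashMap<>();
        with.put("catalog.uri", "jdbc:postgresql://localhost:5432/postgres");
        with.put("warehouse.path", "s3://my_bucket");
        with.put("s3.region", "us-east-1");
        with.put("s3.iam_role_arn", "arn:aws:iam::xxx:role/my_role");
        toCatalogProperties(with).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

If properties along these lines reached the catalog's file IO, the S3 client that reads the metadata JSON would presumably be created with the assumed role's credentials rather than the server's own identity.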

This is currently not the case: the process fails with the stack trace shown under "Error message/log". The "User: <arn of the Risingwave Server>" principal in the error message indicates that the process did not attempt to assume the expected role before trying to retrieve the S3 file.
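The check described above can be sketched in code. When a request is made with assumed-role credentials, AWS reports the principal as an STS ARN of the form `arn:aws:sts::<account>:assumed-role/<role>/<session>`, so its absence from the 403 message suggests the role was never assumed. The class `PrincipalCheck` and the sample ARNs below are illustrative assumptions, not values taken from the actual deployment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative diagnostic helper; the names here are hypothetical.
class PrincipalCheck {
    // Captures the principal in an AWS "User: <principal> is not authorized" message.
    private static final Pattern PRINCIPAL =
            Pattern.compile("User: (.+?) is not authorized");

    static Optional<String> principal(String errorMessage) {
        Matcher m = PRINCIPAL.matcher(errorMessage);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only if the failing request was made under the given assumed role.
    static boolean usedAssumedRole(String errorMessage, String roleName) {
        return principal(errorMessage)
                .map(arn -> arn.contains(":assumed-role/" + roleName + "/"))
                .orElse(false);
    }

    public static void main(String[] args) {
        String fromServer = "User: arn:aws:iam::111122223333:role/server_role "
                + "is not authorized to perform: s3:GetObject";
        String fromAssumedRole = "User: arn:aws:sts::111122223333:assumed-role/my_role/session "
                + "is not authorized to perform: s3:GetObject";
        System.out.println(usedAssumedRole(fromServer, "my_role"));
        System.out.println(usedAssumedRole(fromAssumedRole, "my_role"));
    }
}
```

Applied to the error in this report, the principal is the RisingWave server's own ARN, which is consistent with the role in s3.iam_role_arn never having been assumed.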

How did you deploy RisingWave?

RisingWave is deployed on Kubernetes using the official Docker image v2.7.2 with the default configuration.

The version of RisingWave

PostgreSQL 13.14.0-RisingWave-2.7.2 (30301dc965a6f30c08de859e2be0e6cb1b66f6b0)

Additional context

No response

Labels

type/bug