
The Delta Sharing server tries to connect to AWS even though an endpoint URL to an ECS S3 (Dell) Object Storage is passed #753

@NAF-IPT

Description


When trying to access the delta tables on our ECS S3 object storage via the Delta Sharing server, we get the following error message:

```
Caused by: java.util.concurrent.ExecutionException: java.io.InterruptedIOException: doesBucketExist on daepi: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to daepi.s3.amazonaws.com:443 [daepi.s3.amazonaws.com/52.217.110.180, daepi.s3.amazonaws.com/3.5.25.166, daepi.s3.amazonaws.com/52.217.204.193, daepi.s3.amazonaws.com/3.5.17.37, daepi.s3.amazonaws.com/52.217.167.145, daepi.s3.amazonaws.com/52.216.207.67, daepi.s3.amazonaws.com/3.5.9.128, daepi.s3.amazonaws.com/16.15.185.168] failed: connect timed out
```
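For context on the hostname in the stack trace: `daepi.s3.amazonaws.com` is the virtual-hosted-style address the AWS SDK builds from the bucket name when no custom endpoint is in effect, whereas with `fs.s3a.endpoint` plus path-style access the request should target the endpoint host directly. A minimal sketch of the two URL forms (the ECS hostname below is a made-up placeholder, not our real endpoint):

```python
# Bucket name taken from the stack trace above.
bucket = "daepi"

# Virtual-hosted style: the AWS SDK default when no custom endpoint applies.
aws_default = f"https://{bucket}.s3.amazonaws.com"

# Path-style against a custom endpoint, which fs.s3a.path.style.access
# combined with fs.s3a.endpoint should produce instead.
ecs_endpoint = "https://ecs.example.internal"  # hypothetical placeholder
path_style = f"{ecs_endpoint}/{bucket}"

print(aws_default)
print(path_style)
```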

We pass the following content in the sharing-config.yaml:

```yaml
# The format version of this config file
version: 1
# Config shares/schemas/tables to share
shares:
- name: "daepi"
  schemas:
  - name: "default"
    tables:
    - name: "demo_delta_table"
      location: "s3a://daepi/demo_delta_table"
      id: "00000-00000-0000-000000000000"
      storage:
        type: s3a
        properties:
          fs.s3a.access.key: "Access_key"
          fs.s3a.secret.key: "Secret_key"
          fs.s3a.endpoint: "ECS_Endpoint_URL"
          fs.s3a.path.style.access: "true"
          fs.s3a.connection.ssl.enabled: "false"
          fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"

spark.hadoop.fs.s3a.access.key: "Access_key"
spark.hadoop.fs.s3a.secret.key: "Secret_key"
spark.hadoop.fs.s3a.endpoint: "ECS_Endpoint_URL"
spark.hadoop.fs.s3a.path.style.access: "true"
spark.hadoop.fs.s3a.connection.ssl.enabled: "false"
spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
spark.delta.sharing.network.debugLogging: "true"

host: "localhost"
port: 9999
endpoint: "/delta-sharing"
preSignedUrlTimeoutSeconds: 3600
deltaTableCacheSize: 10
stalenessAcceptable: false
evaluatePredicateHints: false
```
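Because the nesting of `shares`/`schemas`/`tables` is easy to get wrong in YAML, here is the same hierarchy spelled out as a plain Python structure (values are the placeholders from the config above; the S3A properties are omitted for brevity):

```python
# Expected parsed shape of the shares section of the server config.
config = {
    "version": 1,
    "shares": [
        {
            "name": "daepi",
            "schemas": [
                {
                    "name": "default",
                    "tables": [
                        {
                            "name": "demo_delta_table",
                            "location": "s3a://daepi/demo_delta_table",
                            "id": "00000-00000-0000-000000000000",
                        }
                    ],
                }
            ],
        }
    ],
}

# Walk down to the single table entry and show its storage location.
table = config["shares"][0]["schemas"][0]["tables"][0]
print(table["location"])
```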

To run the docker image and start the delta server we use the following command:

```shell
docker run -p 9999:9999 \
  -e AWS_ACCESS_KEY_ID=Access_key \
  -e AWS_SECRET_ACCESS_KEY=Secret_key \
  -e AWS_ENDPOINT_URL=ECS_Endpoint_URL \
  -e AWS_ALLOW_HTTP=true \
  --mount type=bind,source=./Coding/delta_sharing/sharing-config.yaml,target=/config/delta-sharing-server-config.yaml \
  deltaio/delta-sharing-server:1.3.3 \
  --config /config/delta-sharing-server-config.yaml
```

We then test the access with the following Python file. The first part, listing the shares, works, but the second part triggers the error above in the Delta Sharing server's Docker container:

```python
import delta_sharing

# Load the sharing profile
profile_file = "delta-sharing-profile.json"
# Create a client
client = delta_sharing.SharingClient(profile_file)

# List shares -- this part works
shares = client.list_shares()
print("Shares:", shares)

# Load the shared table as a pandas DataFrame -- this triggers the error
table_url = profile_file + "#daepi.default.demo_delta_table"
df = delta_sharing.load_as_pandas(table_url)
# Display the DataFrame
print(df)
```

The `delta-sharing-profile.json` looks like this:

```json
{ "shareCredentialsVersion": 1, "bearerToken": "", "endpoint": "http://localhost:9999/delta-sharing/" }
```
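As a quick sanity check, the profile can be parsed and its required fields verified before handing it to the client (a minimal sketch; the field names follow the profile file shown above):

```python
import json

# The profile contents from above, inlined for the check.
profile_text = (
    '{ "shareCredentialsVersion": 1, "bearerToken": "", '
    '"endpoint": "http://localhost:9999/delta-sharing/" }'
)
profile = json.loads(profile_text)

# The client needs at least a credentials version and a server endpoint.
for key in ("shareCredentialsVersion", "endpoint"):
    assert key in profile, f"profile is missing {key!r}"

print(profile["endpoint"])
```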

We verified that the Delta Sharing Docker container can connect to the ECS S3 endpoint, so this is not a network issue. We also ran the same setup against an AWS S3 bucket instead of the ECS S3; in that case everything worked as expected.

From the error and all our trials it is clear that the Delta Sharing server does not pick up the endpoint URL we provide for the ECS S3. It therefore tries to connect to AWS instead of the configured endpoint, which naturally fails, since the credentials, the bucket, and the Delta table do not exist there.
Would it be possible to fix this issue so that we can also connect to our ECS S3 via the Delta Sharing server?
