Add AWS Credentials parsing from file #2117

@Shershebnev

Environment

Delta-rs version:

$ pip show deltalake
Name: deltalake
Version: 0.15.1
Summary: Native Delta Lake Python binding based on delta-rs with Pandas integration
Home-page: https://github.com/delta-io/delta-rs
Author: Qingping Hou <[email protected]>, Will Jones <[email protected]>
Author-email: Qingping Hou <[email protected]>, Will Jones <[email protected]>
License: Apache-2.0
Location: ...
Requires: pyarrow, pyarrow-hotfix
Required-by: 

Binding:
Python
Environment:

  • Cloud provider: AWS (Ubuntu)
  • OS: MacOS
  • Other:

Bug

What happened:
It seems that credentials and region are not read from the ~/.aws/credentials and ~/.aws/config files. Just like in #1416, I'm getting OSError: Generic S3 error: Missing region when trying to read from S3.

On macOS locally:
Setting AWS_DEFAULT_REGION fixes the missing-region error, but deltalake then tries to retrieve instance metadata from http://169.254.169.254/latest/api/token, which fails when not running on an AWS instance: OSError: Generic S3 error: Error after 10 retries in 6.409805791s, max_retries:10, retry_timeout:180s, source:error sending request for url (http://169.254.169.254/latest/api/token): error trying to connect: tcp connect error: Host is down (os error 64)

On AWS instance:
Setting only AWS_DEFAULT_REGION results in: OSError: Generic S3 error: Client error with status 403 Forbidden: <?xml version="1.0" encoding="UTF-8"?>

In both cases, setting everything through environment variables fixes the problem, e.g. AWS_DEFAULT_REGION=... AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... python. Other tools like boto3 have no problem using credentials stored in the default location, while deltalake (called here through polars) fails:

$ python3.9
Python 3.9.16 (main, Aug  3 2023, 01:00:02) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> master_table_df = pl.scan_delta("s3://REDACTED.delta").select("audio_type", "parallel_id").collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.9/site-packages/polars/io/delta.py", line 263, in scan_delta
    dl_tbl = _get_delta_lake_table(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/polars/io/delta.py", line 306, in _get_delta_lake_table
    dl_tbl = deltalake.DeltaTable(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/deltalake/table.py", line 396, in __init__
    self._table = RawDeltaTable(
OSError: Generic S3 error: Missing region
>>> import boto3
>>> ecr = boto3.client("ecr")
>>> 
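
Until the files are parsed natively, one possible interim workaround is to read them yourself and pass the values explicitly. The sketch below is not part of deltalake: `load_aws_profile` is a hypothetical helper, it uses only the standard library, and it assumes the usual INI layout of the two files (sections named `[default]` or `[NAME]` in credentials, `[default]` or `[profile NAME]` in config). The upper-case output keys mirror the environment variables mentioned above, with AWS_REGION standing in for the region.

```python
import configparser
from pathlib import Path


def load_aws_profile(profile="default", credentials_path=None, config_path=None):
    """Collect credentials and region for one profile from the shared AWS files.

    Hypothetical helper for illustration; returns a plain dict of strings.
    """
    credentials_path = Path(credentials_path or Path.home() / ".aws" / "credentials")
    config_path = Path(config_path or Path.home() / ".aws" / "config")

    credentials = configparser.ConfigParser()
    credentials.read(credentials_path)  # a missing file is silently skipped
    config = configparser.ConfigParser()
    config.read(config_path)

    # The config file names non-default profiles "[profile NAME]".
    config_section = "default" if profile == "default" else f"profile {profile}"

    # File option name -> key used in the returned dict.
    key_map = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "aws_session_token": "AWS_SESSION_TOKEN",
    }

    options = {}
    if credentials.has_section(profile):
        for file_key, option_key in key_map.items():
            if credentials.has_option(profile, file_key):
                options[option_key] = credentials.get(profile, file_key)
    if config.has_option(config_section, "region"):
        options["AWS_REGION"] = config.get(config_section, "region")
    return options
```

The returned dict could then be handed to the reader, for example `deltalake.DeltaTable("s3://...", storage_options=load_aws_profile())`, or exported as environment variables before starting Python. This covers only static keys in the two files; SSO, assumed roles, and credential_process entries are out of scope for this sketch.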

What you expected to happen:
Credentials and region are read from the default locations, ~/.aws/credentials and ~/.aws/config.
How to reproduce it:
Install deltalake and try to read from S3 with credentials set only in the default files. See the example above with polars and deltalake.

Labels

  • binding/rust: Issues for the Rust crate
  • enhancement: New feature or request
  • storage/aws: AWS S3 storage related
