DynamoDB Locking Mechanism Failing for AWS S3 Storage Backend in Version 0.20.1 #2930

@donotpush

Bug Report

Environment

Delta-rs version: 0.20.1
Environment: Docker


Description

Issue: Writing data to AWS S3 with the DynamoDB locking mechanism fails in version 0.20.1. The same code works in version 0.19.2.


Error Messages

  1. First Execution Failure (table does not exist):

    Traceback (most recent call last):
      File "/app/test.py", line 21, in <module>
          df.write_delta(
      File "/usr/local/lib/python3.11/site-packages/polars/dataframe/frame.py", line 4286, in write_delta
          write_deltalake(
      File "/usr/local/lib/python3.11/site-packages/deltalake/writer.py", line 323, in write_deltalake
          write_deltalake_rust(
    _internal.CommitFailedError: Transaction failed: dynamodb client failed to write log entry
    
  2. Subsequent Execution Failure (after it worked once, so the table already exists):

    Traceback (most recent call last):
      File "/app/test.py", line 22, in <module>
          df.write_delta(
      File "/usr/local/lib/python3.11/site-packages/polars/dataframe/frame.py", line 4286, in write_delta
          write_deltalake(
      File "/usr/local/lib/python3.11/site-packages/deltalake/writer.py", line 302, in write_deltalake
          table.update_incremental()
      File "/usr/local/lib/python3.11/site-packages/deltalake/table.py", line 1258, in update_incremental
          self._table.update_incremental()
    _internal.DeltaError: Generic error: error in DynamoDb
    

How to Reproduce

Dockerfile:

FROM python:3.11

WORKDIR /app

RUN pip install deltalake==0.20.1 polars

# Uncomment to see it working
# RUN pip install deltalake==0.19.2

COPY test.py .

CMD [ "python", "test.py" ]

test.py:

import polars
import os

df = polars.DataFrame({'x': [1, 2, 3]})

storage_options = {
    'AWS_S3_LOCKING_PROVIDER': 'dynamodb',
    'DELTA_DYNAMO_TABLE_NAME': 'delta_log',
    'AWS_ACCESS_KEY_ID': os.environ["AWS_ACCESS_KEY_ID"],
    'AWS_SECRET_ACCESS_KEY': os.environ["AWS_SECRET_ACCESS_KEY"],
    'AWS_REGION': os.environ['AWS_REGION'],
}

df.write_delta(
    f"s3://{os.environ['BUCKET_NAME']}/delta/test",
    storage_options=storage_options,
)

# You will need an S3 bucket and a DynamoDB table.
# To create the DynamoDB table:
#   aws dynamodb create-table \
#     --table-name delta_log \
#     --attribute-definitions AttributeName=tablePath,AttributeType=S AttributeName=fileName,AttributeType=S \
#     --key-schema AttributeName=tablePath,KeyType=HASH AttributeName=fileName,KeyType=RANGE \
#     --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Run the following commands:

docker build -t test:latest .
docker run \
  -e AWS_ACCESS_KEY_ID=your_access_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret_key \
  -e BUCKET_NAME=your_bucket_name \
  -e AWS_REGION=your_region \
  test:latest
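
To rule out an empty value passed through `-e` as the cause, a pre-flight check along these lines could replace the inline dictionary in test.py. This is only a sketch: the helper name `storage_options_from_env` and the `REQUIRED_ENV` tuple are made up for illustration, and the option keys are the ones already used in test.py.

```python
import os

# Variables that test.py reads from the container environment.
REQUIRED_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "BUCKET_NAME")


def storage_options_from_env(env=os.environ, lock_table="delta_log"):
    """Build the storage_options dict from test.py, failing early on unset or blank variables.

    Hypothetical helper, not part of deltalake or polars.
    """
    missing = [name for name in REQUIRED_ENV if not env.get(name, "").strip()]
    if missing:
        raise ValueError("missing or empty environment variables: " + ", ".join(missing))
    return {
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",
        "DELTA_DYNAMO_TABLE_NAME": lock_table,
        "AWS_ACCESS_KEY_ID": env["AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": env["AWS_SECRET_ACCESS_KEY"],
        "AWS_REGION": env["AWS_REGION"],
    }
```

Calling `storage_options_from_env()` before `df.write_delta(...)` would raise a `ValueError` naming the offending variables, which separates a configuration mistake from the locking failure reported here.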

If you uncomment line 8 in the Dockerfile (the `pip install deltalake==0.19.2` line) and then run docker build and docker run again, the write succeeds with version 0.19.2.
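
Since both errors come from the DynamoDB side, it may also help to confirm that the lock table really has the key schema from the create-table command above. The following is a minimal sketch, not part of delta-rs: `key_schema_problems` is a hypothetical helper that inspects the dictionary returned by boto3's `describe_table` (or the JSON printed by `aws dynamodb describe-table --table-name delta_log`), and the expected keys are copied from this report.

```python
# Expected key schema, taken from the create-table command in this report.
EXPECTED_KEY_SCHEMA = {"tablePath": "HASH", "fileName": "RANGE"}


def key_schema_problems(description: dict) -> list[str]:
    """Return a list of mismatches between a describe-table response and the expected keys.

    An empty list means the key schema matches. Hypothetical helper for illustration.
    """
    table = description.get("Table", {})
    actual = {k["AttributeName"]: k["KeyType"] for k in table.get("KeySchema", [])}
    types = {
        a["AttributeName"]: a["AttributeType"]
        for a in table.get("AttributeDefinitions", [])
    }
    problems = []
    for name, key_type in EXPECTED_KEY_SCHEMA.items():
        if actual.get(name) != key_type:
            problems.append(f"{name}: expected key type {key_type}, found {actual.get(name)}")
        if types.get(name) != "S":
            problems.append(f"{name}: expected attribute type S, found {types.get(name)}")
    # Any extra key attribute also indicates a table created with a different schema.
    for name in sorted(actual.keys() - EXPECTED_KEY_SCHEMA.keys()):
        problems.append(f"unexpected key attribute {name}")
    return problems


# Example with a hand-written response; with boto3 installed the input would be
# boto3.client("dynamodb").describe_table(TableName="delta_log").
sample = {
    "Table": {
        "KeySchema": [
            {"AttributeName": "tablePath", "KeyType": "HASH"},
            {"AttributeName": "fileName", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "tablePath", "AttributeType": "S"},
            {"AttributeName": "fileName", "AttributeType": "S"},
        ],
    }
}
print(key_schema_problems(sample))
```

A matching schema does not explain the regression between 0.19.2 and 0.20.1, but it removes one variable from the reproduction.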

Reference: https://delta-io.github.io/delta-rs/integrations/object-storage/s3/


Labels: binding/rust (Issues for the Rust crate), bug (Something isn't working), storage/aws (AWS S3 storage related)
