
to_pyarrow_table() on a table in S3 kept getting "Generic S3 error: error decoding response body" #2595

Open
@k-ye

Description


Environment

Delta-rs version: deltalake==0.18.1

Binding: Python

Environment:

  • Cloud provider: AWS S3
  • OS: Linux
  • Other:
    • Python 3.10.4
    • pyarrow==16.1.0
    • pyarrow-hotfix==0.6

Bug

What happened:

I'm trying to do a simple table load from S3, but I keep getting this error: OSError: Generic S3 error: error decoding response body

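```python
import time

from deltalake import DeltaTable

# table_uri (an s3:// URI) and storage_options (S3 credentials) are defined elsewhere
table = DeltaTable(table_uri, storage_options=storage_options)
print(f"version: {table.version()}")
print(f"schema: {table.schema()}")
print(table.files())

ts = time.time()
df = table.to_pyarrow_table()  # raises the OSError below
```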
Output:

```text
version: 0
schema: Schema([Field(id, PrimitiveType("string"), nullable=True), Field(path, PrimitiveType("string"), nullable=True)])
['0-e03dac34-16a0-4b6e-82c8-fd1098d1bf45-0.parquet']
Traceback (most recent call last):
  File "test.py", line 32, in <module>
    df = table.to_pyarrow_table()
  File "***/lib/python3.10/site-packages/deltalake/table.py", line 1161, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "pyarrow/_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 88, in pyarrow.lib.check_status
OSError: Generic S3 error: error decoding response body
```

The traceback ends inside pyarrow (Scanner.to_table), so the error surfaces there. I'm not sure whether it's possible to tweak pyarrow's S3 behavior from deltalake.

What you expected to happen:

to_pyarrow_table() returns the table as a pyarrow Table.

How to reproduce it:

More details:

I have verified the integrity of this table with these methods:

  1. Copying the table to local disk and loading it from there: to_pyarrow_table() runs fine.
  2. Reading the S3 table with DuckDB (via its delta extension): this also works fine.

Labels: bug (Something isn't working)
