Description
Environment
Delta-rs version: deltalake==0.18.1
Binding: Python
Environment:
- Cloud provider: AWS S3
- OS: Linux
- Other:
  - Python 3.10.4
  - pyarrow==16.1.0
  - pyarrow-hotfix==0.6
Bug
What happened:
I was trying to do a simple table load from S3, but kept getting this error: OSError: Generic S3 error: error decoding response body
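import time

from deltalake import DeltaTable

# table_uri and storage_options are set earlier in test.py (not shown here)
table = DeltaTable(table_uri, storage_options=storage_options)
print(f"version: {table.version()}")
print(f"schema: {table.schema()}")
print(table.files())
ts = time.time()
df = table.to_pyarrow_table()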
version: 0
schema: Schema([Field(id, PrimitiveType("string"), nullable=True), Field(path, PrimitiveType("string"), nullable=True)])
['0-e03dac34-16a0-4b6e-82c8-fd1098d1bf45-0.parquet']
Traceback (most recent call last):
File "test.py", line 32, in <module>
df = table.to_pyarrow_table()
File "***/lib/python3.10/site-packages/deltalake/table.py", line 1161, in to_pyarrow_table
return self.to_pyarrow_dataset(
File "pyarrow/_dataset.pyx", line 562, in pyarrow._dataset.Dataset.to_table
File "pyarrow/_dataset.pyx", line 3804, in pyarrow._dataset.Scanner.to_table
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 88, in pyarrow.lib.check_status
OSError: Generic S3 error: error decoding response body
The stack shows that the error is actually raised in pyarrow. I'm not sure if it is possible to tweak pyarrow's behavior with S3 from deltalake.
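A minimal sketch of one setting that might be worth trying here. It assumes that deltalake forwards HTTP client options such as timeout and connect_timeout from storage_options to its S3 client. That is an assumption, not a confirmed fix for this error. The helper name with_client_timeouts and the default values are hypothetical and only for illustration.

```python
from typing import Dict, Optional


def with_client_timeouts(
    storage_options: Optional[Dict[str, str]] = None,
    timeout: str = "120s",
    connect_timeout: str = "30s",
) -> Dict[str, str]:
    """Return a copy of storage_options with timeout keys added if absent."""
    opts = dict(storage_options or {})
    # Assumed key names; values already set by the caller are not overridden.
    opts.setdefault("timeout", timeout)
    opts.setdefault("connect_timeout", connect_timeout)
    return opts


# Hypothetical usage with the snippet from this report:
# table = DeltaTable(table_uri, storage_options=with_client_timeouts(storage_options))
```

The helper copies the dict rather than mutating it, so the original storage_options can still be used for a side-by-side comparison of the two configurations.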
What you expected to happen:
I can get the pyarrow table.
How to reproduce it:
More details:
I have verified the integrity of this table with these methods:
- Cloning the table locally, then loading it from there. to_pyarrow_table() runs fine.
- Reading the S3 table with duckdb (and its delta extension). This worked fine, too.