Description
Environment
Delta-rs version:
0.19.1
Binding:
Python and Rust
Environment:
- Cloud provider: local filesystem and R2
- OS: Linux
- Other:
Bug
What happened:
Applying z-order to a Delta table on a column whose string values share an identical prefix of at least 14 characters has no visible effect: the records in the newly written Parquet files retain their original order.
I first observed this when z-ordering a large partition on ISO 8601 timestamps using delta-rs in Rust. I have since reproduced it with the Python bindings and a small data frame of strings containing zero-padded integers (see the repro below).
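To make the shared-prefix condition concrete, here is a small pure-Python sketch using the same zero-padded values as the repro. It measures the common prefix and then shows that a stable sort on a key truncated to that prefix leaves the input order unchanged, which matches the symptom. The truncated key (`KEY_WIDTH`) is purely an illustrative assumption; nothing here confirms how delta-rs actually builds its z-order key.

```python
import os

# Same values as in the repro: integers zero-padded to 15 characters.
items = [f"{item:015}" for item in [2, 3, 1]]

# Longest prefix shared by every value; only the last character differs.
shared = os.path.commonprefix(items)
print(len(shared))  # 14

# ASSUMPTION (illustration only): a sort key that looks at just the
# first KEY_WIDTH characters of each value. Not taken from delta-rs.
KEY_WIDTH = 14
truncated_sorted = sorted(items, key=lambda s: s[:KEY_WIDTH])

# All truncated keys are equal, and Python's sort is stable,
# so the original order is preserved.
print(truncated_sorted == items)  # True

# A sort on the full value does reorder the items.
print(sorted(items) == items)  # False
```

If something along these lines is happening, columns such as ISO 8601 timestamps within a single day would be affected as well, since they also share a long leading prefix.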
What you expected to happen:
The resulting Parquet files are ordered by the column specified for z-ordering.
How to reproduce it:
# test_zorder.py

import shutil

import pandas
from deltalake import write_deltalake, DeltaTable


def test_zorder() -> None:
    table = "a"
    field = "b"
    items = [f"{item:015}" for item in [2, 3, 1]]

    shutil.rmtree(table, ignore_errors=True)
    write_deltalake(table, pandas.DataFrame({field: items}))

    DeltaTable(table).optimize.z_order([field])

    sorted_items = DeltaTable(table).to_pyarrow_table().to_pydict()[field]
    assert sorted(items) == sorted_items
Run this with uv:
# caveat: this removes a directory named `a` from the current directory
uvx --with deltalake --with pandas pytest -vv test_zorder.py
Output:
========================= test session starts ==========================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /home/claudio/.cache/uv/archive-v0/A-uQ68p-4BWRUFltJ5Mv2/bin/python
cachedir: .pytest_cache
rootdir: ...
collected 1 item
test_zorder.py::test_zorder FAILED [100%]
=============================== FAILURES ===============================
_____________________________ test_zorder ______________________________
...
> assert sorted(items) == sorted_items
E AssertionError: assert ['000000000000001', '000000000000002', '000000000000003'] == ['000000000000002', '000000000000003', '000000000000001']
E
E At index 0 diff: '000000000000001' != '000000000000002'
E
E Full diff:
E [
E + '000000000000001',
E '000000000000002',
E '000000000000003',
E - '000000000000001',
E ]
test_zorder.py:20: AssertionError
======================= short test summary info ========================
FAILED test_zorder.py::test_zorder - AssertionError: assert ['000000000000001', '000000000000002', '000000000000003'] == ['000000000000002', '000000000000003', '000000000000001']
At index 0 diff: '000000000000001' != '000000000000002'
Full diff:
[
+ '000000000000001',
'000000000000002',
'000000000000003',
- '000000000000001',
]
========================== 1 failed in 0.36s ===========================
More details:
N/A