Closed
Labels
bug: Something isn't working
on-hold: Issues and Pull Requests that are on hold for some reason
Description
Environment
python3
Delta-rs version: 0.22.2
Binding:
Environment:
- Cloud provider:
- OS: centos8
- Other:
Bug
What happened:
I inserted 4000 rows into a Delta table, created a checkpoint, and then ran optimize more than once.
After that, reading the table returns 12000 rows instead of 4000.
In [109]: dt.create_checkpoint()
In [110]: dt.optimize.compact(target_size=1024*256)
Out[110]:
{'numFilesAdded': 2,
'numFilesRemoved': 2,
'filesAdded': '{"avg":112461.0,"max":168540,"min":56382,"totalFiles":2,"totalSize":224922}',
'filesRemoved': '{"avg":127678.5,"max":168540,"min":86817,"totalFiles":2,"totalSize":255357}',
'partitionsOptimized': 1,
'numBatches': 4,
'totalConsideredFiles': 2,
'totalFilesSkipped': 0,
'preserveInsertionOrder': True}
In [111]: dt.version()
Out[111]: 5
In [112]: dt.to_pandas()
Out[112]:
id value
0 1000 value-1000-4ryilo616rsw4pz8on92tbyi2o04hgkrug0...
1 1001 value-1001-3eh8aav3x21jwkme3h9e56dyc4lrdhlzur1...
2 1002 value-1002-bpxn3ndnll87fq6f17tv1ij0pqhra7wj0jx...
3 1003 value-1003-g0bssmsjxrt21a3p95a7g8q2mic043ym511...
4 1004 value-1004-6yjmva7ezuwtwlw0vymf1ldzq60ih4yzvmc...
... ... ...
3995 995 value-995-x96gnl6173qdzuev650z9o2dfb0pg3wzthyq...
3996 996 value-996-0883a2ltvaic6wfsu1wk7quj6n04kawgnnfx...
3997 997 value-997-kjp4433vk5x37ly1yrf0ozboqzvn4mfh5u94...
3998 998 value-998-6bvhlcqlbkn1jsr8rh3xes3ggm4glwd3pk7i...
3999 999 value-999-r4qcrvung6u0kvq6slgcw9jp4plutst3109h...
[4000 rows x 2 columns]
In [113]:
In [113]: dt.optimize.compact(target_size=1024*256)
Out[113]:
{'numFilesAdded': 2,
'numFilesRemoved': 2,
'filesAdded': '{"avg":112461.0,"max":168540,"min":56382,"totalFiles":2,"totalSize":224922}',
'filesRemoved': '{"avg":112461.0,"max":168540,"min":56382,"totalFiles":2,"totalSize":224922}',
'partitionsOptimized': 1,
'numBatches': 4,
'totalConsideredFiles': 2,
'totalFilesSkipped': 0,
'preserveInsertionOrder': True}
In [114]:
In [114]: dt=DeltaTable(path)
In [115]: dt.optimize.compact(target_size=1024*256)
Out[115]:
{'numFilesAdded': 4,
'numFilesRemoved': 4,
'filesAdded': '{"avg":112461.0,"max":168540,"min":56382,"totalFiles":4,"totalSize":449844}',
'filesRemoved': '{"avg":120069.75,"max":168540,"min":56382,"totalFiles":4,"totalSize":480279}',
'partitionsOptimized': 1,
'numBatches': 8,
'totalConsideredFiles': 4,
'totalFilesSkipped': 0,
'preserveInsertionOrder': True}
In [116]:
In [116]: dt=DeltaTable(path)
In [117]: dt.to_pandas()
Out[117]:
id value
0 1000 value-1000-4ryilo616rsw4pz8on92tbyi2o04hgkrug0...
1 1001 value-1001-3eh8aav3x21jwkme3h9e56dyc4lrdhlzur1...
2 1002 value-1002-bpxn3ndnll87fq6f17tv1ij0pqhra7wj0jx...
3 1003 value-1003-g0bssmsjxrt21a3p95a7g8q2mic043ym511...
4 1004 value-1004-6yjmva7ezuwtwlw0vymf1ldzq60ih4yzvmc...
... ... ...
11995 995 value-995-x96gnl6173qdzuev650z9o2dfb0pg3wzthyq...
11996 996 value-996-0883a2ltvaic6wfsu1wk7quj6n04kawgnnfx...
11997 997 value-997-kjp4433vk5x37ly1yrf0ozboqzvn4mfh5u94...
11998 998 value-998-6bvhlcqlbkn1jsr8rh3xes3ggm4glwd3pk7i...
11999 999 value-999-r4qcrvung6u0kvq6slgcw9jp4plutst3109h...
[12000 rows x 2 columns]
What you expected to happen:
After running optimize multiple times, the table should still contain 4000 rows.
How to reproduce it:
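Create a checkpoint and then run optimize more than once, as in the session above. As a workaround, I do not create a checkpoint when I want to optimize.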
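The steps above could be scripted roughly as follows. This is only a sketch: the original writes are not shown in the session, so the four appends of 1000 rows, the `make_rows` helper, and the `reproduce` function name are assumptions made for illustration. The `deltalake` calls (`write_deltalake`, `DeltaTable`, `create_checkpoint`, `optimize.compact`, `to_pandas`) are the same ones used in the session.

```python
import random
import string


def make_rows(start, stop, seed=0):
    """Build a column dict with ids in [start, stop) and random string values."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    ids = list(range(start, stop))
    values = [f"value-{i}-" + "".join(rng.choices(alphabet, k=40)) for i in ids]
    return {"id": ids, "value": values}


def reproduce(path):
    """Write 4000 rows, checkpoint, compact as in the session, return the row count."""
    # Imported lazily so make_rows stays usable without these packages installed.
    import pandas as pd
    from deltalake import DeltaTable, write_deltalake

    # Assumption: four appends of 1000 rows each; the original writes are not shown.
    for start in range(0, 4000, 1000):
        frame = pd.DataFrame(make_rows(start, start + 1000))
        write_deltalake(path, frame, mode="append")

    dt = DeltaTable(path)
    dt.create_checkpoint()

    # Mirrors In [110] and In [113]: two compactions on the same handle.
    dt.optimize.compact(target_size=1024 * 256)
    dt.optimize.compact(target_size=1024 * 256)

    # Mirrors In [114] and In [115]: reload the table, then compact again.
    dt = DeltaTable(path)
    dt.optimize.compact(target_size=1024 * 256)

    # Mirrors In [116] and In [117]: reload once more and count the rows.
    dt = DeltaTable(path)
    return len(dt.to_pandas())
```

Calling `reproduce(path)` with an empty directory should return 4000 if compaction preserves the data; the session above ended with 12000 rows on 0.22.2. Whether this exact write pattern triggers the problem has not been confirmed.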
More details:
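One observation from the metrics above, offered as a hint rather than a diagnosis: the files removed by the compaction in Out[115] (after reloading the table) have the same total size and file count as the files removed and the files added by the first compaction in Out[110], taken together. The short check below only does that arithmetic on the numbers copied from the session. It is consistent with the reloaded table seeing both the pre-compaction and post-compaction files as active, but it does not prove that.

```python
import json

# Metrics strings copied verbatim from Out[110] and Out[115].
first_removed = json.loads(
    '{"avg":127678.5,"max":168540,"min":86817,"totalFiles":2,"totalSize":255357}'
)
first_added = json.loads(
    '{"avg":112461.0,"max":168540,"min":56382,"totalFiles":2,"totalSize":224922}'
)
after_reload_removed = json.loads(
    '{"avg":120069.75,"max":168540,"min":56382,"totalFiles":4,"totalSize":480279}'
)

# Combine the files removed and the files added by the first compaction.
combined_size = first_removed["totalSize"] + first_added["totalSize"]
combined_files = first_removed["totalFiles"] + first_added["totalFiles"]
combined_min = min(first_removed["min"], first_added["min"])

size_matches = combined_size == after_reload_removed["totalSize"]
files_match = combined_files == after_reload_removed["totalFiles"]
min_matches = combined_min == after_reload_removed["min"]

print(size_matches, files_match, min_matches)
```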