Sorting when merging adjacent objects? #314
javafanboy started this conversation in General
Replies: 1 comment
+1, this would be really helpful for my use case. Merging non-adjacent files (i.e., files whose row id ranges have gaps between them) is already supported, by adding a '_ducklake_internal_row_id' column to the written files. It would therefore seem that permitting sorting during compaction should be relatively straightforward? I've implemented this myself in a custom compaction utility in Python, and it achieves the desired effect. However, I'd much prefer not to rely on my hand-rolled implementation. Are there plans to support this in the near future?
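A minimal sketch of the kind of sort-on-compaction step described above, assuming all rows of the files being merged fit in memory. This is not the commenter's utility and not a DuckLake API: `Row`, `compact_sorted`, and the `row_id` field (standing in for '_ducklake_internal_row_id') are hypothetical names chosen for illustration. Because every row carries its original row id, rows can be physically reordered without losing their identity.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical in-memory row; row_id stands in for '_ducklake_internal_row_id'.
    row_id: int
    col1: str
    col2: int


def compact_sorted(files: list[list[Row]]) -> list[Row]:
    """Concatenate small files and re-sort the rows by (col1, col2).

    Each row keeps its original row_id, so lookups by row id still resolve
    even though the physical order no longer follows insertion order.
    """
    merged = [row for f in files for row in f]
    # sorted() is stable: rows with equal keys keep their input order.
    return sorted(merged, key=lambda r: (r.col1, r.col2))


# Two files with a gap between their row id ranges (0-3 and 10-11).
file_a = [Row(0, "AAAA", 123), Row(1, "AAAA", 124), Row(2, "BBBB", 100), Row(3, "BBBB", 101)]
file_b = [Row(10, "AAAA", 125), Row(11, "BBBB", 200)]

for row in compact_sorted([file_a, file_b]):
    print(row.row_id, row.col1, row.col2)
```

After sorting, the row ids come out as 0, 1, 10, 2, 3, 11, which is why an explicit row id column is needed once the physical order stops matching insertion order.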
Assuming I have sorted my data, possibly on several columns, in an order that works well with the per-column min/max statistics kept at row-group level, how can I make sure this ordering is maintained when adjacent objects are merged?
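To illustrate why the sort order matters for row-group min/max statistics, here is a small sketch in plain Python (not Parquet or DuckLake code; the function names and the row-group size of 3 are assumptions). It takes the column 1 values from the concatenated example in this post and counts how many row groups a filter on 'AAAA' would be unable to skip.

```python
def row_group_stats(values, group_size):
    """Split a column into row groups and return (min, max) for each group."""
    return [
        (min(values[i:i + group_size]), max(values[i:i + group_size]))
        for i in range(0, len(values), group_size)
    ]


def groups_to_scan(stats, target):
    """Count row groups whose [min, max] range could contain target."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


# Column 1 after simply concatenating the two files, and after sorting.
col1_unsorted = ["AAAA", "AAAA", "BBBB", "BBBB", "AAAA", "BBBB"]
col1_sorted = sorted(col1_unsorted)

for label, col in (("unsorted", col1_unsorted), ("sorted", col1_sorted)):
    stats = row_group_stats(col, group_size=3)
    print(label, stats, "groups to scan for 'AAAA':", groups_to_scan(stats, "AAAA"))
```

With the unsorted column both row groups span AAAA to BBBB, so both must be read; after sorting, only one row group can contain 'AAAA'.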
I am referring to situations where simply concatenating the smaller files would not give the desired result. Assume a table sorted on a composite of the first two columns (first column 1, then column 2):
AAAA 123
AAAA 124
BBBB 100
BBBB 101
.................
AAAA 125
BBBB 200
Simply concatenating the files would result in:
AAAA 123
AAAA 124
BBBB 100
BBBB 101
AAAA 125
BBBB 200
whereas the desired result maintains the sort order (first column 1, then column 2):
AAAA 123
AAAA 124
AAAA 125
BBBB 100
BBBB 101
BBBB 200
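Since each adjacent file is already sorted on the same key, the desired result is a merge of sorted runs rather than a full re-sort. A minimal Python sketch using the standard library's heapq.merge, with the two files from the example above held as in-memory lists; reading and writing the actual data files is left out and would depend on tooling not described here.

```python
import heapq

# Two adjacent files, each already sorted on (column 1, column 2).
file_a = [("AAAA", 123), ("AAAA", 124), ("BBBB", 100), ("BBBB", 101)]
file_b = [("AAAA", 125), ("BBBB", 200)]

# Plain concatenation keeps each file's internal order but breaks the global order.
concatenated = file_a + file_b

# heapq.merge lazily interleaves already-sorted inputs, producing a globally
# sorted stream in O(n log k) for k files instead of re-sorting everything.
merged = list(heapq.merge(file_a, file_b, key=lambda row: (row[0], row[1])))

for col1, col2 in merged:
    print(col1, col2)
```

heapq.merge only gives a correct result if every input is sorted by the same key; if that cannot be guaranteed, a full sort with sorted() is the safe fallback.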