Giving users an option to choose the type of space filling curve when using Liquid Clustering #4347

koedlt · 2025-03-31T14:28:29Z

As explained in this issue, it is worth exploring the use of other types of space filling curves when using Liquid Clustering.

Hilbert curves are a nice compromise: they preserve locality across all clustering columns roughly equally. In cases where some columns matter more than others (because they are used in filters more often), this balance might lower read performance compared with other layout techniques such as Hive-style partitioning.

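An obvious choice of space filling curve is one that gives utmost importance to one dimension: it traverses that dimension completely before incrementing the next one, which amounts to a lexicographic ordering of the clustering columns. It looks like this in 2 dimensions:

[Image: 2-dimensional illustration of the curve]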
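To make the trade-off concrete, here is a minimal toy sketch in plain Python. It is not Delta Lake code: the 4x4 grid, the file size of 4 cells, and the helper names `row_major_key`, `z_order_key` and `files_touched` are all assumptions made for illustration. Z-order is used as a simpler stand-in for a balanced curve, since a Hilbert implementation would be longer. The sketch sorts the grid cells along each curve, cuts the sorted sequence into "files", and counts how many files a filter on a single column has to read.

```python
from itertools import product

def row_major_key(x: int, y: int, side: int = 4) -> int:
    """Curve that walks all y values for one x before incrementing x."""
    return x * side + y

def z_order_key(x: int, y: int, bits: int = 2) -> int:
    """Morton (Z-order) key: interleave the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)  # x takes the odd bit positions
        key |= ((y >> i) & 1) << (2 * i)      # y takes the even bit positions
    return key

def files_touched(key_fn, predicate, side: int = 4, cells_per_file: int = 4) -> int:
    """Sort grid cells along the curve, chunk them into files, and count
    the files that contain at least one cell matching the predicate."""
    cells = sorted(product(range(side), repeat=2), key=lambda c: key_fn(*c))
    files = [cells[i:i + cells_per_file] for i in range(0, len(cells), cells_per_file)]
    return sum(any(predicate(x, y) for x, y in f) for f in files)

if __name__ == "__main__":
    for name, key_fn in [("row-major", row_major_key), ("z-order", z_order_key)]:
        on_x = files_touched(key_fn, lambda x, y: x == 1)
        on_y = files_touched(key_fn, lambda x, y: y == 1)
        print(f"{name}: filter on x reads {on_x} files, filter on y reads {on_y} files")
```

In this toy setup the row-major curve reads 1 of 4 files for a filter on the first column but all 4 for a filter on the second, while Z-order reads 2 files in both cases. That is the compromise described above: the balanced curve is never the worst, but it is also never the best for the column that is filtered on most often.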

As explained in the issue, the functionality could stay largely the same as what exists today. The only thing that should change is how the DataFrame is repartitioned. That means that in MultiDimClustering.cluster, a new case should be added that refers to a new object (next to ZOrderClustering and HilbertClustering). This is where we would implement the new curve.

The default would still be the Hilbert curve, but users could opt into another type of curve when they want to. That does mean we would have to change the SQL API. Some ideas could be to write something like:

ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>) WITH <curve-type>

or

ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>) USING <curve-type>

Interested to see what your ideas are on the topic!
