The note is not clear on whether and how individual parquet files and row groups should be sorted. We need to clarify this and/or introduce a new properties keyword for it.
Here is what I've found about sorting in the note (v1.0):
- Section 4.1 says "For instance, if rowgroups are made to be small and sorted by the identification number of the survey", which suggests that row groups may be sorted by the identifier.
- Section 3.3.1 gives this description for the hats_cols_sort keyword: "At catalog creation time, the columns used to sort the data, in addition to _healpix_29 column.", which kind of implies that the data is already sorted by _healpix_29.
Maybe we should make hats_cols_sort a list of all sorting columns, including _healpix_29?
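
To illustrate why an explicit, ordered list would help, here is a rough sketch in plain Python. It is not taken from the note or from any HATS library: the `is_sorted_by` helper, the `object_id` column, and the sample rows are all made up for illustration. It only shows that "sorted by `_healpix_29`, then by the identifier" and "sorted by the identifier" are different claims about the same data, so the keyword should say which one applies.

```python
from typing import Any, Mapping, Sequence


def is_sorted_by(rows: Sequence[Mapping[str, Any]], columns: Sequence[str]) -> bool:
    """Return True if rows are in non-decreasing lexicographic order of `columns`.

    Hypothetical helper for illustration only; not part of any HATS API.
    """
    keys = [tuple(row[col] for col in columns) for row in rows]
    return all(prev <= curr for prev, curr in zip(keys, keys[1:]))


# Made-up rows standing in for one row group; `object_id` is an assumed
# survey identifier column.
rows = [
    {"_healpix_29": 10, "object_id": 7},
    {"_healpix_29": 10, "object_id": 9},
    {"_healpix_29": 12, "object_id": 3},
]

# Proposed reading: the keyword lists every sorting column, in order.
explicit_sort = ["_healpix_29", "object_id"]
print(is_sorted_by(rows, explicit_sort))   # True

# The same rows are not sorted by the identifier alone.
print(is_sorted_by(rows, ["object_id"]))   # False
```

With the current wording, a reader who only sees the identifier column in hats_cols_sort cannot tell which of these two orderings the files actually follow.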