Implement TSDB->Parquet RowReader #2
Signed-off-by: alanprot <alanprot@gmail.com>
Great stuff! I have one proposal: right now we cannot define how long a parquet file should be — it is always as long as the range covered by all the blocks we use to create it. We could add parameters to the TSDB row reader that define this range (`minT, maxT uint64`). That way we can break down 14d blocks (which Thanos or Prometheus could compact to) into many parquet files of length 1d if we want, or anything in between!
Makes sense!! I will change the PR to add this filter.
This PR introduces a `tsdbRowReader` inspired by the Cloudflare PoC, but with a key design change. Instead of using a fixed number of encoded data columns, this implementation proposes a more flexible approach: configure only the duration of each data column. This allows the format to adapt to blocks of varying time ranges.
Ex (configured column duration: 8h):
- Block duration: 24h → Result: 3 data columns
- Block duration: 48h → Result: 6 data columns
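The column count in the examples above follows directly from the two durations. A minimal sketch (the function name `numDataColumns` is illustrative, not from the PR), rounding up for blocks that are not an exact multiple of the configured column duration:

```go
package main

import "fmt"

// numDataColumns computes how many fixed-duration data columns a block
// needs. Durations are in milliseconds; the division rounds up so a
// partial trailing window still gets its own column.
func numDataColumns(blockDur, colDur int64) int64 {
	return (blockDur + colDur - 1) / colDur
}

func main() {
	h := int64(3600 * 1000) // one hour in milliseconds
	fmt.Println(numDataColumns(24*h, 8*h)) // 24h block, 8h columns -> 3
	fmt.Println(numDataColumns(48*h, 8*h)) // 48h block, 8h columns -> 6
}
```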
Timestamp Layout
Each data column starts at a calculated offset from the block's minimum timestamp (`min_ts`). Ex: `min_ts = x`, `duration = 8h`.
The `minTs`, `maxTs` and `duration` can be stored in the parquet metadata, so we can use this info to know which data columns to open when running a query. Another change is that we re-encode the chunks to make sure they fit perfectly on the data column boundaries.
PS: I want to add more tests in this PR, but I'm opening it now just to start the discussion.