Replies: 7 comments 8 replies
-
Thanks, I've left some initial comments in the POC PR. Can you be more specific about the inconsistency concerns? If the cache write succeeds while the actual data file write fails, the existing code leaks the cache file, but I don't think that can lead to any data inconsistency.
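One way to avoid leaking the cache file in that failure case is a scope guard that deletes it unless the data write also succeeds. This is only a minimal sketch, assuming C++17 and `std::filesystem`; `ScopedFileCleanup`, `WriteFile`, and `WriteWithCache` are hypothetical names made up for illustration and are not taken from pg_mooncake.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical guard: removes `path` on destruction unless Commit() was called.
class ScopedFileCleanup {
public:
    explicit ScopedFileCleanup(fs::path path) : path_(std::move(path)) {}
    ~ScopedFileCleanup() {
        if (!committed_) {
            std::error_code ec;  // error-code overload, so the destructor never throws
            fs::remove(path_, ec);
        }
    }
    ScopedFileCleanup(const ScopedFileCleanup &) = delete;
    ScopedFileCleanup &operator=(const ScopedFileCleanup &) = delete;

    // Keep the file: call once every dependent write has succeeded.
    void Commit() { committed_ = true; }

private:
    fs::path path_;
    bool committed_ = false;
};

// Hypothetical helper: writes `data` to `path`, returns false on any failure.
inline bool WriteFile(const fs::path &path, const std::string &data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out << data;
    out.flush();
    return static_cast<bool>(out);
}

// Writes the cache copy, then the data file. If either write fails,
// the guard removes the cache file so nothing is leaked.
inline bool WriteWithCache(const fs::path &cache_path, const fs::path &data_path,
                           const std::string &data) {
    ScopedFileCleanup cleanup(cache_path);
    if (!WriteFile(cache_path, data)) {
        return false;
    }
    if (!WriteFile(data_path, data)) {
        return false;
    }
    cleanup.Commit();
    return true;
}
```

The guard is armed before the cache write so that a partially written cache file is removed as well.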
-
On the reference count for files in use, one approach I'm thinking of:
Pseudocode looks like:
-
There are some other details worth noting:
I think for the initial version, it's ok to
Implementation-wise, I think it could be split into two PRs:
On testing, for storage and filesystem related code, I think it's better to have C++ tests via gtest rather than only SQL tests.
-
Thanks for giving me the opportunity! Timeline-wise, I will try to get a PR out this week; definitely let me know if my progress is slow and there's release pressure.
-
I have an elementary question regarding a small part of the design. In pg_mooncake and in DuckDB, why do we create files on disk and then use the filesystem API to read and write them? I know it might be easier and more convenient, but if we memory-mapped the files (mmap) instead, wouldn't writes be more efficient by avoiding switches between user space and kernel space? This may be a textbook idea that is not much more efficient in practice, but I think Postgres does that as well?
-
Design proposal for the read cache cleanup. High-level idea before going to the details:
For performance reasons, use a Postgres unlogged table to record read cache files; it serves two purposes:
Here's the table schema:
At the start of a query, when a remote file is accessed, we follow these steps to cache remote files locally:
At the end of query, decrement the reference count for all involved local read cache:
Cache cleanup logic is triggered periodically, and when there's not enough space on disk:
Since we use an unlogged Postgres table for performance, on Postgres startup there's no such unlogged table, so we need to
A quick and simple performance consideration and analysis:
Potential improvements:
Testing:
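The reference-count lifecycle described above (increment at query start, decrement at query end, evict only unreferenced files during cleanup) can be sketched as follows. The proposal keeps this state in an unlogged Postgres table; the sketch models it with an in-memory map purely for illustration. `ReadCacheRegistry` and all of its members are hypothetical names, and the eviction order here is arbitrary rather than anything the proposal specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory stand-in for the cache bookkeeping table.
class ReadCacheRegistry {
public:
    // Query start: register the cache file if it is new, then pin it.
    void Acquire(const std::string &path, uint64_t size_bytes) {
        auto [it, inserted] = entries_.try_emplace(path, Entry{size_bytes, 0});
        if (inserted) {
            total_bytes_ += size_bytes;
        }
        ++it->second.ref_count;
    }

    // Query end: unpin the cache file.
    void Release(const std::string &path) {
        auto it = entries_.find(path);
        assert(it != entries_.end() && it->second.ref_count > 0);
        --it->second.ref_count;
    }

    // Cleanup: drop unreferenced entries until the total size is at most
    // `max_bytes`. Returns the evicted paths so the caller can delete the files.
    // Entries are visited in unspecified order (no LRU in this sketch).
    std::vector<std::string> Evict(uint64_t max_bytes) {
        std::vector<std::string> evicted;
        for (auto it = entries_.begin();
             it != entries_.end() && total_bytes_ > max_bytes;) {
            if (it->second.ref_count == 0) {
                total_bytes_ -= it->second.size_bytes;
                evicted.push_back(it->first);
                it = entries_.erase(it);
            } else {
                ++it;  // still in use by some query, never evict
            }
        }
        return evicted;
    }

    uint64_t TotalBytes() const { return total_bytes_; }
    bool Contains(const std::string &path) const { return entries_.count(path) > 0; }

private:
    struct Entry {
        uint64_t size_bytes;
        uint32_t ref_count;
    };
    std::unordered_map<std::string, Entry> entries_;
    uint64_t total_bytes_ = 0;
};
```

A file pinned by any running query survives cleanup even when the cache is over its size budget, which is the property the reference count is meant to guarantee.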
-
Two concerns regarding the RFC:
-
In the current implementation, pg_mooncake has a write cache:
pg_mooncake/src/columnstore/columnstore_table.cpp, lines 20 to 40 in 8d3575c
pg_mooncake/src/columnstore/execution/columnstore_scan.cpp, lines 137 to 165 in 8d3575c
But it comes with certain limitations:
Here I propose switching from the write cache to a read cache:
Read cache difficulty:
Unresolved corrupted cache file handling:
I made a POC PR for the read cache; feel free to leave comments, and I'm happy to discuss!
PR: dentiny#1