Support parallel DuckDB threads for Postgres table scan #762
Changes to the `PostgresTableReader` header (shown as the post-change code for each hunk):

```cpp
@@ -2,6 +2,8 @@
#include "pgduckdb/pg/declarations.hpp"

#include <vector>

#include "pgduckdb/utility/cpp_only_file.hpp" // Must be last include.

namespace pgduckdb {
```
```cpp
@@ -11,7 +13,13 @@ class PostgresTableReader {
	PostgresTableReader(const char *table_scan_query, bool count_tuples_only);
	~PostgresTableReader();
	TupleTableSlot *GetNextTuple();
	bool GetNextMinimalTuple(std::vector<uint8_t> &minimal_tuple_buffer);
	void PostgresTableReaderCleanup();
	TupleTableSlot *InitTupleSlot();
	bool
	IsParallelScan() const {
		return nworkers_launched > 0;
	}

private:
	MinimalTuple GetNextWorkerTuple();
```

Review comment (on `IsParallelScan`): I cannot find the place where you change […]

Reply: It is assigned during the initialization of Postgres parallel workers, which is part of the original code logic. I simply added […]
New unguarded wrappers next to the existing guarded ones:

```cpp
@@ -69,6 +69,16 @@ SlotGetAllAttrs(TupleTableSlot *slot) {
	PostgresFunctionGuard(slot_getallattrs, slot);
}

void
SlotGetAllAttrsUnsafe(TupleTableSlot *slot) {
	slot_getallattrs(slot);
}

TupleTableSlot *
ExecStoreMinimalTupleUnsafe(MinimalTuple minimal_tuple, TupleTableSlot *slot, bool shouldFree) {
	return ExecStoreMinimalTuple(minimal_tuple, slot, shouldFree);
}

Relation
OpenRelation(Oid relationId) {
	if (PostgresFunctionGuard(check_enable_rls, relationId, InvalidOid, false) == RLS_ENABLED) {
```

Review comment (on `SlotGetAllAttrsUnsafe`): This seemed scary to me, but looking closely at the implementation of […] But that condition can never be false, due to the fact that […] Could you merge this function with […]

Reply: Yes, sure. I am using the term "unsafe" to refer to the fact that it is not protected by […]

Review comment (on `ExecStoreMinimalTupleUnsafe`): Similar to the comment I left above: let's add a comment explaining why this is safe to use without the lock, something like […] And just like the function above, let's drop the […]
Additional locking for type conversion:

```cpp
@@ -1505,6 +1505,7 @@ AppendString(duckdb::Vector &result, Datum value, idx_t offset, bool is_bpchar)

static void
AppendJsonb(duckdb::Vector &result, Datum value, idx_t offset) {
	std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock());
	auto jsonb = DatumGetJsonbP(value);
	auto jsonb_str = JsonbToCString(NULL, &jsonb->root, VARSIZE(jsonb));
	duckdb::string_t str(jsonb_str);

@@ -1810,6 +1811,7 @@ ConvertPostgresToDuckValue(Oid attr_type, Datum value, duckdb::Vector &result, i
	break;
}
case duckdb::LogicalTypeId::LIST: {
	std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock());
	// Convert Datum to ArrayType
	auto array = DatumGetArrayTypeP(value);
```

Review comment: Are these the only types that need this additional locking now? It would be good to explicitly state in a comment, for each other type, why it is thread safe. That way we won't forget to check when introducing support for new types.

Reply: Yes, only JSON and LIST. Additionally, as mentioned in #750, both of these types have memory issues. I will add a comment for […]
Scan-state changes:

```cpp
@@ -5,6 +5,7 @@
#include "pgduckdb/pgduckdb_types.hpp"
#include "pgduckdb/pgduckdb_utils.hpp"
#include "pgduckdb/pg/relations.hpp"
#include "pgduckdb/pgduckdb_guc.h"

#include "pgduckdb/pgduckdb_process_lock.hpp"
#include "pgduckdb/logger.hpp"

@@ -170,6 +171,12 @@ PostgresScanGlobalState::PostgresScanGlobalState(Snapshot _snapshot, Relation _r
	ConstructTableScanQuery(input);
	table_reader_global_state =
	    duckdb::make_shared_ptr<PostgresTableReader>(scan_query.str().c_str(), count_tuples_only);
	// Default to a single thread if `count_tuples_only` is true or if the reader does not initialize a parallel scan.
	max_threads = 1;
	if (table_reader_global_state->IsParallelScan() && !count_tuples_only) {
		max_threads = duckdb_threads_for_postgres_scan;
	}

	pd_log(DEBUG2, "(DuckDB/PostgresSeqScanGlobalState) Running %" PRIu64 " threads -- ", (uint64_t)MaxThreads());
}
```
```cpp
@@ -182,6 +189,7 @@ PostgresScanGlobalState::~PostgresScanGlobalState() {

PostgresScanLocalState::PostgresScanLocalState(PostgresScanGlobalState *_global_state)
    : global_state(_global_state), exhausted_scan(false) {
	slot = global_state->table_reader_global_state->InitTupleSlot();
}

PostgresScanLocalState::~PostgresScanLocalState() {
```
The scan loop now branches on whether the scan is parallel. The single-threaded path keeps the original behavior (hold the global lock for the whole chunk):

```cpp
@@ -256,17 +264,51 @@
	local_state.output_vector_size = 0;

	// Normal scan without parallelism
	bool is_parallel_scan = local_state.global_state->MaxThreads() > 1;
	if (!is_parallel_scan) {
		std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock());
		for (size_t i = 0; i < STANDARD_VECTOR_SIZE; i++) {
			TupleTableSlot *slot = local_state.global_state->table_reader_global_state->GetNextTuple();
			if (pgduckdb::TupleIsNull(slot)) {
				local_state.global_state->table_reader_global_state->PostgresTableReaderCleanup();
				local_state.exhausted_scan = true;
				break;
			}

			SlotGetAllAttrs(slot);
			InsertTupleIntoChunk(output, local_state, slot);
		}
		SetOutputCardinality(output, local_state);
		return;
	}
```

Review comment (on the single-thread branch): I think this difference in behaviour between one and more than one thread doesn't completely make sense. Even if max_threads_per_postgres_scan is 1, it's still possible to have two different Postgres scans running in parallel. Those two concurrent scans would still benefit from not holding the lock during InsertTupleIntoChunk.

Follow-up: After reading more, I now realize this is probably important for the case where we don't use background workers for the scan. So maybe we should keep this functionality, but I think it's worth refactoring and/or commenting it a bit more, because right now the logic is quite hard to follow.

Reply: Ah, okay. I was trying to keep the original code unchanged for the single-thread case.
The parallel path fills each output chunk in batches of `LOCAL_STATE_SLOT_BATCH_SIZE` minimal tuples, fetching under the lock and materializing outside it:

```cpp
	D_ASSERT(STANDARD_VECTOR_SIZE % LOCAL_STATE_SLOT_BATCH_SIZE == 0);
	for (size_t i = 0; i < STANDARD_VECTOR_SIZE / LOCAL_STATE_SLOT_BATCH_SIZE; i++) {
		int valid_slots = 0;
		{
			std::lock_guard<std::recursive_mutex> lock(GlobalProcessLock::GetLock());
			for (size_t j = 0; j < LOCAL_STATE_SLOT_BATCH_SIZE; j++) {
				bool ret = local_state.global_state->table_reader_global_state->GetNextMinimalTuple(
				    local_state.minimal_tuple_buffer[j]);
				if (!ret) {
					local_state.global_state->table_reader_global_state->PostgresTableReaderCleanup();
					local_state.exhausted_scan = true;
					break;
				}
				valid_slots++;
			}
		}

		for (int j = 0; j < valid_slots; j++) {
			MinimalTuple minimal_tuple = reinterpret_cast<MinimalTuple>(local_state.minimal_tuple_buffer[j].data());
			local_state.slot = ExecStoreMinimalTupleUnsafe(minimal_tuple, local_state.slot, false);
			SlotGetAllAttrsUnsafe(local_state.slot);
			InsertTupleIntoChunk(output, local_state, local_state.slot);
		}

		if (valid_slots == 0)
			break;
	}

	SetOutputCardinality(output, local_state);
```

Review comment (on `minimal_tuple_buffer`): @Y-- I need your C++ knowledge. Is this a good way to keep a buffer of MinimalTuples? One thought I had is that now we do two copies of the minimal tuple: […] I think if we instead have an array of […]

Reply (@Y--): This doesn't do any copy (and thus doesn't extend/modify its lifetime). I haven't read the code yet, but my first question would be: why are they […]

Reply: As @Y-- pointed out, only one copy occurs (from the Postgres parallel workers' shared memory to the buffer). One benefit of using […]

Review comment: Sounds like that could cause alignment problems.

Reply: Ah, yes. Let me confirm the alignment issue.
Review comment (on the batch size of 32): Why 32? Maybe we should make this configurable?

Reply: I was concerned about burdening users with another GUC hyperparameter. I tested batch sizes of 8, 16, 32, and 64, and found that 32 performs best. By the way, the batch size helps amortize the lock overhead across threads.