We have an (arguably very niche) use case where we need to attach ~200 (and climbing) catalogs at the same time and run a `UNION ALL`-style query across all of them. Because each attached DuckLake maintains its own connection pool, we're hitting connection limits.
What we'd like to be able to do is share connections across multiple attachments. Note: all metadata catalogs reside on a single Postgres database, just in different schemas, so they can (in theory) share connections.
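For concreteness, here is a rough sketch of the kind of setup we mean, written as a small Python helper that generates the SQL. The tenant names, connection string, bucket path, and `events` table are all placeholders, and the use of `METADATA_SCHEMA` to point each catalog at its own schema is an assumption about how such a layout would be attached, not a copy of our real configuration.

```python
# Hypothetical sketch: every name, path, and connection string is a placeholder.

def attach_statements(tenants, pg_conn="dbname=lake_meta host=localhost",
                      data_root="s3://example-bucket"):
    """Build one ATTACH statement per catalog, all pointing at the same
    Postgres database but at different metadata schemas."""
    return [
        f"ATTACH 'ducklake:postgres:{pg_conn}' AS {t} "
        f"(DATA_PATH '{data_root}/{t}/', METADATA_SCHEMA '{t}');"
        for t in tenants
    ]

def union_all_query(tenants, table="events"):
    """Build a single query that reads the same table from every catalog."""
    parts = [f"SELECT '{t}' AS catalog_name, * FROM {t}.{table}" for t in tenants]
    return "\nUNION ALL\n".join(parts) + ";"

tenants = [f"tenant_{i:03d}" for i in range(200)]
attaches = attach_statements(tenants)
query = union_all_query(tenants)
```

Each of the 200 generated `ATTACH` statements results in a separate attachment with its own pool, even though they all target the same Postgres database.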
Happy to produce a PR for this, but would love some guidance first on the desired implementation. We've considered a handful of ideas on our end, but we're still coming up to speed on the codebase and don't fully understand the tradeoffs yet:
1. Implement a flag passed to the Postgres extension, and manage pooling there
2. Implement within DuckLake itself
   a. Support providing an already attached Postgres connection
   b. Pass additional flags via metadata config to signal a shared pool
An additional question I had while looking through the codebase is whether an implementation like this is even safe. For example, are certain assumptions made (such as in setting the search path) that could cause issues once connections are shared? We've also looked at running a pooling proxy, but I believe this could be subject to similar issues if they exist.
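To illustrate the kind of hazard we're worried about, here is a toy model in Python. It is not DuckLake or Postgres extension code, and we don't know whether DuckLake actually relies on a session-level search path in this way. `FakeConnection` and `run_for_schema` are hypothetical stand-ins, and `schema_a` / `schema_b` are made-up schema names.

```python
# Toy model only: FakeConnection mimics a Postgres session that tracks search_path.

class FakeConnection:
    def __init__(self):
        self.search_path = "public"

    def execute(self, sql):
        prefix = "SET search_path TO "
        if sql.startswith(prefix):
            self.search_path = sql[len(prefix):].strip()
            return None
        # Report which schema an unqualified table name would resolve to.
        return f"{self.search_path}.ducklake_snapshot"

shared = FakeConnection()

# Catalog A sets its schema once, then assumes the setting sticks.
shared.execute("SET search_path TO schema_a")
# Catalog B later borrows the same connection and does the same.
shared.execute("SET search_path TO schema_b")

# Catalog A's unqualified query now resolves against B's schema.
leaked = shared.execute("SELECT * FROM ducklake_snapshot")

def run_for_schema(conn, schema, sql):
    """One possible mitigation: reset the search path on every checkout."""
    conn.execute(f"SET search_path TO {schema}")
    return conn.execute(sql)

safe = run_for_schema(shared, "schema_a", "SELECT * FROM ducklake_snapshot")
```

If something like this applies, a shared pool would need to either reset session state each time a connection is handed out or schema-qualify every metadata query.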