Replies: 2 comments
-
Hope this helps!
-
Now that version 1.0.0 is out, I'm hoping AWS will make it available on RDS. If anyone has any insight on that, it would be great if they could share it here! 🙏
-
Hey, I've had a few questions after coming across this project recently:
Are there any plans on the roadmap for getting the pg_duckdb extension approved on AWS RDS? We use a managed RDS instance and don't want to set up our own instance or pay another vendor for a cloud offering.
Can you make a pg_duckdb table that always implicitly reads a Parquet file from S3 when you query it?
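Something like this is what I have in mind, as a sketch (I'm going off pg_duckdb's read_parquet here; the bucket, file, and column names are made up, and the r['col'] column-access syntax may differ between versions):

```sql
-- Wrap the Parquet file in a view so every query reads it implicitly.
CREATE VIEW events AS
SELECT r['user_id']  AS user_id,
       r['event_ts'] AS event_ts,
       r['payload']  AS payload
FROM read_parquet('s3://my-bucket/events.parquet') r;

-- Each query against the view then goes through DuckDB to S3:
SELECT count(*) FROM events;
```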
Can you have Hive-style partitioning for the aforementioned table, such that the WHERE filters determine which of the multiple Parquet files backing the underlying table/view are read?
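E.g., assuming pg_duckdb passes DuckDB's glob patterns and hive_partitioning option through (the paths, partition layout, and column names are hypothetical):

```sql
-- Directory layout like s3://my-bucket/events/year=2024/month=05/part-0.parquet
CREATE VIEW events_partitioned AS
SELECT r['user_id'] AS user_id,
       r['year']    AS year,
       r['month']   AS month
FROM read_parquet('s3://my-bucket/events/*/*/*.parquet',
                  hive_partitioning := true) r;

-- Ideally a filter like this would only touch the matching partition's files:
SELECT count(*) FROM events_partitioned WHERE year = 2024 AND month = 5;
```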
Can you cache said views to avoid wasting S3 bandwidth? If yes, how can you invalidate them? Would this be part of a scheduled ETL/ELT pipeline, e.g. periodically ingesting the files in your data lake and then recreating a table/view over the underlying Parquet file(s)? My main concern is whether, with a customer-facing frontend that uses DuckDB on the backend to return the data, the latency (and the cost of downloading the entire Parquet file) would be acceptable in the long run.
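One shape I could imagine for this, sticking to plain PostgreSQL objects on top of read_parquet (just a sketch; all object names are hypothetical):

```sql
-- Build the cache once as a regular Postgres table.
CREATE TABLE events_cache AS
SELECT r['user_id'] AS user_id, r['event_ts'] AS event_ts
FROM read_parquet('s3://my-bucket/events.parquet') r;

-- "Invalidation" = the scheduled ETL job re-reads S3 after new files land:
BEGIN;
TRUNCATE events_cache;
INSERT INTO events_cache
SELECT r['user_id'], r['event_ts']
FROM read_parquet('s3://my-bucket/events.parquet') r;
COMMIT;
```

The frontend would then only ever hit events_cache, so S3 is touched once per refresh instead of once per request.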
Does the pg_duckdb extension store anything in a duck.db file? If yes, what's it used for? If not, can we treat it as "just" an execution engine?

How do transactions work across both the PostgreSQL and DuckDB engines? E.g. can I read from a Parquet file (or a view/table that queries the underlying Parquet file(s)), build some analytical query, and store the results in a PostgreSQL table?
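The kind of cross-engine flow I mean, as a sketch (table and column names are made up, and I'm assuming the r['col']::type cast works):

```sql
BEGIN;
-- Read from Parquet via DuckDB, aggregate, and persist the result
-- as a plain PostgreSQL table in the same transaction.
CREATE TABLE daily_counts AS
SELECT r['event_ts']::date AS day, count(*) AS n
FROM read_parquet('s3://my-bucket/events.parquet') r
GROUP BY 1;
COMMIT;
```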