Commit c1adfcc

DuckDB (#955)

* DuckDB
* talk about not needing scripting language
* Doc for DuckDB

1 parent f2b9e27 · commit c1adfcc

File tree: 11 files changed (+128 / -7 lines)

changelog/2025-05-22-duckdb/index.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+---
+slug: duckdb
+version: v1.493.0
+title: DuckDB
+tags: ['scripts', 'storage']
+description: You can run DuckDB scripts in-memory, with access to S3 objects and other database resources. You no longer need a scripting language for your DuckDB/Polars ETL pipelines; you can do it entirely in SQL.
+features:
+  - S3 object integration
+  - Attach to BigQuery, Postgres and MySQL database resources with all CRUD operations
+image: ./duckdb.png
+docs: /docs/getting_started/scripts_quickstart/sql#duckdb-1
+---

docs/assets/integrations/duckdb.png

127 KB

docs/core_concepts/11_persistent_storage/large_data_files.mdx

Lines changed: 22 additions & 1 deletion
@@ -31,6 +31,27 @@ Windmill S3 bucket browser will not work for buckets containing more than 20 fil
 ETLs can be easily implemented in Windmill using its integration with Polars and DuckDB to facilitate working with tabular data. In this case, you don't need to manually interact with the S3 bucket; Polars/DuckDB does it natively and in an efficient way. Reading and writing datasets to S3 can be done seamlessly.

 <Tabs className="unique-tabs">
+<TabItem value="duckdb-script" label="DuckDB" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+
+```sql
+-- $file1 (s3object)
+
+-- Run queries directly on an S3 parquet file passed as an argument
+SELECT * FROM read_parquet($file1);
+
+-- Or using an explicit path in a workspace storage
+SELECT * FROM read_json('s3:///demo/data.json');
+
+-- You can also specify a secondary workspace storage
+SELECT * FROM read_csv('s3://secondary_storage/demo/data.csv');
+
+-- Write the result of a query to a different parquet file on S3
+COPY (
+    SELECT COUNT(*) FROM read_parquet($file1)
+) TO 's3:///demo/output.pq' (FORMAT 'parquet');
+```
+
+</TabItem>
 <TabItem value="polars" label="Polars" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

 ```python
@@ -77,7 +98,7 @@ def main(input_file: S3Object):
 ```

 </TabItem>
-<TabItem value="duckdb" label="DuckDB" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+<TabItem value="duckdb" label="DuckDB (Python)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

 ```python
 #requirements:
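Worth noting about the object paths used throughout the added tab (inferred from the examples themselves, not stated explicitly): an empty storage name, as in `s3:///demo/data.json`, resolves to the default workspace storage, while a named prefix such as `s3://secondary_storage/...` targets a configured secondary storage. A minimal sketch with illustrative file paths:

```sql
-- default workspace storage (empty storage name after 's3://')
SELECT COUNT(*) FROM read_parquet('s3:///demo/data.parquet');

-- a secondary storage configured in the workspace settings
SELECT COUNT(*) FROM read_parquet('s3://secondary_storage/demo/data.parquet');
```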

docs/core_concepts/27_data_pipelines/index.mdx

Lines changed: 35 additions & 5 deletions
@@ -168,7 +168,7 @@ def main(input_file: S3Object):
 ```

 </TabItem>
-<TabItem value="duckdb (AWS S3)" label="DuckDB (AWS S3)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+<TabItem value="duckdb (Python / AWS S3)" label="DuckDB (Python / AWS S3)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

 ```python
 import wmill
@@ -221,7 +221,7 @@ def main(input_file: S3Object):
 ```

 </TabItem>
-<TabItem value="duckdb (Azure Blob Storage)" label="DuckDB (Azure Blob Storage)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+<TabItem value="duckdb (Python / Azure Blob Storage)" label="DuckDB (Python / Azure Blob Storage)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

 ```python
 import wmill
@@ -241,7 +241,7 @@ def main(input_file: S3Object):
 # create a DuckDB database in memory
 # see https://duckdb.org/docs/api/python/dbapi
 conn = duckdb.connect()
-
+
 # connect duck db to the S3 bucket - this will default to the workspace S3 resource
 conn.execute(connection_str)

@@ -259,13 +259,34 @@ def main(input_file: S3Object):

 # NOTE: DuckDB doesn't support writing to Azure Blob Storage as of Jan 30 2025
 # Write the result of a query to a different parquet file on Azure Blob Storage
-# using Polars
+# using Polars
 storage_options = wmill.polars_connection_settings().storage_options
 query_result.pl().write_parquet(output_uri, storage_options=storage_options)
 conn.close()
 return S3Object(s3=output_file)
 ```

+</TabItem>
+<TabItem value="duckdb" label="DuckDB (AWS S3)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
+```sql
+-- $file1 (s3object)
+
+-- Run queries directly on an S3 parquet file passed as an argument
+SELECT * FROM read_parquet($file1);
+
+-- Or using an explicit path in a workspace storage
+SELECT * FROM read_json('s3:///demo/data.json');
+
+-- You can also specify a secondary workspace storage
+SELECT * FROM read_csv('s3://secondary_storage/demo/data.csv');
+
+-- Write the result of a query to a different parquet file on S3
+COPY (
+    SELECT COUNT(*) FROM read_parquet($file1)
+) TO 's3:///demo/output.pq' (FORMAT 'parquet');
+```
+
 </TabItem>
 </Tabs>

@@ -283,7 +304,16 @@ With S3 as the external store, a transformation script in a flow will typically
 2. Running some computation on the data.
 3. Storing the result back to S3 for the next scripts to be run.

-Windmill SDKs now expose helpers to simplify code and help you connect Polars or DuckDB to the Windmill workspace S3 bucket. In your usual IDE, you would need to write for _each script_:
+When running a DuckDB script, Windmill automatically handles the connection to your workspace storage:
+
+```sql
+-- This queries the Windmill API under the hood to figure out the
+-- correct connection string
+SELECT * FROM read_parquet('s3:///path/to/file.parquet');
+SELECT * FROM read_csv('s3://secondary_storage/path/to/file.csv');
+```
+
+If you want to use a scripting language, Windmill SDKs now expose helpers to simplify code and help you connect Polars or DuckDB to the Windmill workspace S3 bucket. In your usual IDE, you would need to write for _each script_:

 ```python
 conn = duckdb.connect()
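For comparison, the per-script setup that these docs call boilerplate looks roughly like the sketch below when written by hand. This is a sketch only: `wmill.duckdb_connection_settings()` is assumed to mirror the `polars_connection_settings()` helper used in the Azure tab above, so treat the exact helper and attribute names as assumptions.

```python
import duckdb
import wmill

# create an in-memory DuckDB database
conn = duckdb.connect()

# fetch S3 connection settings for the workspace storage from Windmill;
# helper and attribute names are assumed from the Polars equivalent above
connection_str = wmill.duckdb_connection_settings().connection_settings_str
conn.execute(connection_str)

# query a parquet file in the bucket (path is illustrative)
count = conn.sql("SELECT COUNT(*) FROM read_parquet('s3://demo/data.parquet')").fetchone()[0]
conn.close()
```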

docs/getting_started/0_scripts_quickstart/5_sql_quickstart/index.mdx

Lines changed: 29 additions & 1 deletion
@@ -7,7 +7,7 @@ import DocCard from '@site/src/components/DocCard';
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-# PostgreSQL, MySQL, MS SQL, BigQuery, Snowflake, Redshift, Oracle
+# PostgreSQL, MySQL, MS SQL, BigQuery, Snowflake, Redshift, Oracle, DuckDB

 In this quick start guide, we will write our first script in SQL. We will see how to connect a Windmill instance to an external SQL service and then send queries to the database using Windmill Scripts.

@@ -344,6 +344,10 @@ Here's a step-by-step guide on where to find each detail.

 You can directly "Test connection" if needed.

+### DuckDB
+
+DuckDB scripts run in-memory out of the box.
+
 ## Create script

 Next, let's create a script that will use the newly created Resource. From the Home page,
@@ -517,6 +521,30 @@ UPDATE demo SET col2 = :name3 WHERE col2 = :name2;

 "name1", "name2", "name3" being the names of the arguments, and "default arg" the optional default value.

+### DuckDB
+
+DuckDB arguments need to be passed in the following format:
+```sql
+-- $name1 (text) = default arg
+-- $name2 (int)
+INSERT INTO demo VALUES ($name1, $name2)
+```
+"name1", "name2" being the names of the arguments, and "default arg" the optional default value.
+
+You can pass a file on S3 as an argument of type s3object. This will automatically set up httpfs with the S3 credentials of the corresponding storage.
+You can then query this file using the standard read_csv/read_parquet/read_json functions:
+```sql
+-- $file (s3object)
+SELECT * FROM read_parquet($file)
+```
+
+You can also attach to other database resources (BigQuery, PostgreSQL and MySQL). We use the official and community DuckDB extensions under the hood:
+```sql
+ATTACH '$res:u/demo/amazed_postgresql' AS db (TYPE postgres);
+SELECT * FROM db.public.friends;
+```
+
 Database resource can be specified from the UI or directly within the script with a line `-- database resource_path`.

 <video
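The changelog entry in this commit advertises attaching database resources "with all CRUD operations"; a sketch of what that permits on the attached database from the example above (table and column names are illustrative):

```sql
ATTACH '$res:u/demo/amazed_postgresql' AS db (TYPE postgres);

-- writes go through the attachment just like reads
INSERT INTO db.public.friends (name) VALUES ('Alice');
UPDATE db.public.friends SET name = 'Bob' WHERE name = 'Alice';
DELETE FROM db.public.friends WHERE name = 'Bob';
SELECT * FROM db.public.friends;
```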

docs/integrations/0_integrations_on_windmill.mdx

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ On [self-hosted instances](../advanced/1_self_host/index.mdx), integrating OAuth
 | [Cloudflare R2](./cloudflare-r2.mdx) | Cloud object storage service for data-intensive applications |
 | [Datadog](./datadog.md) | Monitoring and analytics platform for cloud-scale infrastructure and applications |
 | [Discord](./discord.md) | Voice, video, and text communication platform for gamers |
+| [DuckDB](./duckdb.md) | Open-source, in-process SQL OLAP database management system |
 | [FaunaDB](./faunadb.md) | Serverless, document-oriented database for modern applications |
 | [Funkwhale](./funkwhale.md) | Open-source music streaming and sharing platform |
 | [Git repository](./git_repository.mdx) | Remote git repository for distributed version control systems |

docs/integrations/duckdb.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# DuckDB integration
+
+[DuckDB](https://duckdb.org/) is an open-source, in-process SQL OLAP database management system designed for fast analytical query workloads.
+
+Windmill supports seamless integration with DuckDB, allowing you to manipulate data from S3 (CSV, Parquet, JSON), BigQuery, PostgreSQL, and MySQL.
+
+![Integration between DuckDB and Windmill](../assets/integrations/duckdb.png 'Run a DuckDB script with Windmill')
+
+To get started, check out the [SQL Getting Started section](/docs/getting_started/scripts_quickstart/sql#duckdb-1).
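Since the same script can open S3 files and attached databases, the two sources can be combined in one query; a sketch reusing the resource path and argument style from the quickstart, with illustrative column names:

```sql
-- $file (s3object)
ATTACH '$res:u/demo/amazed_postgresql' AS db (TYPE postgres);

-- join an S3 parquet file against an attached Postgres table
SELECT p.name, f.*
FROM read_parquet($file) AS f
JOIN db.public.friends AS p ON p.id = f.friend_id;
```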

sidebars.js

Lines changed: 5 additions & 0 deletions
@@ -429,6 +429,11 @@ const sidebars = {
 			id: 'integrations/discord',
 			label: 'Discord'
 		},
+		{
+			type: 'doc',
+			id: 'integrations/duckdb',
+			label: 'DuckDB'
+		},
 		{
 			type: 'doc',
 			id: 'integrations/faunadb',

src/landing/IntergrationList.jsx

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ const integrations = [
 	{ name: 'Cloudflare-r2', src: 'third_party_logos/cloudflare.svg' },
 	{ name: 'Datadog', src: 'third_party_logos/datadog.svg' },
 	{ name: 'Discord', src: 'third_party_logos/discord.svg' },
+	{ name: 'DuckDB', src: 'third_party_logos/duckdb.svg' },
 	{ name: 'FaunaDB', src: 'third_party_logos/faunadb.svg' },
 	{ name: 'Funkwhale', src: 'third_party_logos/funkwhale.svg' },
 	{ name: 'Gcal', src: 'third_party_logos/gcal.svg' },

static/third_party_logos/duckdb.svg

Lines changed: 14 additions & 0 deletions
