 # Getting Started
 
-Dux is a DuckDB-native dataframe library for Elixir. This guide walks you through your first pipeline and the key concepts.
-
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 
 ## Your first pipeline
 
-```elixir
-require Dux
+Dux is a DuckDB-native dataframe library for Elixir. Pipelines are lazy — operations accumulate until you call `collect/1`.
 
+```elixir
 Dux.from_list([
   %{name: "Alice", department: "Engineering", salary: 120_000},
   %{name: "Bob", department: "Engineering", salary: 110_000},
@@ -28,152 +21,143 @@ Dux.from_list([
 |> Dux.collect()
 ```
 
-## Key concepts
-
-### Everything is lazy
+## Everything is lazy
 
-Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you call `compute/1`, `collect/1`, or `to_columns/1`:
+Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you materialize:
 
 ```elixir
 df =
   Dux.from_query("SELECT * FROM range(1, 101) t(x)")
   |> Dux.filter(x > 50)
   |> Dux.mutate(doubled: x * 2)
 
-# Inspect the lazy ops — no SQL has run yet
 df.ops
 ```
 
 ```elixir
-# NOW the SQL runs
 Dux.collect(df)
 ```
 
-This lets DuckDB see the full pipeline and optimize across operations.
+## Expressions compile to SQL
 
-### Expressions compile to SQL
-
-`require Dux` enables the macro versions of `filter`, `mutate`, and `summarise`. Bare identifiers become column names. Use `^` to interpolate Elixir values:
+Bare identifiers become column names. Use `^` to interpolate Elixir values as parameter bindings:
 
 ```elixir
-require Dux
-
 min_price = 50
 
-df
-|> Dux.filter(price > ^min_price and category == "Electronics")
-|> Dux.mutate(with_tax: price * 1.08, upper_name: upper(name))
-|> Dux.summarise(total: sum(with_tax), n: count(name))
+Dux.from_list([
+  %{name: "Widget", price: 25, category: "Tools"},
+  %{name: "Gadget", price: 75, category: "Electronics"},
+  %{name: "Doohickey", price: 100, category: "Electronics"}
+])
+|> Dux.filter(price > ^min_price)
+|> Dux.mutate(with_tax: price * 1.08)
+|> Dux.collect()
 ```
 
-> #### SQL injection is impossible {: .tip}
->
-> `^` interpolations become parameter bindings (`$1`, `$2`, ...) in the generated SQL.
-> User values never appear in the SQL string.
-
-### The `_with` variants
-
-For programmatic use, the `_with` variants accept raw SQL strings:
+The `_with` variants accept raw SQL strings for programmatic use:
 
 ```elixir
-Dux.filter_with(df, "price > 50 AND category = 'Electronics'")
-Dux.mutate_with(df, total: "price * quantity")
-Dux.summarise_with(df, avg_price: "AVG(price)")
+Dux.from_query("SELECT * FROM range(1, 11) t(x)")
+|> Dux.filter_with("x > 5")
+|> Dux.mutate_with(squared: "x * x")
+|> Dux.collect()
 ```
 
 ## Reading and writing data
 
-### CSV
-
 ```elixir
-df = Dux.from_csv("sales.csv")
-df = Dux.from_csv("sales.tsv", delimiter: "\t")
-
-Dux.to_csv(df, "output.csv")
-```
-
-### Parquet
+path = Path.join(System.tmp_dir!(), "dux_guide.csv")
 
-```elixir
-df = Dux.from_parquet("data.parquet")
-df = Dux.from_parquet("data/**/*.parquet") # glob patterns
+Dux.from_list([
+  %{name: "Alice", score: 85},
+  %{name: "Bob", score: 92},
+  %{name: "Carol", score: 78}
+])
+|> Dux.to_csv(path)
 
-Dux.to_parquet(df, "output.parquet", compression: :zstd)
+Dux.from_csv(path)
+|> Dux.filter(score > 80)
+|> Dux.collect()
 ```
 
-### NDJSON
-
 ```elixir
-df = Dux.from_ndjson("events.ndjson")
-Dux.to_ndjson(df, "output.ndjson")
-```
+parquet_path = Path.join(System.tmp_dir!(), "dux_guide.parquet")
 
-### Remote sources
+Dux.from_query("SELECT x AS id, x * 10 AS value FROM range(1000) t(x)")
+|> Dux.to_parquet(parquet_path, compression: :zstd)
 
-DuckDB extensions handle S3, HTTP, databases — no separate libraries:
-
-```elixir
-# S3 via httpfs extension
-Dux.Connection.load_extension(:httpfs)
-df = Dux.from_parquet("s3://my-bucket/data/*.parquet")
-
-# PostgreSQL via postgres_scanner
-Dux.Connection.load_extension(:postgres_scanner)
-df = Dux.from_query("SELECT * FROM postgres_scan('dbname=mydb', 'users')")
+Dux.from_parquet(parquet_path)
+|> Dux.filter(value > 5000)
+|> Dux.summarise(total: sum(value), n: count(id))
+|> Dux.collect()
 ```
 
 ## Aggregation
 
-Group and aggregate with `group_by` + `summarise`:
-
 ```elixir
-require Dux
-
-Dux.from_csv("orders.csv")
-|> Dux.group_by(:product)
+Dux.from_list([
+  %{region: "US", product: "Widget", amount: 100},
+  %{region: "US", product: "Gadget", amount: 200},
+  %{region: "EU", product: "Widget", amount: 150},
+  %{region: "EU", product: "Gadget", amount: 300},
+  %{region: "US", product: "Widget", amount: 175}
+])
+|> Dux.group_by([:region, :product])
 |> Dux.summarise(
-  total_revenue: sum(price * quantity),
-  order_count: count(id),
-  avg_price: avg(price)
+  total: sum(amount),
+  orders: count(amount),
+  avg_order: avg(amount)
 )
-|> Dux.sort_by(desc: :total_revenue)
+|> Dux.sort_by(desc: :total)
 |> Dux.collect()
 ```
 
 ## Joins
 
 ```elixir
-orders = Dux.from_csv("orders.csv")
-customers = Dux.from_csv("customers.csv")
+orders =
+  Dux.from_list([
+    %{order_id: 1, customer_id: 10, product_id: 100, qty: 5},
+    %{order_id: 2, customer_id: 10, product_id: 101, qty: 3},
+    %{order_id: 3, customer_id: 11, product_id: 100, qty: 2}
+  ])
+
+customers =
+  Dux.from_list([
+    %{customer_id: 10, name: "Alice"},
+    %{customer_id: 11, name: "Bob"}
+  ])
+
+products =
+  Dux.from_list([
+    %{product_id: 100, product_name: "Widget", unit_price: 25},
+    %{product_id: 101, product_name: "Gadget", unit_price: 50}
+  ])
 
 orders
 |> Dux.join(customers, on: :customer_id)
-|> Dux.select([:order_id, :customer_name, :total])
+|> Dux.join(products, on: :product_id)
+|> Dux.mutate(total: qty * unit_price)
+|> Dux.group_by(:name)
+|> Dux.summarise(spend: sum(total), orders: count(order_id))
+|> Dux.sort_by(:name)
 |> Dux.collect()
 ```
 
-Join types: `:inner` (default), `:left`, `:right`, `:cross`, `:anti`, `:semi`.
-
-For columns with different names:
-
-```elixir
-Dux.join(orders, products, on: [{:product_id, :id}])
-```
-
-## Debugging with sql_preview
-
-See the generated SQL without executing:
+## See the generated SQL
 
 ```elixir
-Dux.from_csv("data.csv")
-|> Dux.filter(x > 10)
-|> Dux.mutate(y: x * 2)
+Dux.from_query("SELECT * FROM range(100) t(x)")
+|> Dux.filter(x > 50)
+|> Dux.mutate(doubled: x * 2)
+|> Dux.group_by(:doubled)
+|> Dux.summarise(n: count(x))
 |> Dux.sql_preview()
-# "WITH\n __s0 AS (SELECT * FROM ...)\n __s1 AS (...)\nSELECT * FROM __s1"
 ```
 
 ## Next steps
 
-- [Distributed Queries](distributed-queries.md) — run Dux across a BEAM cluster
-- [Graph Analytics](graph-analytics.md) — PageRank, shortest paths, and more
-- [API Reference](Dux.html) — full module documentation
+* [Distributed Queries](distributed-queries.livemd) — run Dux across a BEAM cluster
+* [Graph Analytics](graph-analytics.livemd) — PageRank, shortest paths, and more
+* [API Reference](https://hexdocs.pm/dux/Dux.html) — full module documentation
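
The lazy behavior described under "Everything is lazy" can be illustrated by analogy with Elixir's standard `Stream` module. The sketch below is not Dux code and says nothing about how Dux is implemented; it only assumes that accumulating operations and then materializing them works much like building a stream and then forcing it.

```elixir
# Analogy only: Stream is Elixir's built-in lazy enumerable, not part of Dux.
pipeline =
  1..100
  |> Stream.filter(fn x -> x > 50 end)
  |> Stream.map(fn x -> %{x: x, doubled: x * 2} end)

# Nothing has been computed yet: `pipeline` is a %Stream{} describing the work.
lazy? = is_struct(pipeline, Stream)

# Enum.to_list/1 forces evaluation, the role collect/1 plays in the guide.
rows = Enum.to_list(pipeline)

IO.inspect(lazy?, label: "still lazy before to_list")
IO.inspect(length(rows), label: "rows after materializing")
IO.inspect(hd(rows), label: "first row")
```

In the guide's pipelines the deferred work is SQL sent to DuckDB rather than Elixir functions, but the order of events is the same: build first, run once at the end.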