Commit 6fcf030

cigrainger and claude committed
docs: proper Livebook format — setup cell, self-contained examples

- Setup uses single cell with Mix.install + require Dux
- Removed ## Setup/## Section headers (Livebook handles this)
- All examples are self-contained (use from_list/from_query, no references to nonexistent files like sales.csv)
- Removed commented-out output (Livebook saves real output)
- Fixed next steps links to point to .livemd
- Atom keys throughout

Run in Livebook, execute all cells, save — outputs are embedded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 4f2f3c1 commit 6fcf030

3 files changed

Lines changed: 80 additions & 106 deletions

guides/distributed-queries.livemd

Lines changed: 0 additions & 5 deletions
@@ -1,12 +1,7 @@
 # Distributed Queries
 
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 

guides/getting-started.livemd

Lines changed: 80 additions & 96 deletions
@@ -1,22 +1,15 @@
 # Getting Started
 
-Dux is a DuckDB-native dataframe library for Elixir. This guide walks you through your first pipeline and the key concepts.
-
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 
 ## Your first pipeline
 
-```elixir
-require Dux
+Dux is a DuckDB-native dataframe library for Elixir. Pipelines are lazy — operations accumulate until you call `collect/1`.
 
+```elixir
 Dux.from_list([
   %{name: "Alice", department: "Engineering", salary: 120_000},
   %{name: "Bob", department: "Engineering", salary: 110_000},
@@ -28,152 +21,143 @@ Dux.from_list([
 |> Dux.collect()
 ```
 
-## Key concepts
-
-### Everything is lazy
+## Everything is lazy
 
-Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you call `compute/1`, `collect/1`, or `to_columns/1`:
+Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you materialize:
 
 ```elixir
 df =
   Dux.from_query("SELECT * FROM range(1, 101) t(x)")
   |> Dux.filter(x > 50)
   |> Dux.mutate(doubled: x * 2)
 
-# Inspect the lazy ops — no SQL has run yet
 df.ops
 ```
 
 ```elixir
-# NOW the SQL runs
 Dux.collect(df)
 ```
 
-This lets DuckDB see the full pipeline and optimize across operations.
+## Expressions compile to SQL
 
-### Expressions compile to SQL
-
-`require Dux` enables the macro versions of `filter`, `mutate`, and `summarise`. Bare identifiers become column names. Use `^` to interpolate Elixir values:
+Bare identifiers become column names. Use `^` to interpolate Elixir values as parameter bindings:
 
 ```elixir
-require Dux
-
 min_price = 50
 
-df
-|> Dux.filter(price > ^min_price and category == "Electronics")
-|> Dux.mutate(with_tax: price * 1.08, upper_name: upper(name))
-|> Dux.summarise(total: sum(with_tax), n: count(name))
+Dux.from_list([
+  %{name: "Widget", price: 25, category: "Tools"},
+  %{name: "Gadget", price: 75, category: "Electronics"},
+  %{name: "Doohickey", price: 100, category: "Electronics"}
+])
+|> Dux.filter(price > ^min_price)
+|> Dux.mutate(with_tax: price * 1.08)
+|> Dux.collect()
 ```
 
-> #### SQL injection is impossible {: .tip}
->
-> `^` interpolations become parameter bindings (`$1`, `$2`, ...) in the generated SQL.
-> User values never appear in the SQL string.
-
-### The `_with` variants
-
-For programmatic use, the `_with` variants accept raw SQL strings:
+The `_with` variants accept raw SQL strings for programmatic use:
 
 ```elixir
-Dux.filter_with(df, "price > 50 AND category = 'Electronics'")
-Dux.mutate_with(df, total: "price * quantity")
-Dux.summarise_with(df, avg_price: "AVG(price)")
+Dux.from_query("SELECT * FROM range(1, 11) t(x)")
+|> Dux.filter_with("x > 5")
+|> Dux.mutate_with(squared: "x * x")
+|> Dux.collect()
 ```
 
 ## Reading and writing data
 
-### CSV
-
 ```elixir
-df = Dux.from_csv("sales.csv")
-df = Dux.from_csv("sales.tsv", delimiter: "\t")
-
-Dux.to_csv(df, "output.csv")
-```
-
-### Parquet
+path = Path.join(System.tmp_dir!(), "dux_guide.csv")
 
-```elixir
-df = Dux.from_parquet("data.parquet")
-df = Dux.from_parquet("data/**/*.parquet") # glob patterns
+Dux.from_list([
+  %{name: "Alice", score: 85},
+  %{name: "Bob", score: 92},
+  %{name: "Carol", score: 78}
+])
+|> Dux.to_csv(path)
 
-Dux.to_parquet(df, "output.parquet", compression: :zstd)
+Dux.from_csv(path)
+|> Dux.filter(score > 80)
+|> Dux.collect()
 ```
 
-### NDJSON
-
 ```elixir
-df = Dux.from_ndjson("events.ndjson")
-Dux.to_ndjson(df, "output.ndjson")
-```
+parquet_path = Path.join(System.tmp_dir!(), "dux_guide.parquet")
 
-### Remote sources
+Dux.from_query("SELECT x AS id, x * 10 AS value FROM range(1000) t(x)")
+|> Dux.to_parquet(parquet_path, compression: :zstd)
 
-DuckDB extensions handle S3, HTTP, databases — no separate libraries:
-
-```elixir
-# S3 via httpfs extension
-Dux.Connection.load_extension(:httpfs)
-df = Dux.from_parquet("s3://my-bucket/data/*.parquet")
-
-# PostgreSQL via postgres_scanner
-Dux.Connection.load_extension(:postgres_scanner)
-df = Dux.from_query("SELECT * FROM postgres_scan('dbname=mydb', 'users')")
+Dux.from_parquet(parquet_path)
+|> Dux.filter(value > 5000)
+|> Dux.summarise(total: sum(value), n: count(id))
+|> Dux.collect()
 ```
 
 ## Aggregation
 
-Group and aggregate with `group_by` + `summarise`:
-
 ```elixir
-require Dux
-
-Dux.from_csv("orders.csv")
-|> Dux.group_by(:product)
+Dux.from_list([
+  %{region: "US", product: "Widget", amount: 100},
+  %{region: "US", product: "Gadget", amount: 200},
+  %{region: "EU", product: "Widget", amount: 150},
+  %{region: "EU", product: "Gadget", amount: 300},
+  %{region: "US", product: "Widget", amount: 175}
+])
+|> Dux.group_by([:region, :product])
 |> Dux.summarise(
-  total_revenue: sum(price * quantity),
-  order_count: count(id),
-  avg_price: avg(price)
+  total: sum(amount),
+  orders: count(amount),
+  avg_order: avg(amount)
 )
-|> Dux.sort_by(desc: :total_revenue)
+|> Dux.sort_by(desc: :total)
 |> Dux.collect()
 ```
 
 ## Joins
 
 ```elixir
-orders = Dux.from_csv("orders.csv")
-customers = Dux.from_csv("customers.csv")
+orders =
+  Dux.from_list([
+    %{order_id: 1, customer_id: 10, product_id: 100, qty: 5},
+    %{order_id: 2, customer_id: 10, product_id: 101, qty: 3},
+    %{order_id: 3, customer_id: 11, product_id: 100, qty: 2}
+  ])
+
+customers =
+  Dux.from_list([
+    %{customer_id: 10, name: "Alice"},
+    %{customer_id: 11, name: "Bob"}
+  ])
+
+products =
+  Dux.from_list([
+    %{product_id: 100, product_name: "Widget", unit_price: 25},
+    %{product_id: 101, product_name: "Gadget", unit_price: 50}
+  ])
 
 orders
 |> Dux.join(customers, on: :customer_id)
-|> Dux.select([:order_id, :customer_name, :total])
+|> Dux.join(products, on: :product_id)
+|> Dux.mutate(total: qty * unit_price)
+|> Dux.group_by(:name)
+|> Dux.summarise(spend: sum(total), orders: count(order_id))
+|> Dux.sort_by(:name)
 |> Dux.collect()
 ```
 
-Join types: `:inner` (default), `:left`, `:right`, `:cross`, `:anti`, `:semi`.
-
-For columns with different names:
-
-```elixir
-Dux.join(orders, products, on: [{:product_id, :id}])
-```
-
-## Debugging with sql_preview
-
-See the generated SQL without executing:
+## See the generated SQL
 
 ```elixir
-Dux.from_csv("data.csv")
-|> Dux.filter(x > 10)
-|> Dux.mutate(y: x * 2)
+Dux.from_query("SELECT * FROM range(100) t(x)")
+|> Dux.filter(x > 50)
+|> Dux.mutate(doubled: x * 2)
+|> Dux.group_by(:doubled)
+|> Dux.summarise(n: count(x))
 |> Dux.sql_preview()
-# "WITH\n __s0 AS (SELECT * FROM ...)\n __s1 AS (...)\nSELECT * FROM __s1"
 ```
 
 ## Next steps
 
-- [Distributed Queries](distributed-queries.md) — run Dux across a BEAM cluster
-- [Graph Analytics](graph-analytics.md) — PageRank, shortest paths, and more
-- [API Reference](Dux.html) — full module documentation
+* [Distributed Queries](distributed-queries.livemd) — run Dux across a BEAM cluster
+* [Graph Analytics](graph-analytics.livemd) — PageRank, shortest paths, and more
+* [API Reference](https://hexdocs.pm/dux/Dux.html) — full module documentation

guides/graph-analytics.livemd

Lines changed: 0 additions & 5 deletions
@@ -1,12 +1,7 @@
 # Graph Analytics
 
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 
