docs: proper Livebook format — setup cell, self-contained examples

cigrainger · claude · cigrainger · commit 6fcf030638ac · 2026-03-21T18:07:02.000+11:00
- Setup uses a single cell with Mix.install + require Dux
- Removed ## Setup/## Section headers (Livebook handles this)
- All examples are self-contained (use from_list/from_query, no
  references to nonexistent files like sales.csv)
- Removed commented-out output (Livebook saves real output)
- Fixed next steps links to point to .livemd
- Atom keys throughout

To embed outputs, open each guide in Livebook, execute all cells, and save.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/guides/distributed-queries.livemd b/guides/distributed-queries.livemd
@@ -1,12 +1,7 @@
 # Distributed Queries
 
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 
diff --git a/guides/getting-started.livemd b/guides/getting-started.livemd
@@ -1,22 +1,15 @@
 # Getting Started
 
-Dux is a DuckDB-native dataframe library for Elixir. This guide walks you through your first pipeline and the key concepts.
-
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```
 
 ## Your first pipeline
 
-```elixir
-require Dux
+Dux is a DuckDB-native dataframe library for Elixir. Pipelines are lazy — operations accumulate until you call `collect/1`.
 
+```elixir
 Dux.from_list([
   %{name: "Alice", department: "Engineering", salary: 120_000},
   %{name: "Bob", department: "Engineering", salary: 110_000},
@@ -28,152 +21,143 @@ Dux.from_list([
 |> Dux.collect()
 ```
 
-## Key concepts
-
-### Everything is lazy
+## Everything is lazy
 
-Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you call `compute/1`, `collect/1`, or `to_columns/1`:
+Operations accumulate in the `%Dux{}` struct. Nothing hits DuckDB until you materialize:
 
 ```elixir
 df =
   Dux.from_query("SELECT * FROM range(1, 101) t(x)")
   |> Dux.filter(x > 50)
   |> Dux.mutate(doubled: x * 2)
 
-# Inspect the lazy ops — no SQL has run yet
 df.ops
 ```
 
 ```elixir
-# NOW the SQL runs
 Dux.collect(df)
 ```
 
-This lets DuckDB see the full pipeline and optimize across operations.
+## Expressions compile to SQL
 
-### Expressions compile to SQL
-
-`require Dux` enables the macro versions of `filter`, `mutate`, and `summarise`. Bare identifiers become column names. Use `^` to interpolate Elixir values:
+Bare identifiers become column names. Use `^` to interpolate Elixir values as parameter bindings:
 
 ```elixir
-require Dux
-
 min_price = 50
 
-df
-|> Dux.filter(price > ^min_price and category == "Electronics")
-|> Dux.mutate(with_tax: price * 1.08, upper_name: upper(name))
-|> Dux.summarise(total: sum(with_tax), n: count(name))
+Dux.from_list([
+  %{name: "Widget", price: 25, category: "Tools"},
+  %{name: "Gadget", price: 75, category: "Electronics"},
+  %{name: "Doohickey", price: 100, category: "Electronics"}
+])
+|> Dux.filter(price > ^min_price)
+|> Dux.mutate(with_tax: price * 1.08)
+|> Dux.collect()
 ```
 
-> #### SQL injection is impossible {: .tip}
->
-> `^` interpolations become parameter bindings (`$1`, `$2`, ...) in the generated SQL.
-> User values never appear in the SQL string.
-
-### The `_with` variants
-
-For programmatic use, the `_with` variants accept raw SQL strings:
+The `_with` variants accept raw SQL strings for programmatic use:
 
 ```elixir
-Dux.filter_with(df, "price > 50 AND category = 'Electronics'")
-Dux.mutate_with(df, total: "price * quantity")
-Dux.summarise_with(df, avg_price: "AVG(price)")
+Dux.from_query("SELECT * FROM range(1, 11) t(x)")
+|> Dux.filter_with("x > 5")
+|> Dux.mutate_with(squared: "x * x")
+|> Dux.collect()
 ```
 
 ## Reading and writing data
 
-### CSV
-
 ```elixir
-df = Dux.from_csv("sales.csv")
-df = Dux.from_csv("sales.tsv", delimiter: "\t")
-
-Dux.to_csv(df, "output.csv")
-```
-
-### Parquet
+path = Path.join(System.tmp_dir!(), "dux_guide.csv")
 
-```elixir
-df = Dux.from_parquet("data.parquet")
-df = Dux.from_parquet("data/**/*.parquet")  # glob patterns
+Dux.from_list([
+  %{name: "Alice", score: 85},
+  %{name: "Bob", score: 92},
+  %{name: "Carol", score: 78}
+])
+|> Dux.to_csv(path)
 
-Dux.to_parquet(df, "output.parquet", compression: :zstd)
+Dux.from_csv(path)
+|> Dux.filter(score > 80)
+|> Dux.collect()
 ```
 
-### NDJSON
-
 ```elixir
-df = Dux.from_ndjson("events.ndjson")
-Dux.to_ndjson(df, "output.ndjson")
-```
+parquet_path = Path.join(System.tmp_dir!(), "dux_guide.parquet")
 
-### Remote sources
+Dux.from_query("SELECT x AS id, x * 10 AS value FROM range(1000) t(x)")
+|> Dux.to_parquet(parquet_path, compression: :zstd)
 
-DuckDB extensions handle S3, HTTP, databases — no separate libraries:
-
-```elixir
-# S3 via httpfs extension
-Dux.Connection.load_extension(:httpfs)
-df = Dux.from_parquet("s3://my-bucket/data/*.parquet")
-
-# PostgreSQL via postgres_scanner
-Dux.Connection.load_extension(:postgres_scanner)
-df = Dux.from_query("SELECT * FROM postgres_scan('dbname=mydb', 'users')")
+Dux.from_parquet(parquet_path)
+|> Dux.filter(value > 5000)
+|> Dux.summarise(total: sum(value), n: count(id))
+|> Dux.collect()
 ```
 
 ## Aggregation
 
-Group and aggregate with `group_by` + `summarise`:
-
 ```elixir
-require Dux
-
-Dux.from_csv("orders.csv")
-|> Dux.group_by(:product)
+Dux.from_list([
+  %{region: "US", product: "Widget", amount: 100},
+  %{region: "US", product: "Gadget", amount: 200},
+  %{region: "EU", product: "Widget", amount: 150},
+  %{region: "EU", product: "Gadget", amount: 300},
+  %{region: "US", product: "Widget", amount: 175}
+])
+|> Dux.group_by([:region, :product])
 |> Dux.summarise(
-  total_revenue: sum(price * quantity),
-  order_count: count(id),
-  avg_price: avg(price)
+  total: sum(amount),
+  orders: count(amount),
+  avg_order: avg(amount)
 )
-|> Dux.sort_by(desc: :total_revenue)
+|> Dux.sort_by(desc: :total)
 |> Dux.collect()
 ```
 
 ## Joins
 
 ```elixir
-orders = Dux.from_csv("orders.csv")
-customers = Dux.from_csv("customers.csv")
+orders =
+  Dux.from_list([
+    %{order_id: 1, customer_id: 10, product_id: 100, qty: 5},
+    %{order_id: 2, customer_id: 10, product_id: 101, qty: 3},
+    %{order_id: 3, customer_id: 11, product_id: 100, qty: 2}
+  ])
+
+customers =
+  Dux.from_list([
+    %{customer_id: 10, name: "Alice"},
+    %{customer_id: 11, name: "Bob"}
+  ])
+
+products =
+  Dux.from_list([
+    %{product_id: 100, product_name: "Widget", unit_price: 25},
+    %{product_id: 101, product_name: "Gadget", unit_price: 50}
+  ])
 
 orders
 |> Dux.join(customers, on: :customer_id)
-|> Dux.select([:order_id, :customer_name, :total])
+|> Dux.join(products, on: :product_id)
+|> Dux.mutate(total: qty * unit_price)
+|> Dux.group_by(:name)
+|> Dux.summarise(spend: sum(total), orders: count(order_id))
+|> Dux.sort_by(:name)
 |> Dux.collect()
 ```
 
-Join types: `:inner` (default), `:left`, `:right`, `:cross`, `:anti`, `:semi`.
-
-For columns with different names:
-
-```elixir
-Dux.join(orders, products, on: [{:product_id, :id}])
-```
-
-## Debugging with sql_preview
-
-See the generated SQL without executing:
+## See the generated SQL
 
 ```elixir
-Dux.from_csv("data.csv")
-|> Dux.filter(x > 10)
-|> Dux.mutate(y: x * 2)
+Dux.from_query("SELECT * FROM range(100) t(x)")
+|> Dux.filter(x > 50)
+|> Dux.mutate(doubled: x * 2)
+|> Dux.group_by(:doubled)
+|> Dux.summarise(n: count(x))
 |> Dux.sql_preview()
-# "WITH\n  __s0 AS (SELECT * FROM ...)\n  __s1 AS (...)\nSELECT * FROM __s1"
 ```
 
 ## Next steps
 
-- [Distributed Queries](distributed-queries.md) — run Dux across a BEAM cluster
-- [Graph Analytics](graph-analytics.md) — PageRank, shortest paths, and more
-- [API Reference](Dux.html) — full module documentation
+* [Distributed Queries](distributed-queries.livemd) — run Dux across a BEAM cluster
+* [Graph Analytics](graph-analytics.livemd) — PageRank, shortest paths, and more
+* [API Reference](https://hexdocs.pm/dux/Dux.html) — full module documentation
diff --git a/guides/graph-analytics.livemd b/guides/graph-analytics.livemd
@@ -1,12 +1,7 @@
 # Graph Analytics
 
-## Setup
-
 ```elixir
 Mix.install([{:dux, "~> 0.1.0"}])
-```
-
-```elixir
 require Dux
 ```