Commit d5a5b00

Merge pull request #7 from trouze/feat/demo-fixture
demo: add demo_project fixture + scripts/demo.sh walkthrough
2 parents 5d912c1 + aca0d71 commit d5a5b00

8 files changed

Lines changed: 2277 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ All notable changes to this project will be documented in this file. The format

 - `replay` cost overlay: pass `--warehouse-size` (Snowflake XS…6XL) or `--credits-per-hour` (non-Snowflake adapters) to translate wall-clock into dollars. Renders **Run cost**, **Critical-path floor**, **Headroom** (= run − floor; the prize for better parallelization), and **Idle cost** (the $ equivalent of thread-idle warehouse-seconds). Defaults to $2.00/credit (Snowflake Standard On-Demand); override with `--rate-per-credit`. Snowflake's 60-second minimum-billing floor is applied automatically; pass `--no-minimum-billing` to see raw wall-clock × rate.
 - New module `dbt_dag_opt.cost` with `CostInputs`, `CostReport`, `compute_cost()`, `credits_per_hour_for()`, and `cost_inputs_from_replay()`. Designed primitive-first so a future `whatif` simulator can call `compute_cost` against simulated schedules and diff the resulting `CostReport`s.
+- `scripts/demo.sh` + `tests/fixtures/demo_project/` — narrated end-to-end demo script driving every subcommand against a synthetic 24-model DAG with a shared bottleneck, 4 threads, and ~7.5-min wall-clock. Fixture is regenerable via `tests/fixtures/generate_demo_fixture.py`.

 ## [0.1.0] - 2026-04-24
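The cost-overlay arithmetic described in the changelog entry above can be sketched as follows. This is a minimal illustration, not the code in `dbt_dag_opt.cost`: the names `CostSketch` and `sketch_cost` are hypothetical, the idle cost is assumed to be the idle fraction of thread-seconds times the run cost, and the 60-second minimum is assumed to apply to both the run and the critical-path floor. The credits-per-hour table reflects Snowflake's standard sizing, where each size step doubles the rate.

```python
from dataclasses import dataclass

# Snowflake standard warehouse sizes: each step up doubles credits per hour.
SNOWFLAKE_CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16,
    "2XL": 32, "3XL": 64, "4XL": 128, "5XL": 256, "6XL": 512,
}

MINIMUM_BILLED_SECONDS = 60.0  # Snowflake's minimum-billing floor


@dataclass(frozen=True)
class CostSketch:  # hypothetical; not the project's CostReport
    run_cost: float
    critical_path_floor: float
    headroom: float
    idle_cost: float


def sketch_cost(
    wall_clock_s: float,
    critical_path_s: float,
    idle_fraction: float,
    credits_per_hour: float,
    rate_per_credit: float = 2.00,
    minimum_billing: bool = True,
) -> CostSketch:
    def dollars(seconds: float) -> float:
        # Optionally round short runs up to the minimum billed duration.
        billed = max(seconds, MINIMUM_BILLED_SECONDS) if minimum_billing else seconds
        return billed / 3600.0 * credits_per_hour * rate_per_credit

    run_cost = dollars(wall_clock_s)
    floor = dollars(critical_path_s)
    # Headroom = run - floor; idle cost is an assumed share of the run cost.
    return CostSketch(run_cost, floor, run_cost - floor, idle_fraction * run_cost)


# Illustrative inputs: a 450 s run on size L at $2.00/credit costs
# 450 / 3600 * 8 * 2.00 = $2.00. The critical-path and idle inputs are made up.
report = sketch_cost(450.0, 427.5, 0.30, SNOWFLAKE_CREDITS_PER_HOUR["L"])
print(report)
```

Passing `SNOWFLAKE_CREDITS_PER_HOUR["XL"]` instead doubles every dollar figure for the same wall-clock, which is the effect the `--warehouse-size` flag is described as exposing.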

README.md

Lines changed: 16 additions & 0 deletions
@@ -139,6 +139,16 @@ It **is** a CLI tool that points at the slowest chains in your DAG, reconstructs
 It **isn't** (yet):
 - A predictive scheduler simulator. `replay` reconstructs what already happened; it doesn't yet project what would happen under a different `--threads N` or if you sped up a specific model. That "what-if" loop is planned next, and will diff two cost reports to show projected $ savings.

+## Demo
+
+An end-to-end walkthrough you can record or run locally:
+
+```bash
+./scripts/demo.sh
+```
+
+Drives every subcommand (`analyze`, `analyze --show-path`, `replay`, `replay --warehouse-size L/XL`, `--credits-per-hour` for non-Snowflake, JSON + `jq`) against a synthetic 24-model baseball-analytics DAG in `tests/fixtures/demo_project/`. Set `PAUSE=0` to dry-run without narration beats.
+
 ## Development

 ```bash
@@ -148,6 +158,12 @@ uv run mypy src
 uv run pytest
 ```

+Regenerate the demo fixture after editing its topology:
+
+```bash
+uv run python tests/fixtures/generate_demo_fixture.py
+```
+
 ## License

 Apache 2.0 — see [LICENSE](LICENSE).

scripts/demo.sh

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# Demo script for dbt-dag-opt. Designed to be recorded (asciinema / QuickTime).
+#
+# Runs against a synthetic 24-model dbt project under tests/fixtures/demo_project/
+# — a baseball analytics warehouse with 4 threads, ~7.5 min wall-clock, and one
+# shared bottleneck (int_game_events) sitting on three of the top five longest paths.
+#
+# Usage:
+#   ./scripts/demo.sh
+#
+# Each command is echoed in bold before it runs, with a narration hint above.
+# Pause between commands by setting PAUSE=2 (default) or call with PAUSE=0 to
+# rush through for a dry-run.
+
+set -euo pipefail
+
+PAUSE="${PAUSE:-2}"
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+MANIFEST="$ROOT/tests/fixtures/demo_project/manifest.json"
+RUN_RESULTS="$ROOT/tests/fixtures/demo_project/run_results.json"
+
+bold() { printf "\033[1m%s\033[0m\n" "$*"; }
+dim() { printf "\033[2m%s\033[0m\n" "$*"; }
+section() {
+  echo
+  printf "\033[1;36m▌ %s\033[0m\n" "$*"
+  echo
+}
+run() {
+  bold "\$ $*"
+  sleep "$PAUSE"
+  eval "$@"
+  echo
+  sleep "$PAUSE"
+}
+
+section "1 · Which paths through the DAG are actually slow?"
+dim "analyze uses manifest + run_results to compute the critical path — the"
+dim "longest cumulative chain of model execution times. That's the bound on"
+dim "how fast your pipeline could possibly run."
+run "uv run dbt-dag-opt analyze --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --top 5"
+
+section "2 · The Bottleneck column names the slowest model on each path"
+dim "Watch for a model that appears as the bottleneck on MULTIPLE rows — that's"
+dim "shared-node leverage. Optimizing it speeds up several paths at once."
+
+section "3 · Drill into the full chain with --show-path"
+run "uv run dbt-dag-opt analyze --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --top 3 --show-path"
+
+section "4 · What actually happened? (replay reconstructs the observed schedule)"
+dim "replay reads thread_id + timing from run_results to reconstruct the"
+dim "per-thread Gantt, identify the observed critical path, and attribute"
+dim "every idle gap to the upstream model a thread was waiting on."
+run "uv run dbt-dag-opt replay --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --top-idle-gaps 5"
+
+section "5 · Put a price on it: --warehouse-size translates wall-clock to dollars"
+dim "Four framed numbers:"
+dim " • Run cost — what this run billed"
+dim " • Critical-path floor — the irreducible cost of your slowest chain"
+dim " • Headroom — run − floor; prize for better parallelization"
+dim " • Idle cost — \$ equivalent of thread-idle warehouse-seconds"
+run "uv run dbt-dag-opt replay --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --warehouse-size L --top-idle-gaps 3"
+
+section "6 · Change the warehouse, change the bill (same run, XL)"
+dim "Doubling warehouse size doubles the rate. Same wall-clock, 2x cost."
+run "uv run dbt-dag-opt replay --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --warehouse-size XL --top-idle-gaps 0"
+
+section "7 · Non-Snowflake adapters: pass --credits-per-hour directly"
+dim "Databricks, BigQuery, Redshift — pass the cost/hour your adapter charges."
+run "uv run dbt-dag-opt replay --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --credits-per-hour 12 --rate-per-credit 1.5 --top-idle-gaps 0"
+
+section "8 · Machine-readable: --format json"
+dim "Everything in the text output is also in JSON — pipe to jq for dashboards,"
+dim "Slack alerts, or CI annotations."
+run "uv run dbt-dag-opt replay --manifest \"$MANIFEST\" --run-results \"$RUN_RESULTS\" --warehouse-size L --format json | jq '.cost'"
+
+section "Wrap"
+dim "Three takeaways from this run:"
+dim " 1. int_game_events is the shared bottleneck on 3 of the top 5 paths."
+dim " 2. 5% of the bill is pure parallelism headroom (small — DAG is well-shaped)."
+dim " 3. 30% of warehouse-seconds are idle threads — you're overprovisioned"
+dim " on thread count for this DAG shape. Consider --threads 2 next run."
+echo
+bold "pip install dbt-dag-opt"
+echo
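The script's first section describes the critical path as the longest cumulative chain of model execution times. A minimal sketch of that idea, assuming per-model durations and upstream edges have already been extracted from the two artifacts, is a longest-path pass over the DAG in topological order. The function name `critical_path` and every model name except `int_game_events` are hypothetical, and this is not necessarily how `analyze` is implemented.

```python
from graphlib import TopologicalSorter


def critical_path(durations, parents):
    """Return (total_seconds, [models]) for the longest cumulative chain.

    durations: model name -> execution seconds
    parents:   model name -> iterable of upstream model names
    Assumes every upstream model also appears in `durations`.
    """
    graph = {node: set(parents.get(node, ())) for node in durations}
    best = {}  # node -> (cumulative seconds ending at node, predecessor or None)
    # static_order() yields each node only after all of its upstream nodes.
    for node in TopologicalSorter(graph).static_order():
        upstream = max(graph[node], key=lambda p: best[p][0], default=None)
        base = best[upstream][0] if upstream is not None else 0.0
        best[node] = (base + durations[node], upstream)

    # Walk back from the node with the largest cumulative time.
    node = max(best, key=lambda n: best[n][0])
    total = best[node][0]
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    path.reverse()
    return total, path


# Toy DAG with made-up durations; int_game_events feeds two downstream models.
durations = {
    "stg_games": 30.0,
    "stg_players": 20.0,
    "int_game_events": 120.0,
    "fct_batting": 60.0,
    "fct_pitching": 45.0,
}
parents = {
    "int_game_events": ["stg_games"],
    "fct_batting": ["int_game_events", "stg_players"],
    "fct_pitching": ["int_game_events"],
}
total, path = critical_path(durations, parents)
bottleneck = max(path, key=durations.get)  # slowest single model on the chain
print(total, path, bottleneck)
```

In this toy graph the slowest model on the longest chain is also upstream of the second-longest chain, which is the shared-node leverage the second section of the script points at.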

tests/conftest.py

Lines changed: 22 additions & 0 deletions
@@ -10,6 +10,7 @@

 FIXTURES_DIR = Path(__file__).parent / "fixtures"
 DBT_DUGOUT_DIR = FIXTURES_DIR / "dbt_dugout"
+DEMO_PROJECT_DIR = FIXTURES_DIR / "demo_project"


 @pytest.fixture
@@ -52,6 +53,27 @@ def dbt_dugout_artifacts(
     return DagArtifacts(manifest=manifest, run_results=run_results)


+@pytest.fixture
+def demo_project_manifest_path() -> Path:
+    return DEMO_PROJECT_DIR / "manifest.json"
+
+
+@pytest.fixture
+def demo_project_run_results_path() -> Path:
+    return DEMO_PROJECT_DIR / "run_results.json"
+
+
+@pytest.fixture
+def demo_project_artifacts(
+    demo_project_manifest_path: Path, demo_project_run_results_path: Path
+) -> DagArtifacts:
+    with demo_project_manifest_path.open() as fh:
+        manifest = json.load(fh)
+    with demo_project_run_results_path.open() as fh:
+        run_results = json.load(fh)
+    return DagArtifacts(manifest=manifest, run_results=run_results)
+
+
 def _phase(started: str, completed: str, name: str = "execute") -> dict[str, str]:
     return {"name": name, "started_at": started, "completed_at": completed}
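The `_phase` helper above builds timing entries with `name`, `started_at`, and `completed_at` keys, the same shape dbt writes into each result's `timing` list in `run_results.json`. As a hedged sketch of how such entries could be turned into an execution duration, the snippet below parses the two timestamps and subtracts them. The helper names `_parse_ts` and `phase_seconds` are hypothetical, and the project's own duration logic is not shown in this diff.

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def phase_seconds(timing: list[dict[str, str]], name: str = "execute") -> float:
    """Return the duration in seconds of the first phase called `name`."""
    for phase in timing:
        if phase["name"] == name:
            delta = _parse_ts(phase["completed_at"]) - _parse_ts(phase["started_at"])
            return delta.total_seconds()
    raise KeyError(f"no {name!r} phase in timing list")


# Made-up timestamps in the shape produced by _phase(...).
timing = [
    {
        "name": "compile",
        "started_at": "2026-04-24T12:00:00.000000Z",
        "completed_at": "2026-04-24T12:00:01.000000Z",
    },
    {
        "name": "execute",
        "started_at": "2026-04-24T12:00:01.000000Z",
        "completed_at": "2026-04-24T12:00:31.500000Z",
    },
]
print(phase_seconds(timing))
```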
