Find the longest-running paths through your dbt DAG — the models that actually make your pipeline slow.
When you pay for compute by the second (Snowflake, Databricks, Redshift), your dbt job's wall-clock cost is bounded below by the critical path through the DAG: the longest cumulative chain of model execution times. Optimizing a slow model on a short branch saves you nothing if a longer branch was already the bottleneck. dbt-dag-opt tells you which paths to cut first.
pip install dbt-dag-opt

dbt-dag-opt analyze \
--manifest target/manifest.json \
--run-results target/run_results.json \
--format table \
--top 10

export DBT_CLOUD_TOKEN=dbtu_...
dbt-dag-opt analyze \
--account-id 12345 \
--job-id 67890 \
--base-url https://cloud.getdbt.com \
--format table

Add --run-id <id> to pull artifacts from a specific historical run instead of the job's latest.
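For orientation, here is a minimal sketch of how the two cloud-mode cases (a specific run versus the job-level artifacts) could map onto artifact URLs. It assumes the dbt Cloud Admin API v2 artifact endpoints; `artifact_url` is a hypothetical helper written for illustration, and the tool's own request logic may differ.

```python
def artifact_url(base_url, account_id, path, job_id=None, run_id=None):
    """Build an artifact URL (hypothetical helper, not part of dbt-dag-opt).

    Assumes the dbt Cloud Admin API v2 layout:
      .../accounts/{account_id}/runs/{run_id}/artifacts/{path}
      .../accounts/{account_id}/jobs/{job_id}/artifacts/{path}
    """
    base = base_url.rstrip("/")
    if run_id is not None:
        # A specific historical run was requested.
        return f"{base}/api/v2/accounts/{account_id}/runs/{run_id}/artifacts/{path}"
    if job_id is not None:
        # No run id: fall back to the job-level artifact endpoint.
        return f"{base}/api/v2/accounts/{account_id}/jobs/{job_id}/artifacts/{path}"
    raise ValueError("either job_id or run_id is required")


job_manifest = artifact_url("https://cloud.getdbt.com", "12345", "manifest.json", job_id="67890")
run_results = artifact_url("https://cloud.getdbt.com/", "12345", "run_results.json", run_id="111")
```

The ids above are the placeholder values from the command example plus a made-up run id; the API token would travel in the request's Authorization header.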
Longest paths by total execution time
┏━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ # ┃ Source ┃ End of path ┃ Length ┃ Total time (s) ┃
┡━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 1 │ source.demo.raw.orders │ model.demo.fact_orders │ 4 │ 35.00 │
│ 2 │ source.demo.raw.customers │ model.demo.fact_orders │ 4 │ 32.00 │
└───┴───────────────────────────┴────────────────────────┴────────┴────────────────┘
dbt-dag-opt analyze [OPTIONS]
  --manifest PATH                  Path to manifest.json (file mode)
  --run-results PATH               Path to run_results.json (file mode)
  --account-id TEXT                dbt Cloud account id (cloud mode)
  --job-id TEXT                    dbt Cloud job id (cloud mode)
  --run-id TEXT                    dbt Cloud run id; omit for the job's latest run
  --base-url TEXT                  dbt Cloud base URL [default: https://cloud.getdbt.com]
  --token TEXT                     dbt Cloud API token [env: DBT_CLOUD_TOKEN]
  -f, --format [json|jsonl|table]  Output format [default: table]
  -n, --top INTEGER                Show only top N paths (0 = all) [default: 10]
  -o, --output PATH                Write output to a file instead of stdout
- table: rich terminal table (default; what you want in a shell).
- json: one object keyed by source: {source_id: {path, distance, length}}. Valid JSON, safe to pipe through jq.
- jsonl: one JSON object per line. Nice for streaming into a log aggregator.
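As a rough illustration of consuming the json format from a script, the sketch below ranks entries by distance. It assumes the `{source_id: {path, distance, length}}` shape described above, that `path` is a list of node ids, and that `distance` is a number of seconds; `top_paths` and the sample values (including the intermediate model names) are made up for the example.

```python
import json


def top_paths(report_text, n=3):
    """Return (source, distance, end_of_path) for the n heaviest paths."""
    report = json.loads(report_text)
    ranked = sorted(report.items(), key=lambda kv: kv[1]["distance"], reverse=True)
    # Assumption: "path" is an ordered list whose last item is the end of the path.
    return [(src, entry["distance"], entry["path"][-1]) for src, entry in ranked[:n]]


# Hypothetical sample shaped like the json output described above.
sample = json.dumps({
    "source.demo.raw.customers": {
        "path": ["source.demo.raw.customers", "model.demo.stg_customers",
                 "model.demo.int_customers", "model.demo.fact_orders"],
        "distance": 32.0,
        "length": 4,
    },
    "source.demo.raw.orders": {
        "path": ["source.demo.raw.orders", "model.demo.stg_orders",
                 "model.demo.int_orders", "model.demo.fact_orders"],
        "distance": 35.0,
        "length": 4,
    },
})

for source, distance, end in top_paths(sample):
    print(f"{distance:8.2f}s  {source} -> {end}")
```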
- Load manifest.json and run_results.json (from disk or dbt Cloud's Admin API).
- Build a weighted DAG: nodes are model.* / source.* / seed.* / snapshot.* ids; each node's weight is its execution_time in seconds.
- Compute the longest path from each source using an iterative DP over topological order (O(V + E)).
- Sort paths by total distance and surface the heaviest ones.
Distances sum the execution time of every node along the path — that's the warehouse-seconds you'd save by zeroing out that chain.
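The steps above can be sketched in a few dozen lines. This is an illustrative version, not the tool's actual code: it takes dicts shaped like manifest.json (`sources`, and `nodes` with `depends_on.nodes`) and run_results.json (`results` with `unique_id` and `execution_time`), gives unexecuted nodes such as sources a weight of 0, and counts path length in nodes. It runs one reverse-topological pass, which is one possible way to get an O(V + E) DP; the intermediate model names in the demo data are made up.

```python
from collections import defaultdict, deque

KEPT_PREFIXES = ("model.", "source.", "seed.", "snapshot.")


def build_weighted_dag(manifest, run_results):
    """Build (nodes, children, weights) from artifact-shaped dicts."""
    seconds = {r["unique_id"]: float(r["execution_time"]) for r in run_results["results"]}
    nodes = {n for n in manifest.get("sources", {}) if n.startswith(KEPT_PREFIXES)}
    children = defaultdict(list)
    for node_id, node in manifest.get("nodes", {}).items():
        if not node_id.startswith(KEPT_PREFIXES):
            continue  # skip tests and other resource types
        nodes.add(node_id)
        for parent in node.get("depends_on", {}).get("nodes", []):
            if parent.startswith(KEPT_PREFIXES):
                nodes.add(parent)
                children[parent].append(node_id)
    # Assumption: nodes without a run result (e.g. sources) weigh 0 seconds.
    return nodes, children, {n: seconds.get(n, 0.0) for n in nodes}


def topological_order(nodes, children):
    """Kahn's algorithm: parents always appear before their children."""
    indegree = {n: 0 for n in nodes}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for kid in children.get(n, []):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)
    return order


def longest_paths(nodes, children, weights):
    """Heaviest path starting at each source, sorted by total distance."""
    best, nxt = {}, {}
    # Walk children before parents so best[kid] is final when a parent needs it.
    for n in reversed(topological_order(nodes, children)):
        kids = children.get(n, [])
        nxt[n] = max(kids, key=best.__getitem__) if kids else None
        best[n] = weights[n] + (best[nxt[n]] if kids else 0.0)
    results = []
    for src in (n for n in nodes if n.startswith("source.")):
        path, cur = [], src
        while cur is not None:
            path.append(cur)
            cur = nxt[cur]
        results.append({"source": src, "path": path, "distance": best[src], "length": len(path)})
    return sorted(results, key=lambda r: r["distance"], reverse=True)


# Hypothetical demo project; intermediate model names are invented.
manifest = {
    "sources": {"source.demo.raw.orders": {}, "source.demo.raw.customers": {}},
    "nodes": {
        "model.demo.stg_orders": {"depends_on": {"nodes": ["source.demo.raw.orders"]}},
        "model.demo.int_orders": {"depends_on": {"nodes": ["model.demo.stg_orders"]}},
        "model.demo.stg_customers": {"depends_on": {"nodes": ["source.demo.raw.customers"]}},
        "model.demo.int_customers": {"depends_on": {"nodes": ["model.demo.stg_customers"]}},
        "model.demo.fact_orders": {
            "depends_on": {"nodes": ["model.demo.int_orders", "model.demo.int_customers"]}
        },
    },
}
run_results = {"results": [
    {"unique_id": "model.demo.stg_orders", "execution_time": 10.0},
    {"unique_id": "model.demo.int_orders", "execution_time": 5.0},
    {"unique_id": "model.demo.stg_customers", "execution_time": 7.0},
    {"unique_id": "model.demo.int_customers", "execution_time": 5.0},
    {"unique_id": "model.demo.fact_orders", "execution_time": 20.0},
]}

results = longest_paths(*build_weighted_dag(manifest, run_results))
for r in results:
    print(f'{r["distance"]:6.2f}s  {r["source"]} -> {r["path"][-1]} ({r["length"]} nodes)')
```

With these invented timings the orders chain totals 35 seconds and the customers chain 32, so speeding up a customers-side model would not shorten the critical path until the orders chain drops below 32.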
It is a CLI tool that points at the slowest chains in your DAG.
It isn't (yet):
- A scheduler simulator. If your dbt threads setting is low, total wall-clock is bounded by parallelism and the critical path; v0.2 will surface both. For now, treat the critical-path distance as a lower bound.
- A cost model. Multiplying distance × your warehouse rate is on you; a --warehouse-size flag is planned for v0.3.
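Until those features land, both caveats can be approximated by hand. The sketch below uses the standard scheduling lower bound (a run can finish no faster than its critical path, nor faster than total work divided by the number of threads) and a plain rate multiplication. Both helper names and every number are hypothetical; plug in your own thread count and warehouse pricing.

```python
def wall_clock_lower_bound(critical_path_s, total_work_s, threads):
    """Lower bound on run time in seconds; real runs can be slower."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Bounded by the longest chain and by total work spread over all threads.
    return max(critical_path_s, total_work_s / threads)


def estimated_cost(seconds, credits_per_hour, price_per_credit):
    """Distance times warehouse rate, with the rate given per hour."""
    return seconds / 3600 * credits_per_hour * price_per_credit


# Made-up inputs: 35 s critical path, 120 s of total model time, 2 threads.
bound_s = wall_clock_lower_bound(critical_path_s=35.0, total_work_s=120.0, threads=2)
# Made-up pricing: 1 credit per hour at 3.00 per credit.
cost = estimated_cost(bound_s, credits_per_hour=1.0, price_per_credit=3.0)
print(f"at least {bound_s:.0f}s, roughly {cost:.2f} in compute")
```

Here the thread limit dominates: with only 2 threads, 120 seconds of work cannot finish in under 60 seconds even though the critical path is 35. The estimate ignores billing minimums and idle warehouse time.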
uv sync --all-extras
uv run ruff check .
uv run mypy src
uv run pytest

Apache 2.0; see LICENSE.