Commit 2a594d1

trouze and claude committed
analyze: add --show-path flag + Bottleneck column
Two additions to `analyze`'s table output:

1. `--show-path` flag renders the full chain of node ids for each longest path (previously only source and end were shown).
2. A Bottleneck column names the slowest model on each path and its execution time. Directly answers "which model should I optimize first": the bottleneck on the #1 path is the first-order target. When the same bottleneck appears across multiple rows it's a shared upstream that several paths depend on; optimizing it benefits all of them (shared-node leverage).

The weights are threaded through via the existing Dag; no new data collection.

JSON / JSONL outputs are unchanged: they already carry the full path, and adding a redundant bottleneck field would duplicate information derivable from path + weights.

Tests: 4 new (2 formatter, 2 CLI). Suite: 46 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
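
The claim that the bottleneck is derivable from path + weights can be sketched as follows. This is a minimal illustration, not code from this commit: the record shape (a dict with a `path` list of node ids) and the helper name `bottleneck_of` are assumptions, since the JSON schema is not shown here.

```python
# Hypothetical record shape: each JSON result is assumed to carry a "path"
# list of node ids. The weights mapping (node id -> execution seconds) is
# assumed to be available separately, e.g. from the Dag.


def bottleneck_of(record: dict, weights: dict[str, float]) -> tuple[str, float]:
    """Return (node_id, seconds) for the slowest node on record["path"]."""
    path = record.get("path", [])
    if not path:
        return ("-", 0.0)
    # Nodes without a recorded weight are treated as taking 0.0 seconds.
    node = max(path, key=lambda n: weights.get(n, 0.0))
    return (node, weights.get(node, 0.0))


record = {"source": "src.a", "path": ["src.a", "mid.a", "end.a"], "distance": 30.0}
weights = {"src.a": 5.0, "mid.a": 20.0, "end.a": 5.0}
result = bottleneck_of(record, weights)
print(result)  # ('mid.a', 20.0)
```

A consumer of the JSON or JSONL output could apply a helper like this per record, which is why a dedicated bottleneck field would be redundant there.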
1 parent 8ce1c16 commit 2a594d1

6 files changed

Lines changed: 118 additions & 9 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ All notable changes to this project will be documented in this file. The format
 - `dbt-dag-opt replay` subcommand: reconstructs the observed schedule from `run_results.json`'s `thread_id` + per-phase `timing` data, joined against `manifest.json`'s `parent_map`. Reports per-thread utilization, observed critical path (walked backwards from the last-completing node), and top idle gaps with parent-node attribution.
 - Output formats for `replay`: `text` (rich terminal summary, default) and `json` (full replay report, including raw events).
 - Integration fixture at `tests/fixtures/dbt_dugout/` — a real Snowflake dbt run (57 nodes, 4 threads) used to smoke-test `replay` end-to-end.
+- `analyze --show-path`: render the full chain of node ids for each longest path in the table output.
+- `analyze` table now includes a **Bottleneck** column naming the slowest model on each path. A bottleneck that appears across multiple rows is a shared-node optimization target.

 ## [0.1.0] - 2026-04-24

README.md

Lines changed: 3 additions & 0 deletions
@@ -67,9 +67,12 @@ dbt-dag-opt analyze [OPTIONS]
   --token TEXT                     dbt Cloud API token  [env: DBT_CLOUD_TOKEN]
   -f, --format [json|jsonl|table]  Output format  [default: table]
   -n, --top INTEGER                Show only top N paths (0 = all)  [default: 10]
+  --show-path                      Render the full chain of node ids (table format)
   -o, --output PATH                Write output to a file instead of stdout
 ```

+The table includes a **Bottleneck** column that names the slowest model on each path. First-order optimization target: the bottleneck model on the #1 path. Watch for a bottleneck that repeats across multiple paths — that's shared-node leverage (optimizing one model helps several paths at once).
+
 ### `replay` — what actually happened

 `analyze` is theoretical — it reports the DAG-structural lower bound on wall-clock. `replay` reads the *observed* schedule. Every result in `run_results.json` carries a `thread_id` and per-phase `timing` with start/end timestamps, so we can reconstruct:
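
The shared-node leverage idea in the README paragraph added by this hunk can be illustrated with a short sketch. It is not part of the commit: the helper name `shared_bottlenecks` and the sample paths and weights are hypothetical, chosen only to show how a bottleneck that repeats across rows stands out.

```python
from collections import Counter


def shared_bottlenecks(
    paths: list[list[str]], weights: dict[str, float]
) -> list[tuple[str, int]]:
    """Count how often each node is the slowest node on a path (hypothetical helper)."""
    counts = Counter()
    for path in paths:
        if path:
            slowest = max(path, key=lambda n: weights.get(n, 0.0))
            counts[slowest] += 1
    # Keep only nodes that are the bottleneck of more than one path.
    return [(node, n) for node, n in counts.most_common() if n > 1]


# Made-up data: two paths share the slow upstream model "stg_orders".
paths = [
    ["raw.orders", "stg_orders", "fact_orders"],
    ["raw.orders", "stg_orders", "dim_customers"],
    ["raw.events", "stg_events"],
]
weights = {
    "raw.orders": 1.0,
    "stg_orders": 40.0,
    "fact_orders": 12.0,
    "dim_customers": 3.0,
    "raw.events": 2.0,
    "stg_events": 8.0,
}
shared = shared_bottlenecks(paths, weights)
print(shared)  # [('stg_orders', 2)]
```

In this made-up case, speeding up `stg_orders` would shorten two of the three paths at once, which is the pattern the Bottleneck column is meant to surface.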

src/dbt_dag_opt/cli.py

Lines changed: 10 additions & 1 deletion
@@ -91,6 +91,13 @@ def analyze(
         int,
         typer.Option("--top", "-n", help="Show only the top N longest paths. Use 0 for all."),
     ] = 10,
+    show_path: Annotated[
+        bool,
+        typer.Option(
+            "--show-path",
+            help="Render the full chain of node ids in the table (table format only).",
+        ),
+    ] = False,
     output: Annotated[
         Path | None,
         typer.Option("--output", "-o", help="Write output to a file instead of stdout."),
@@ -114,7 +121,9 @@ def analyze(
         raise typer.Exit(code=1) from exc

     top_value: int | None = top if top > 0 else None
-    rendered = render(results, fmt, top=top_value)
+    rendered = render(
+        results, fmt, top=top_value, show_full_path=show_path, weights=dag.weights
+    )

     if output is not None:
         output.write_text(rendered, encoding="utf-8")

src/dbt_dag_opt/formatters.py

Lines changed: 55 additions & 8 deletions
@@ -18,7 +18,14 @@ class Format(str, Enum):
     TABLE = "table"


-def render(results: list[PathResult], fmt: Format, *, top: int | None = None) -> str:
+def render(
+    results: list[PathResult],
+    fmt: Format,
+    *,
+    top: int | None = None,
+    show_full_path: bool = False,
+    weights: dict[str, float] | None = None,
+) -> str:
     ordered = sorted(results, key=lambda r: r.distance, reverse=True)
     if top is not None:
         ordered = ordered[:top]
@@ -28,7 +35,7 @@ def render(results: list[PathResult], fmt: Format, *, top: int | None = None) ->
     if fmt is Format.JSONL:
         return _render_jsonl(ordered)
     if fmt is Format.TABLE:
-        return _render_table(ordered)
+        return _render_table(ordered, show_full_path=show_full_path, weights=weights)
     raise ValueError(f"unknown format: {fmt}")


@@ -50,19 +57,59 @@ def _render_jsonl(results: list[PathResult]) -> str:
     return "\n".join(lines)


-def _render_table(results: list[PathResult]) -> str:
+def _render_table(
+    results: list[PathResult],
+    *,
+    show_full_path: bool = False,
+    weights: dict[str, float] | None = None,
+) -> str:
     buffer = StringIO()
-    console = Console(file=buffer, force_terminal=False, width=120)
-    table = Table(title="Longest paths by total execution time", show_lines=False)
+    console = Console(file=buffer, force_terminal=False, width=140)
+    table = Table(
+        title="Longest paths by total execution time",
+        show_lines=show_full_path,  # row separators help when Path cell wraps
+    )
     table.add_column("#", justify="right", style="dim")
     table.add_column("Source", overflow="fold")
-    table.add_column("End of path", overflow="fold")
+    if show_full_path:
+        table.add_column("Path", overflow="fold")
+    else:
+        table.add_column("End of path", overflow="fold")
     table.add_column("Length", justify="right")
     table.add_column("Total time (s)", justify="right", style="bold")
+    if weights is not None:
+        table.add_column("Bottleneck (slowest on path)", overflow="fold")
+        table.add_column("Bottleneck time (s)", justify="right")

     for idx, r in enumerate(results, start=1):
-        end = r.path[-1] if r.path else r.source
-        table.add_row(str(idx), r.source, end, str(len(r.path)), f"{r.distance:.2f}")
+        row: list[str] = [str(idx), r.source]
+        if show_full_path:
+            row.append(_format_path(r.path))
+        else:
+            row.append(r.path[-1] if r.path else r.source)
+        row.extend([str(len(r.path)), f"{r.distance:.2f}"])
+        if weights is not None:
+            node, seconds = _bottleneck(r.path, weights)
+            row.extend([node, f"{seconds:.2f}"])
+        table.add_row(*row)

     console.print(table)
     return buffer.getvalue()
+
+
+def _format_path(path: list[str]) -> str:
+    return " → ".join(path) if path else "(empty)"
+
+
+def _bottleneck(path: list[str], weights: dict[str, float]) -> tuple[str, float]:
+    """Return (node_id, seconds) for the heaviest node along this path."""
+    if not path:
+        return ("-", 0.0)
+    best_node = path[0]
+    best_seconds = weights.get(best_node, 0.0)
+    for node in path[1:]:
+        seconds = weights.get(node, 0.0)
+        if seconds > best_seconds:
+            best_node = node
+            best_seconds = seconds
+    return (best_node, best_seconds)
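
The edge-case behavior of `_bottleneck` is easy to miss when reading the diff, so here is a small standalone sketch of it. The loop is copied from the hunk above under the name `bottleneck` so it runs on its own; the sample inputs are made up for illustration.

```python
def bottleneck(path: list[str], weights: dict[str, float]) -> tuple[str, float]:
    """Same loop as `_bottleneck` in the diff, copied here to be runnable alone."""
    if not path:
        return ("-", 0.0)
    best_node = path[0]
    best_seconds = weights.get(best_node, 0.0)
    for node in path[1:]:
        seconds = weights.get(node, 0.0)
        if seconds > best_seconds:  # strict comparison: the earliest node wins a tie
            best_node, best_seconds = node, seconds
    return (best_node, best_seconds)


tie = bottleneck(["a", "b"], {"a": 7.0, "b": 7.0})
missing = bottleneck(["a", "b"], {"b": 3.0})  # "a" has no recorded weight
unweighted = bottleneck(["a", "b"], {})       # no weights at all
empty = bottleneck([], {"a": 1.0})
print(tie, missing, unweighted, empty)
# ('a', 7.0) ('b', 3.0) ('a', 0.0) ('-', 0.0)
```

Two consequences follow from the code as written: a node absent from `weights` is treated as taking 0.0 seconds, and a path with no recorded weights reports its first node as the bottleneck at 0.00 s.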

tests/test_cli.py

Lines changed: 25 additions & 0 deletions
@@ -60,6 +60,31 @@ def test_analyze_from_files_table(
     )
     assert result.exit_code == 0, result.stdout
     assert "source.demo.raw.orders" in result.stdout
+    # default table now includes the Bottleneck column (weights are passed by the CLI)
+    assert "Bottleneck" in result.stdout
+
+
+def test_analyze_show_path_renders_full_chain(
+    tiny_manifest_path: Path, tiny_run_results_path: Path
+) -> None:
+    result = runner.invoke(
+        app,
+        [
+            "analyze",
+            "--manifest",
+            str(tiny_manifest_path),
+            "--run-results",
+            str(tiny_run_results_path),
+            "--format",
+            "table",
+            "--show-path",
+        ],
+    )
+    assert result.exit_code == 0, result.stdout
+    # fact_orders is the terminal model in tiny fixture; intermediate stg_orders sits in between
+    assert "stg_orders" in result.stdout
+    assert "fact_orders" in result.stdout
+    assert "→" in result.stdout


 def test_analyze_output_file_writes_file(

tests/test_formatters.py

Lines changed: 23 additions & 0 deletions
@@ -40,3 +40,26 @@ def test_render_table_contains_all_sources() -> None:
     out = render(_RESULTS, Format.TABLE)
     assert "src.a" in out
     assert "30.00" in out
+
+
+def test_render_table_show_full_path_renders_chain() -> None:
+    out = render(_RESULTS, Format.TABLE, show_full_path=True)
+    # full chain joined by arrows should appear verbatim for the longest path
+    assert "src.a → mid.a → end.a" in out
+
+
+def test_render_table_weights_adds_bottleneck_column() -> None:
+    weights = {"src.a": 5.0, "mid.a": 20.0, "end.a": 5.0, "src.b": 4.0, "end.b": 6.0, "src.c": 1.0}
+    out = render(_RESULTS, Format.TABLE, weights=weights)
+    # Bottleneck of the top path is mid.a at 20s
+    assert "Bottleneck" in out
+    assert "mid.a" in out
+    assert "20.00" in out
+
+
+def test_render_table_show_full_path_and_weights_together() -> None:
+    weights = {"src.a": 5.0, "mid.a": 20.0, "end.a": 5.0, "src.b": 4.0, "end.b": 6.0, "src.c": 1.0}
+    out = render(_RESULTS, Format.TABLE, show_full_path=True, weights=weights)
+    assert "src.a → mid.a → end.a" in out
+    assert "mid.a" in out
+    assert "20.00" in out
