Rewrite as production-ready Python package (v0.1.0 on PyPI) #1

Merged
trouze merged 1 commit into main from pypi-v0.1.0 on Apr 24, 2026
trouze (Owner) commented on Apr 24, 2026

Summary

Replaces the prototype with a proper src/-layout Python package intended to ship on PyPI as dbt-dag-opt. The package name is available; first release will be v0.1.0.

  • CLI: Typer-based dbt-dag-opt analyze with file-mode (--manifest/--run-results) and dbt Cloud mode (--account-id/--job-id/DBT_CLOUD_TOKEN), plus --format table|json|jsonl, --top N, --output, and --run-id for historical runs.
  • Algorithm: single iterative DP over topological order replaces the per-source recursive DFS + ProcessPoolExecutor. O(V+E) across all sources. Removes the 20s-per-task timeout ceiling and the recursion-limit risk.
  • Correctness fixes carried from the prototype:
    • Node weights are now attached to the target of each hop (prototype used parent-weight on outgoing edges — silently wrong for any branching DAG).
    • if node not in self.weights bug (compared node ids against a list of floats) is gone with the rewrite.
    • Output is now valid JSON (single object keyed by source) instead of the prior append-mode stream of comma-joined {key: value} fragments.
  • Testing: 29 pytest cases across artifact loaders (incl. responses-mocked Cloud API), graph building, longest-path correctness on diamonds / linear chains / cycles / single-node / missing-run-result, formatters, and the CLI via typer.testing.CliRunner.
  • Tooling: ruff, mypy --strict, py.typed, hatchling build backend, uv.lock committed for reproducible CI.
  • CI/CD: ci.yml matrix on Python 3.10 / 3.11 / 3.12 (lint + type + test + coverage). publish.yml is tag-triggered and uses PyPI Trusted Publishing (OIDC) — no long-lived tokens in repo secrets.
  • Docs: rewritten README with install, quickstart, CLI reference, sample output, and explicit out-of-scope notes for v0.1.0. CHANGELOG.md documents the upgrade.
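The Algorithm bullet above can be illustrated with a minimal sketch of an iterative longest-path DP over a topological order (Kahn's algorithm), with each node's weight added when the node is the target of a hop. This is not the package's actual code: the function names `longest_paths` and `path_to`, the edge-list input shape, and the example node ids and timings are all hypothetical.

```python
from collections import deque


def longest_paths(edges, weights):
    """Heaviest path ending at each node of a DAG, in O(V + E).

    edges:   iterable of (parent, child) pairs (hypothetical input shape)
    weights: {node: seconds}; nodes without a weight count as 0.0
    """
    nodes = set(weights)
    for parent, child in edges:
        nodes.update((parent, child))

    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Roots start with their own weight; every other node is filled in by the DP.
    dist = {n: weights.get(n, 0.0) for n in nodes if indegree[n] == 0}
    prev = {n: None for n in dist}
    queue = deque(sorted(dist))
    visited = 0

    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            # The weight of a hop belongs to its target, not to the parent.
            candidate = dist[node] + weights.get(child, 0.0)
            if candidate > dist.get(child, float("-inf")):
                dist[child] = candidate
                prev[child] = node
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return dist, prev


def path_to(node, prev):
    """Walk predecessor links back to a root and return the path in order."""
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Hypothetical diamond: a -> b -> d and a -> c -> d.
example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
example_weights = {"a": 0.0, "b": 5.0, "c": 10.0, "d": 2.0}
```

On this diamond the heavier branch through `c` wins. No recursion is involved, so the interpreter's recursion limit does not apply, and each node and edge is processed once.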

Deferred to later versions

  • v0.2: thread-aware scheduler simulation. Real wall-clock time is bounded below by both the critical path and the number of threads; v0.1 reports only the critical-path distance, which is a lower bound.
  • v0.3: warehouse cost model (--warehouse-size, --rate $/hr → projected $ savings for Snowflake-style billing).
  • No version assigned yet: multi-run averaging (stabilize weights across N recent runs), graphviz / mermaid export.
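To make the v0.2 note concrete, here is a rough sketch of the standard lower bound on wall-clock time for a DAG run on a fixed number of threads: a run can finish no sooner than its critical path, and no sooner than total work divided by thread count. This is not the planned scheduler simulation; the function name and the numbers in the usage note are hypothetical.

```python
def wall_clock_lower_bound(critical_path_s, total_work_s, threads):
    """Lower bound on run time in seconds: max(critical path, total work / threads)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return max(critical_path_s, total_work_s / threads)
```

For example, with a 35 s critical path and 120 s of total model time, 2 threads give a bound of 60 s (thread-limited), while 8 threads give 35 s (critical-path-limited). A real scheduler can be slower than this bound, which is why a simulation is deferred.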

Test plan

  • uv run ruff check . — clean
  • uv run mypy src — clean under --strict
  • uv run pytest — 29/29 passing
  • uv build + uv run --with twine twine check dist/* — both sdist and wheel pass
  • uv run dbt-dag-opt --version → dbt-dag-opt 0.1.0
  • uv run dbt-dag-opt analyze --manifest tests/fixtures/tiny_manifest.json --run-results tests/fixtures/tiny_run_results.json --format table — top path is raw.orders → stg_orders → int_orders → fact_orders at 35.00s, matching hand-verified expectation
  • Manual before tagging v0.1.0: run against a real manifest.json + run_results.json from a past dbt project and eyeball top paths
  • Before first tag push: configure PyPI Trusted Publisher (Project → Publishing → GitHub → trouze/dbt-dag-opt / publish.yml / environment pypi)

🤖 Generated with Claude Code

Replace the prototype with a proper `src/`-layout package, published
as `dbt-dag-opt` on PyPI.

- Typer-based CLI with `analyze` subcommand, `--format table|json|jsonl`,
  `--top N`, `--output`, and both file-mode and dbt Cloud Admin API-mode
  input. `DBT_CLOUD_TOKEN` env var preferred over `--token`.
- Replace per-source recursive DFS + ProcessPoolExecutor with a single
  iterative DP over topological order (O(V+E) across all sources, no
  recursion limit or 20s/task timeout).
- Node weights are now attached to the target of each hop; fixes the
  buggy parent-weight-on-outgoing-edge logic and the
  `if node not in self.weights` membership check that compared node ids
  against float values.
- Output JSON is now valid (single object keyed by source) instead of
  the prior append-mode stream of comma-joined fragments.
- Typed exceptions, `py.typed` marker, full type hints, mypy --strict
  clean.
- pytest suite (29 tests) covering loaders, graph build, longest-path
  correctness on diamonds/chains/cycles, formatters, and the Typer CLI
  via `CliRunner`. Cloud mode tests mock `requests` via `responses`.
- ruff + mypy configured in pyproject.toml.
- GitHub Actions: `ci.yml` (3.10/3.11/3.12 matrix, lint+type+test),
  `publish.yml` (tag-triggered PyPI publish via Trusted Publishing /
  OIDC, no long-lived token in secrets).
- Rewritten README with badges, install, CLI reference, sample output,
  and explicit scope ("is not a scheduler simulator or cost model —
  yet"). CHANGELOG.md with v0.1.0 entry.

Deferred to later versions: thread-aware scheduler simulation (v0.2),
warehouse cost modeling (v0.3), multi-run averaging, graphviz export.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trouze merged commit 765b969 into main on Apr 24, 2026
3 checks passed