Commit c15e501

docs: add CLI reference and end-to-end walkthrough
Add comprehensive CLI documentation to README with option tables for all 3 commands (analyze, diff, mcp) and a 6-step e2e walkthrough using a new examples/cli_e2e/ directory with v1/v2 SQL files demonstrating analyze, JSON/DOT output, diff, and MCP server setup.
1 parent 0693853 commit c15e501

7 files changed

Lines changed: 259 additions & 0 deletions

README.md

Lines changed: 197 additions & 0 deletions
@@ -788,6 +788,203 @@ The server exposes all clgraph lineage tools:
| `pipeline://tables` | List of all tables with metadata |
| `pipeline://tables/{name}` | Detailed info for a specific table |

## CLI Reference

clgraph ships a command-line interface for analysing SQL lineage without writing Python.

```
clgraph [COMMAND] [OPTIONS]
```

### `clgraph analyze`

Parse SQL files and display a column-lineage summary.

```bash
clgraph analyze PATH [--dialect DIALECT] [--format table|json|dot]
```

| Option | Default | Description |
|--------|---------|-------------|
| `PATH` | *(required)* | SQL file, directory of `.sql` files, or JSON pipeline file |
| `--dialect` | `bigquery` | SQL dialect (bigquery, snowflake, postgres, mysql, duckdb, clickhouse, …) |
| `--format`, `-f` | `table` | Output format: **table** (Rich table), **json** (machine-readable), **dot** (Graphviz) |
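
To drive `clgraph analyze` from a Python script (for example, inside a CI job), one option is to shell out to the CLI with `--format json` and parse its stdout. The following is a minimal sketch that assumes the `clgraph` executable is on `PATH` and that stdout contains only the JSON document. `build_analyze_cmd` and `run_analyze` are hypothetical helper names, and only the options listed in the table above are used.

```python
import json
import shutil
import subprocess
from typing import Any


def build_analyze_cmd(path: str, dialect: str = "bigquery", fmt: str = "json") -> list[str]:
    """Build the argv for `clgraph analyze` using only the documented options."""
    if fmt not in ("table", "json", "dot"):
        raise ValueError(f"unsupported format: {fmt}")
    return ["clgraph", "analyze", path, "--dialect", dialect, "--format", fmt]


def run_analyze(path: str, dialect: str = "bigquery") -> dict[str, Any]:
    """Run `clgraph analyze ... --format json` and return the parsed report (hypothetical helper)."""
    if shutil.which("clgraph") is None:
        raise RuntimeError("clgraph executable not found on PATH")
    result = subprocess.run(
        build_analyze_cmd(path, dialect, "json"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Only builds the command; calling run_analyze() requires clgraph to be installed.
    print(build_analyze_cmd("examples/cli_e2e/v1/"))
```

Passing an argument list instead of a single shell string avoids quoting problems with paths that contain spaces.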

### `clgraph diff`

Compare lineage between two pipeline versions. This is useful for reviewing the impact of SQL changes in pull requests.

```bash
clgraph diff OLD_PATH NEW_PATH [--dialect DIALECT] [--format table|json]
```

| Option | Default | Description |
|--------|---------|-------------|
| `OLD_PATH` | *(required)* | Path to the old SQL file or directory |
| `NEW_PATH` | *(required)* | Path to the new SQL file or directory |
| `--dialect` | `bigquery` | SQL dialect |
| `--format`, `-f` | `table` | Output format: **table** or **json** |

### `clgraph mcp`

Start an MCP server so AI assistants (Claude Desktop, Cursor, etc.) can query your lineage graph.

```bash
clgraph mcp --pipeline PATH [--dialect DIALECT] [--transport stdio|http] [--no-llm-tools]
```

| Option | Default | Description |
|--------|---------|-------------|
| `--pipeline`, `-p` | *(required)* | Path to SQL directory or JSON pipeline file |
| `--dialect` | `bigquery` | SQL dialect |
| `--transport` | `stdio` | Transport type: **stdio** (Claude Desktop) or **http** (remote clients) |
| `--no-llm-tools` | `false` | Exclude LLM-dependent tools from the server |

Requires the `mcp` extra: `pip install "clgraph[mcp]"` (quote the argument so shells such as zsh do not treat the brackets as a glob pattern).

---

## End-to-End CLI Walkthrough

This walkthrough uses the example files in [`examples/cli_e2e/`](examples/cli_e2e/).
The `v1/` directory holds a three-file pipeline (`users`, `orders`, and a `user_spend` mart); `v2/` holds a revised version of the same pipeline, used in Steps 4 and 5.

### Step 1 — Analyze the pipeline

```bash
$ clgraph analyze examples/cli_e2e/v1/
```

```
                       Pipeline Tables
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Table         ┃ Type    ┃ Columns ┃ Upstream ┃ Downstream ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ users         │ derived │ 4       │ 1        │ 1          │
│ source_users  │ source  │ 4       │ 0        │ 1          │
│ orders        │ derived │ 4       │ 1        │ 1          │
│ source_orders │ source  │ 4       │ 0        │ 1          │
│ user_spend    │ derived │ 7       │ 2        │ 0          │
└───────────────┴─────────┴─────────┴──────────┴────────────┘

5 tables, 23 columns, 15 lineage edges
```

clgraph discovered 5 tables (2 sources, 2 intermediate tables, and 1 final mart) and traced 15 column-level lineage edges, all from static SQL analysis.
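
These totals can be reconciled by hand from the three `v1/` SQL files. The sketch below transcribes each output column and the input column it reads from, then recounts tables, columns, and edges. It assumes, without confirmation from clgraph's documentation, that an edge is one output-column/input-column pair, that JOIN and GROUP BY keys add no extra edges, and that a source table's columns are exactly those the downstream query selects. `EDGES`, `table_of`, and `summarize` are hypothetical names for illustration.

```python
# Column-level edges transcribed by hand from examples/cli_e2e/v1/*.sql as (target, source).
# Assumption: one edge per projected column; JOIN and GROUP BY keys add no extra edges.
EDGES = [
    ("users.user_id", "source_users.user_id"),
    ("users.email", "source_users.email"),
    ("users.signup_date", "source_users.signup_date"),
    ("users.country", "source_users.country"),
    ("orders.order_id", "source_orders.order_id"),
    ("orders.user_id", "source_orders.user_id"),
    ("orders.order_date", "source_orders.order_date"),
    ("orders.total_amount", "source_orders.total_amount"),
    ("user_spend.user_id", "users.user_id"),
    ("user_spend.email", "users.email"),
    ("user_spend.country", "users.country"),
    ("user_spend.total_orders", "orders.order_id"),        # COUNT(o.order_id)
    ("user_spend.lifetime_spend", "orders.total_amount"),  # SUM(o.total_amount)
    ("user_spend.first_order_date", "orders.order_date"),  # MIN(o.order_date)
    ("user_spend.last_order_date", "orders.order_date"),   # MAX(o.order_date)
]


def table_of(column: str) -> str:
    """Return the table part of a qualified name, e.g. 'users.email' -> 'users'."""
    return column.split(".", 1)[0]


def summarize(edges):
    """Return the table set, column set, and per-table upstream/downstream table sets."""
    columns = {col for edge in edges for col in edge}
    tables = {table_of(col) for col in columns}
    upstream = {t: set() for t in tables}
    downstream = {t: set() for t in tables}
    for target, source in edges:
        upstream[table_of(target)].add(table_of(source))
        downstream[table_of(source)].add(table_of(target))
    return tables, columns, upstream, downstream


if __name__ == "__main__":
    tables, columns, upstream, downstream = summarize(EDGES)
    print(f"{len(tables)} tables, {len(columns)} columns, {len(EDGES)} lineage edges")
    for t in sorted(tables):
        print(f"{t}: upstream={len(upstream[t])} downstream={len(downstream[t])}")
```

Under these assumptions, the first printed line reads `5 tables, 23 columns, 15 lineage edges`, and the per-table counts agree with the Upstream and Downstream columns of the table above.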

### Step 2 — Get JSON output for CI/scripts

```bash
$ clgraph analyze examples/cli_e2e/v1/ --format json
```

```json
{
  "dialect": "bigquery",
  "tables": [
    {
      "name": "users",
      "is_source": false,
      "columns": [
        {"name": "user_id", "type": "direct_column", "pii": false},
        {"name": "email", "type": "direct_column", "pii": true},
        {"name": "signup_date", "type": "direct_column", "pii": false},
        {"name": "country", "type": "direct_column", "pii": false}
      ]
    }
  ],
  "columns": 23,
  "edges": 15,
  "issues": 0
}
```

The output above is abridged to the first of the five tables. Notice that `email` is already flagged `"pii": true`; the flag comes from the `[pii: true, ...]` tag in its inline comment in `01_users.sql`.
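
A script can consume this JSON directly. The following is a minimal sketch that lists every column flagged as PII. It relies only on fields visible in the sample above (`tables[].name`, `tables[].columns[].name`, and `tables[].columns[].pii`), and `pii_columns` is a hypothetical helper name. The embedded sample is the abridged Step 2 output, so it contains a single table.

```python
import json

# Abridged output of `clgraph analyze examples/cli_e2e/v1/ --format json` (Step 2).
SAMPLE = """
{
  "dialect": "bigquery",
  "tables": [
    {"name": "users", "is_source": false, "columns": [
      {"name": "user_id", "type": "direct_column", "pii": false},
      {"name": "email", "type": "direct_column", "pii": true},
      {"name": "signup_date", "type": "direct_column", "pii": false},
      {"name": "country", "type": "direct_column", "pii": false}
    ]}
  ],
  "columns": 23, "edges": 15, "issues": 0
}
"""


def pii_columns(report: dict) -> list[str]:
    """Return "table.column" for every column whose "pii" flag is true."""
    return [
        f"{table['name']}.{column['name']}"
        for table in report.get("tables", [])
        for column in table.get("columns", [])
        if column.get("pii") is True
    ]


if __name__ == "__main__":
    print(pii_columns(json.loads(SAMPLE)))  # ['users.email']
```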

### Step 3 — Generate a Graphviz diagram

```bash
$ clgraph analyze examples/cli_e2e/v1/ --format dot | dot -Tpng -o lineage.png
```

Rendering the PNG requires the Graphviz `dot` executable to be installed. The DOT output itself is a standard Graphviz `digraph` of the table-level dependencies:

```dot
digraph {
  rankdir=LR
  source_users -> users [label=CREATE]
  source_orders -> orders [label=CREATE]
  users -> user_spend [label=CREATE]
  orders -> user_spend [label=CREATE]
}
```
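
Because each edge in this output sits on its own `a -> b` line, a short script can answer dependency questions without a Graphviz library. The following is a minimal sketch that finds every table upstream of `user_spend`. It assumes the one-edge-per-line layout and unquoted node names shown above; general DOT files would need a real parser, such as the `pydot` package. `parse_edges` and `upstream_of` are hypothetical names.

```python
import re
from collections import defaultdict, deque

# DOT output from Step 3 (table-level dependencies).
DOT = """
digraph {
  rankdir=LR
  source_users -> users [label=CREATE]
  source_orders -> orders [label=CREATE]
  users -> user_spend [label=CREATE]
  orders -> user_spend [label=CREATE]
}
"""

# Matches "a -> b" at the start of a line and ignores any [attributes] that follow.
EDGE_RE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)")


def parse_edges(dot_text: str) -> list[tuple[str, str]]:
    """Extract (source, target) pairs from a one-edge-per-line DOT digraph."""
    return [m.groups() for line in dot_text.splitlines() if (m := EDGE_RE.match(line))]


def upstream_of(table: str, edges: list[tuple[str, str]]) -> set[str]:
    """Breadth-first walk against the edge direction to collect every ancestor table."""
    parents = defaultdict(list)
    for source, target in edges:
        parents[target].append(source)
    seen: set[str] = set()
    queue = deque([table])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream_of("user_spend", parse_edges(DOT))))
    # ['orders', 'source_orders', 'source_users', 'users']
```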

### Step 4 — Diff two versions to review impact

Now suppose a teammate adds a `tier` column to `users` and a `discount` column to `orders`,
then updates `user_spend` to carry `tier` and compute `lifetime_net_spend`. The `v2/` directory contains this change. Compare the old and new versions:

```bash
$ clgraph diff examples/cli_e2e/v1/ examples/cli_e2e/v2/
```

```
+6 columns added
+ source_orders.discount
+ orders.discount
+ user_spend.tier
+ users.tier
+ user_spend.lifetime_net_spend
+ source_users.tier
```

The diff reports which columns were added, removed, or modified across the entire pipeline, which makes it a good fit for code review or a CI gate. In this change, every difference is an addition.
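
Conceptually, the "columns added" list is a set difference between the fully qualified columns of the two versions. The sketch below transcribes the column sets from the `v1/` and `v2/` SQL files by hand and reproduces the six additions. It illustrates the idea rather than clgraph's actual implementation, which also reports modified columns. Consistent with the Step 1 table, it assumes that a source table's columns are exactly those the downstream query selects. That assumption would explain why `source_users.tier` and `source_orders.discount` appear in the diff. `V1`, `V2`, `qualified`, and `column_diff` are hypothetical names.

```python
# Column sets transcribed by hand from examples/cli_e2e/v1/ (hypothetical data structure).
V1 = {
    "source_users": ["user_id", "email", "signup_date", "country"],
    "users": ["user_id", "email", "signup_date", "country"],
    "source_orders": ["order_id", "user_id", "order_date", "total_amount"],
    "orders": ["order_id", "user_id", "order_date", "total_amount"],
    "user_spend": ["user_id", "email", "country", "total_orders",
                   "lifetime_spend", "first_order_date", "last_order_date"],
}

# v2: tier on users, discount on orders, tier and lifetime_net_spend on user_spend.
V2 = {table: list(cols) for table, cols in V1.items()}  # copy so V1 stays unchanged
V2["source_users"].append("tier")
V2["users"].append("tier")
V2["source_orders"].append("discount")
V2["orders"].append("discount")
V2["user_spend"] += ["tier", "lifetime_net_spend"]


def qualified(schema: dict[str, list[str]]) -> set[str]:
    """Flatten {table: [columns]} into {"table.column", ...}."""
    return {f"{table}.{col}" for table, cols in schema.items() for col in cols}


def column_diff(old: dict[str, list[str]], new: dict[str, list[str]]) -> dict[str, list[str]]:
    """Additions and removals as sorted lists; detecting modifications is out of scope here."""
    old_cols, new_cols = qualified(old), qualified(new)
    return {
        "columns_added": sorted(new_cols - old_cols),
        "columns_removed": sorted(old_cols - new_cols),
    }


if __name__ == "__main__":
    diff = column_diff(V1, V2)
    print(f"+{len(diff['columns_added'])} columns added")
    for name in diff["columns_added"]:
        print(f"+ {name}")
```

Run as a script, it prints `+6 columns added` followed by the same six names as the CLI, in sorted order.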

### Step 5 — Get diff as JSON for automation

```bash
$ clgraph diff examples/cli_e2e/v1/ examples/cli_e2e/v2/ --format json
```

```json
{
  "columns_added": [
    "source_orders.discount",
    "orders.discount",
    "source_users.tier",
    "user_spend.lifetime_net_spend",
    "users.tier",
    "user_spend.tier"
  ],
  "columns_removed": [],
  "columns_modified": [],
  "has_changes": true
}
```
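
A CI job can parse this JSON and decide whether to block a merge. The following is a minimal sketch of such a gate. It follows an assumed policy, not something clgraph prescribes: added columns pass, while removed or modified columns require a human review. The sketch reads only the four keys shown above. Because the sample does not show the shape of `columns_modified` entries, the sketch only checks whether those lists are non-empty. `gate` and `lineage_gate.py` are hypothetical names.

```python
import json
import sys

# Policy assumed for this sketch (not prescribed by clgraph):
# added columns pass; removed or modified columns require review.
BLOCKING_KEYS = ("columns_removed", "columns_modified")


def gate(diff: dict) -> int:
    """Map a `clgraph diff --format json` report to an exit code: 0 = pass, 1 = review."""
    if not diff.get("has_changes", False):
        print("No lineage changes.")
        return 0
    for name in diff.get("columns_added", []):
        print(f"added: {name}")
    blocking = [item for key in BLOCKING_KEYS for item in diff.get(key, [])]
    for item in blocking:
        print(f"needs review: {item}", file=sys.stderr)
    return 1 if blocking else 0


# In CI, pipe the report into a script that ends with sys.exit(gate(json.load(sys.stdin))):
#   clgraph diff OLD NEW --format json | python lineage_gate.py
SAMPLE = json.loads("""
{"columns_added": ["source_orders.discount", "orders.discount", "source_users.tier",
                   "user_spend.lifetime_net_spend", "users.tier", "user_spend.tier"],
 "columns_removed": [], "columns_modified": [], "has_changes": true}
""")

if __name__ == "__main__":
    print("exit code:", gate(SAMPLE))  # Step 5 only adds columns, so the gate passes
```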

### Step 6 — Serve lineage to AI via MCP

```bash
$ clgraph mcp --pipeline examples/cli_e2e/v1/
```

This starts an MCP server on the default `stdio` transport. To connect it to Claude Desktop, add an entry like the following to its configuration, replacing the placeholder with the absolute path to your SQL directory:

```json
{
  "mcpServers": {
    "clgraph": {
      "command": "clgraph",
      "args": ["mcp", "--pipeline", "/path/to/your/sql/"]
    }
  }
}
```

If Claude Desktop cannot find the `clgraph` executable, set `"command"` to its absolute path (for example, the output of `which clgraph`).

Then ask Claude: *"What tables does user_spend depend on?"* or *"Which columns contain PII?"*
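
If you manage that configuration from a script, you can merge the entry in without overwriting other servers. The following is a minimal sketch that builds the `clgraph` entry from the documented `mcp` options and merges it into an existing configuration dictionary. `clgraph_server_entry` and `merge_server` are hypothetical helpers. Reading and writing the configuration file on disk is left out because its location is not covered here.

```python
import copy
import json
from typing import Optional


def clgraph_server_entry(pipeline: str, dialect: Optional[str] = None,
                         no_llm_tools: bool = False) -> dict:
    """Build an mcpServers entry that runs `clgraph mcp` with documented flags only."""
    args = ["mcp", "--pipeline", pipeline]
    if dialect is not None:
        args += ["--dialect", dialect]
    if no_llm_tools:
        args.append("--no-llm-tools")
    # No --transport flag: stdio is the default and is what Claude Desktop uses.
    return {"command": "clgraph", "args": args}


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` under mcpServers[name]; other servers are kept."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
    entry = clgraph_server_entry("/path/to/your/sql/", dialect="snowflake")
    print(json.dumps(merge_server(existing, "clgraph", entry), indent=2))
```

With the default arguments, `clgraph_server_entry("/path/to/your/sql/")` produces the same `clgraph` entry as the JSON snippet above.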

---

## Architecture

> 📊 **[View the complete architecture diagram](clgraph-simple-diagram.md)** - A visual overview of the 4-stage flow from SQL input to applications.

examples/cli_e2e/v1/01_users.sql

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
-- Raw users from source
CREATE OR REPLACE TABLE users AS
SELECT
    user_id, -- Unique user identifier [owner: data-platform]
    email, -- User email [pii: true, owner: data-governance]
    signup_date, -- When user signed up [owner: growth]
    country -- User country [owner: growth]
FROM source_users

examples/cli_e2e/v1/02_orders.sql

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
-- Raw orders from source
CREATE OR REPLACE TABLE orders AS
SELECT
    order_id, -- Unique order identifier [owner: data-platform]
    user_id, -- Reference to user [owner: data-platform]
    order_date, -- Date order was placed [owner: finance]
    total_amount -- Total order amount [owner: finance, tags: metric revenue]
FROM source_orders
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
-- Mart: user lifetime spend
CREATE OR REPLACE TABLE user_spend AS
SELECT
    u.user_id,
    u.email,
    u.country,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_spend,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email, u.country

examples/cli_e2e/v2/01_users.sql

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
-- Raw users from source
CREATE OR REPLACE TABLE users AS
SELECT
    user_id, -- Unique user identifier [owner: data-platform]
    email, -- User email [pii: true, owner: data-governance]
    signup_date, -- When user signed up [owner: growth]
    country, -- User country [owner: growth]
    tier -- Loyalty tier [owner: growth]
FROM source_users

examples/cli_e2e/v2/02_orders.sql

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
-- Raw orders from source
CREATE OR REPLACE TABLE orders AS
SELECT
    order_id, -- Unique order identifier [owner: data-platform]
    user_id, -- Reference to user [owner: data-platform]
    order_date, -- Date order was placed [owner: finance]
    total_amount, -- Total order amount [owner: finance, tags: metric revenue]
    discount -- Discount applied [owner: finance, tags: metric]
FROM source_orders
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
-- Mart: user lifetime spend (v2 — adds tier and net spend)
CREATE OR REPLACE TABLE user_spend AS
SELECT
    u.user_id,
    u.email,
    u.country,
    u.tier,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_spend,
    SUM(o.total_amount - o.discount) AS lifetime_net_spend,
    MIN(o.order_date) AS first_order_date,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email, u.country, u.tier
