Commit 9a75c67

feat(clickhouse): OPS HCL → migration codegen
Generate a ClickHouse migration from a declarative-HCL change. The HCL stays the source of truth for the schema; this produces the run_sql_with_exceptions(...) operations the existing infi.clickhouse_orm runner executes.

- topology.py: explicit, reviewed map of each OPS object -> (node_roles, replicated, sharded). node_roles is a deliberate choice (the dump hostClusterRole vocabulary differs from NodeRole and migrations target a curated subset, so it can't be mechanically derived); seeded by introspecting ../clickhouse-schema and reconciled against 0273/0274.
- gen_migration.py: runs ops/diff.sh, maps each DDL statement to its targeting, derives sharded / is_alter_on_replicated_table from the engine kind, and emits the operations. One HCL change can span multiple roles. Errors on objects missing from topology.py; flags UNSAFE recreations.
1 parent 9220c23 commit 9a75c67

3 files changed

Lines changed: 282 additions & 0 deletions

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# OPS HCL → migration codegen

Turns a declarative-HCL change into a ClickHouse migration the existing
`infi.clickhouse_orm` runner executes. The HCL stays the source of truth for
*what* the schema is; this generator produces the `run_sql_with_exceptions(...)`
operations for *applying* it.

## How it works

```
edit OPS HCL ──▶ ops/diff.sh  (hclexp diff -sql, committed → working)
             ──▶ gen_migration.py: map each DDL statement to its targeting
             ──▶ operations = [run_sql_with_exceptions(...), ...]
```
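
The "map each DDL statement" step comes down to pulling an operation kind and an object name out of each statement. The following is a minimal sketch of that idea, assuming a shortened keyword list; the sample statements are made up for illustration, and the function here raises `ValueError` purely to keep the sketch testable.

```python
import re

# Shortened keyword list for illustration. Alternation picks the first
# alternative that matches, so longer keywords come before their prefixes.
KEYWORDS = [
    "ALTER TABLE",
    "CREATE TABLE IF NOT EXISTS",
    "CREATE TABLE",
    "DROP TABLE IF EXISTS",
    "DROP TABLE",
]
OP_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(k) for k in KEYWORDS) + r")"
    r"\s+`?(?:posthog\.)?`?([A-Za-z0-9_$]+)`?"
)


def classify(stmt: str) -> tuple:
    """Return (kind, table) for one DDL statement, or raise if it is unrecognised."""
    m = OP_RE.match(stmt)
    if m is None:
        raise ValueError(f"cannot parse op/table from statement: {stmt}")
    # The first word of the keyword is the kind: ALTER, CREATE or DROP.
    return m.group(1).split()[0], m.group(2)


# Made-up statements in the shape the diff is expected to produce.
alter = classify("ALTER TABLE posthog.sharded_tophog ADD COLUMN foo String")
create = classify("CREATE TABLE IF NOT EXISTS posthog.metrics_samples (ts DateTime) ENGINE = Memory")
print(alter, create)
```

The table name is then the lookup key into `topology.py`.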

Two inputs, two responsibilities:

- **`hclexp diff`** gives the DDL (`ALTER TABLE … ADD COLUMN …`, etc.), derived
  from the HCL, so it always matches the declared schema.
- **`topology.py`** gives `node_roles` per object. This is an explicit, reviewed
  map because `node_roles` is **not** mechanically derivable: the dump
  `hostClusterRole` vocabulary (`ingestion`, `batch_exports`, `sessionsv3`, …)
  doesn't match the `NodeRole` enum, and migrations deliberately target a curated
  subset. `sharded` is read from the map's `sharded` flag, and
  `is_alter_on_replicated_table` is set only for an `ALTER` on an object the map
  marks `replicated`.

The result: one HCL change can emit several operations with **different** roles,
e.g. an OPS replicated data table (`node_roles=[OPS]`,
`is_alter_on_replicated_table=True`) plus its distributed read table everywhere
(`node_roles=ALL_ROLES`, both flags `False`).
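
To make the data-table versus distributed-table contrast concrete, here is a small sketch of the derivation using a two-entry excerpt of the map. The `targeting` helper is a hypothetical name used only for this illustration; the generator itself renders these values into source text rather than returning a dict.

```python
# Two entries in the shape topology.py uses:
# object name -> (node_roles, replicated, sharded).
ALL_ROLES = ["DATA", "ENDPOINTS", "AUX", "AI_EVENTS", "SESSIONS", "OPS"]
TOPOLOGY = {
    "sharded_query_log_archive": (["OPS"], True, False),
    "query_log_archive": (ALL_ROLES, False, False),
}


def targeting(kind: str, table: str) -> dict:
    """Hypothetical helper: keyword arguments for one emitted operation."""
    roles, replicated, sharded = TOPOLOGY[table]
    return {
        "node_roles": roles,
        "sharded": sharded,
        # Only an ALTER on a replicated table sets this flag.
        "is_alter_on_replicated_table": kind == "ALTER" and replicated,
    }


data_table = targeting("ALTER", "sharded_query_log_archive")
dist_table = targeting("ALTER", "query_log_archive")
print(data_table)
print(dist_table)
```

The same `ALTER` therefore targets only `OPS` with the replicated flag set for the data table, and every role with both flags `False` for the distributed table.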

## Usage

```bash
# from the repo root; HCLEXP_BIN points at a built hclexp (or rely on the
# bin/hclexp container fallback)
HCLEXP_BIN=../python-clickhouse-schema/hclexp \
  python posthog/clickhouse/hcl/ops/codegen/gen_migration.py --name add_foo_column

#   --ref <git-ref>   diff the working tree against this ref (default HEAD)
#   --out <path|->    write here, or stdout (default)
```

Then: review the output, save it as the next `posthog/clickhouse/migrations/NNNN_<name>.py`,
and bump `max_migration.txt`.
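
Picking the next file name is a manual step. As a rough aid, it can be computed from the names already in the migrations directory. `next_migration_filename` is a hypothetical helper, not part of this commit, and the file names in the example are made up. It does not touch `max_migration.txt`, whose format is not shown here.

```python
from __future__ import annotations

import re


def next_migration_filename(existing: list[str], slug: str) -> str:
    """Hypothetical helper: next NNNN_<slug>.py given existing file names."""
    numbers = []
    for name in existing:
        m = re.match(r"(\d{4})_.+\.py$", name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        raise ValueError("no NNNN_<name>.py migrations found")
    # Zero-pad to four digits to match the NNNN prefix.
    return f"{max(numbers) + 1:04d}_{slug}.py"


# Made-up names; only the numeric prefix matters here.
existing = ["0273_example_a.py", "0274_example_b.py", "__init__.py"]
suggested = next_migration_filename(existing, "add_foo_column")
print(suggested)
```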

## Behaviors / guardrails

- **Unknown object → hard error.** If a changed object isn't in `topology.py`,
  generation fails; you must add it (a conscious `node_roles` choice) first.
- **UNSAFE changes are flagged.** Storage-class switches / recreations that
  `hclexp` marks `-- UNSAFE` get a `# UNSAFE (review/recreate by hand)` comment
  above `operations`. The statement itself is still emitted, so treat the comment
  as a cue to rework it by hand rather than ship it as a plain ALTER.
- Statements are deduped across env stacks. Per-env DDL differences (e.g.
  `sharded_tophog` zoo_path) still need `settings.CLOUD_DEPLOYMENT` gating by hand;
  the generator does not yet emit those branches.
- DDL keeps the explicit `posthog.` database qualifier as `hclexp` emits it.
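
The dedup and UNSAFE guardrails can be pictured with a simplified scan over diff output. This is a sketch under assumptions: the sample text (env names, the wording after `-- UNSAFE:`, the statements) is invented for illustration, and each statement is assumed to fit on one line, whereas the generator also joins statements wrapped across lines.

```python
import re

# Invented sample in the rough shape of ops/diff.sh output: one section per env.
SAMPLE = """\
# us committed@HEAD
-- UNSAFE: recreate posthog.metrics_samples
ALTER TABLE posthog.metrics_samples ADD COLUMN foo String;
# eu committed@HEAD
ALTER TABLE posthog.metrics_samples ADD COLUMN foo String;
ALTER TABLE posthog.sharded_tophog ADD COLUMN foo String;
"""


def scan(diff_out: str):
    """Return (unique statements in first-seen order, set of UNSAFE table names)."""
    unsafe, seen, ordered = set(), set(), []
    for line in diff_out.splitlines():
        if line.startswith("-- UNSAFE:"):
            m = re.search(r"posthog\.([A-Za-z0-9_$]+)", line)
            if m:
                unsafe.add(m.group(1))
        elif line.startswith(("#", "--")) or not line.strip():
            continue  # section headers, other SQL comments, blank lines
        elif line not in seen:
            seen.add(line)
            ordered.append(line)
    return ordered, unsafe


statements, unsafe = scan(SAMPLE)
print(len(statements), sorted(unsafe))
```

The statement repeated under both env sections appears once, and `metrics_samples` lands in the UNSAFE set so its operation can be annotated.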

## Keeping topology.py current

The map was seeded by introspecting `../clickhouse-schema` (which roles host each
object) and reconciled against the existing OPS migrations (`0273`/`0274`). When
you add or move an OPS object, add/adjust its entry: `(node_roles, replicated, sharded)`.
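
A quick sanity check on new entries can catch typos before the generator runs. The sketch below is hypothetical and not part of this commit; it assumes the valid role names are exactly those listed in `ALL_ROLES`, which may be narrower than the real `NodeRole` enum.

```python
# Assumption: role names limited to those in ALL_ROLES in topology.py.
KNOWN_ROLES = {"DATA", "ENDPOINTS", "AUX", "AI_EVENTS", "SESSIONS", "OPS"}


def check_topology(topology: dict) -> list:
    """Hypothetical check: return a list of problems found in the map."""
    problems = []
    for name, entry in topology.items():
        if not (isinstance(entry, tuple) and len(entry) == 3):
            problems.append(f"{name}: expected (node_roles, replicated, sharded)")
            continue
        roles, replicated, sharded = entry
        if not roles:
            problems.append(f"{name}: node_roles is empty")
        unknown = [r for r in roles if r not in KNOWN_ROLES]
        if unknown:
            problems.append(f"{name}: unknown roles {unknown}")
        if not isinstance(replicated, bool) or not isinstance(sharded, bool):
            problems.append(f"{name}: replicated and sharded must be bool")
    return problems


good = check_topology({"metrics_samples": (["OPS"], True, False)})
bad = check_topology({"new_table": (["INGESTION"], False, False)})
print(good, bad)
```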
Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""Generate a ClickHouse migration from an OPS declarative-HCL change.

Pipeline: run the OPS diff (committed HCL -> working tree, via ops/diff.sh) to
get the DDL `hclexp` would apply, map each statement to its node-role targeting
using topology.py, and emit a migration whose `operations` are
run_sql_with_exceptions(...) calls ready to drop into
posthog/clickhouse/migrations/.

The HCL supplies *what* (the DDL); topology.py supplies *where* (node_roles); the
engine kind in topology.py supplies sharded / is_alter_on_replicated_table.

Usage (from anywhere; paths resolve relative to this file):
    HCLEXP_BIN=../python-clickhouse-schema/hclexp \
      python posthog/clickhouse/hcl/ops/codegen/gen_migration.py --name add_demo_column
    # options: --ref <git-ref> (default HEAD), --out <path|-> (default stdout)
"""

from __future__ import annotations

import argparse
import os
import re
import subprocess
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
REPO_ROOT = os.path.abspath(os.path.join(HERE, "..", "..", "..", "..", ".."))
DIFF_SH = os.path.join("posthog", "clickhouse", "hcl", "ops", "diff.sh")

sys.path.insert(0, HERE)
from topology import TOPOLOGY  # noqa: E402

# (op keyword, kind, is-alter?). The is-alter flag is currently unused; callers
# compare kind == "ALTER" instead. Regex alternation takes the first alternative
# that matches, so longer keywords must precede their prefixes.
_OPS = [
    ("ALTER TABLE", "ALTER", True),
    ("CREATE TABLE IF NOT EXISTS", "CREATE", False),
    ("CREATE TABLE", "CREATE", False),
    ("CREATE MATERIALIZED VIEW", "CREATE", False),
    ("CREATE OR REPLACE VIEW", "CREATE", False),
    ("CREATE VIEW", "CREATE", False),
    ("DROP TABLE IF EXISTS", "DROP", False),
    ("DROP TABLE", "DROP", False),
    ("RENAME TABLE", "RENAME", False),
]
_OP_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(k) for k, _, _ in _OPS) + r")\s+`?(?:posthog\.)?`?([A-Za-z0-9_$]+)`?"
)


def run_diff(ref: str) -> str:
    # Pass the caller's environment through so HCLEXP_BIN reaches diff.sh.
    env = dict(os.environ)
    return subprocess.run(
        ["bash", DIFF_SH, ref], cwd=REPO_ROOT, env=env, capture_output=True, text=True, check=True
    ).stdout


def parse_statements(diff_out: str) -> tuple[list[tuple[str, str]], set[str]]:
    """Return the unique (statement, env) pairs in first-seen order, and the set of UNSAFE tables.

    Deduplication is by statement text, so a statement repeated across env
    sections keeps the env it first appeared under.

    Statements are accumulated until a line ends with ';' (hclexp may wrap a
    CREATE across lines). Section headers from diff.sh set the current env.
    """
    statements: list[tuple[str, str]] = []
    unsafe: set[str] = set()
    seen: set[str] = set()
    env = "?"
    buf: list[str] = []
    for line in diff_out.splitlines():
        if line.startswith("# ") and "committed@" in line:
            env = line[2:].split()[0]
            continue
        if line.startswith("==") or not line.strip():
            continue
        if line.startswith("-- UNSAFE:"):
            m = re.search(r"posthog\.([A-Za-z0-9_$]+)", line)
            if m:
                unsafe.add(m.group(1))
            continue
        if line.startswith("--"):  # "-- no changes" etc.
            continue
        buf.append(line.rstrip())
        if line.rstrip().endswith(";"):
            stmt = " ".join(buf).strip()
            buf = []
            if stmt not in seen:
                seen.add(stmt)
                statements.append((stmt, env))
    return statements, unsafe


def classify(stmt: str) -> tuple[str, str]:
    m = _OP_RE.match(stmt)
    if not m:
        raise SystemExit(f"ERROR: cannot parse op/table from statement:\n  {stmt}")
    keyword, table = m.group(1), m.group(2)
    kind = next(k for kw, k, _ in _OPS if kw == keyword)
    return kind, table


def emit_operation(stmt: str, kind: str, table: str) -> str:
    if table not in TOPOLOGY:
        raise SystemExit(
            f"ERROR: object {table!r} is not in topology.py. Add it (node_roles, "
            f"replicated, sharded) before generating; node_roles is a deliberate choice."
        )
    roles, replicated, sharded = TOPOLOGY[table]
    # Only an ALTER on a replicated table sets is_alter_on_replicated_table.
    is_alter_repl = kind == "ALTER" and replicated
    roles_src = "[" + ", ".join(f"NodeRole.{r}" for r in roles) + "]"
    # Drop the trailing ';'; the SQL is otherwise passed through unchanged.
    sql_lit = stmt[:-1] if stmt.endswith(";") else stmt
    return (
        "    run_sql_with_exceptions(\n"
        f"        {sql_lit!r},\n"
        f"        node_roles={roles_src},\n"
        f"        sharded={sharded},\n"
        f"        is_alter_on_replicated_table={is_alter_repl},\n"
        "    ),"
    )


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--name", required=True, help="migration slug, e.g. add_demo_column")
    ap.add_argument("--ref", default="HEAD", help="git ref to diff the working tree against")
    ap.add_argument("--out", default="-", help="output path, or - for stdout")
    args = ap.parse_args()

    statements, unsafe = parse_statements(run_diff(args.ref))
    if not statements:
        raise SystemExit("No DDL generated: the OPS HCL has no changes vs the ref.")

    ops, warnings = [], []
    for stmt, _env in statements:
        kind, table = classify(stmt)
        if table in unsafe:
            warnings.append(f"# UNSAFE (review/recreate by hand): {kind} {table}")
        ops.append(emit_operation(stmt, kind, table))

    body = (
        '"""AUTO-GENERATED from the OPS declarative HCL by '
        "posthog/clickhouse/hcl/ops/codegen/gen_migration.py.\n"
        # --name was otherwise unused; record the slug so the reviewer knows the target file.
        f"Suggested filename: posthog/clickhouse/migrations/NNNN_{args.name}.py\n"
        "Review node_roles / sharded / is_alter_on_replicated_table before committing.\n"
        '"""\n'
        "from posthog.clickhouse.client.connection import NodeRole\n"
        "from posthog.clickhouse.client.migration_tools import run_sql_with_exceptions\n\n"
    )
    if warnings:
        body += "\n".join(warnings) + "\n\n"
    body += "operations = [\n" + "\n".join(ops) + "\n]\n"

    if args.out == "-":
        sys.stdout.write(body)
    else:
        with open(args.out, "w") as f:
            f.write(body)
        sys.stderr.write(f"wrote {args.out}\n")


if __name__ == "__main__":
    main()
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
"""Deployment topology for the OPS declarative-HCL → migration generator.

`node_roles` for a ClickHouse object is a deliberate engineering choice, NOT
something mechanically derivable from the dumps: the dump `hostClusterRole`
vocabulary (`ingestion`, `batch_exports`, `sessionsv3`, …) does not match the
`NodeRole` enum, and migrations deliberately target a curated subset rather than
every role that physically hosts an object.

So this map is the explicit source of truth for *where* each OPS-managed object
lives. It was seeded by introspecting ../clickhouse-schema (which roles host each
object) and reconciled against the existing OPS migrations (0273/0274). Keep it
in sync when adding or moving OPS objects: the generator errors on any object it
finds in a diff but not here, forcing a conscious choice.

Per object: (node_roles, replicated, sharded)
  node_roles: NodeRole names the migration must target.
  replicated: ReplicatedMergeTree family. Drives is_alter_on_replicated_table
              (an ALTER runs on one host per shard, replication propagates).
  sharded:    lives on the multi-shard DATA cluster. Every OPS satellite is
              single-shard, so this is False for all current OPS objects.
"""

# Every cluster on the query_log_archive read/write/MV path. Mirrors ALL_ROLES in
# migration 0273. ENDPOINTS is included per migration history even though it is
# not a distinct hostClusterRole in the dumps; the dump also shows ingestion/
# batch_exports/sessionsv3, which the migrations intentionally do not target.
ALL_ROLES = ["DATA", "ENDPOINTS", "AUX", "AI_EVENTS", "SESSIONS", "OPS"]

TOPOLOGY: dict[str, tuple[list[str], bool, bool]] = {
    # --- OPS data tables (single-shard, replicated) ---
    "sharded_query_log_archive": (["OPS"], True, False),
    "sharded_tophog": (["OPS"], True, False),
    "events_team_daily_stats": (["OPS"], True, False),
    "metrics_exemplars": (["OPS"], True, False),
    "metrics_histograms": (["OPS"], True, False),
    "metrics_label_index": (["OPS"], True, False),
    "metrics_metadata": (["OPS"], True, False),
    "metrics_samples": (["OPS"], True, False),
    "metrics_series": (["OPS"], True, False),
    # --- OPS-only, non-replicated (buffer / MV / distributed proxy / view) ---
    "query_log_archive_buffer": (["OPS"], False, False),
    "metrics_label_index_from_series_mv": (["OPS"], False, False),
    "events_main": (["OPS"], False, False),
    "daily_aggregated_query_log_archive": (["OPS"], False, False),
    # --- OPS + DATA ---
    "events_recent": (["OPS", "DATA"], False, False),
    # --- Everywhere: query_log_archive read/write path + custom_metrics views ---
    "query_log_archive": (ALL_ROLES, False, False),
    "writable_query_log_archive": (ALL_ROLES, False, False),
    "ops_query_log_archive_mv": (ALL_ROLES, False, False),
    "custom_metrics": (ALL_ROLES, False, False),
    "custom_metrics_backups": (ALL_ROLES, False, False),
    "custom_metrics_dictionaries": (ALL_ROLES, False, False),
    "custom_metrics_part_counts": (ALL_ROLES, False, False),
    "custom_metrics_replication_queue": (ALL_ROLES, False, False),
    "custom_metrics_server_crash": (ALL_ROLES, False, False),
    "custom_metrics_table_sizes": (ALL_ROLES, False, False),
    "custom_metrics_test": (ALL_ROLES, False, False),
}
