Replace the single ClickHouse user with a least-privilege user model by toffer · Pull Request #6 · monte-carlo-data/helm-ao-data-platform

toffer · 2026-06-26T22:48:56Z

What this does

Replaces the single, broadly-privileged otel ClickHouse user with a least-privilege model — one user per access path — enforced declaratively via ClickHouse config-level grants in the ClickHouseInstallation. The stock default superuser stays removed, and each component authenticates as a user scoped to exactly what it does.

Users

User	Role
`schema_owner`	Runs the schema-migration Job (DDL, telemetry/system-log TTLs, `SYSTEM FLUSH LOGS`) and is the materialized-view `DEFINER`.
`otel`	The OTel collector's ingest identity. Can be tightened to `INSERT`-only on the source tables via `clickhouse.otel.restrictGrants`.
`llm_worker`	The llm-worker Deployment — reads and appends the LLM job queue only.
`monte_carlo`	Monte Carlo's read-only data-source identity and the agent's queue producer: reads everything, writes only the queue + eval scores, never telemetry.
`readonly_user`	Optional SELECT-only user (`readonly=2`) for human / MCP / JDBC clients (e.g. DataGrip).
`admin`	Optional, gated break-glass superuser — reachable only over loopback (pod-exec) by default.

A shared "reader bundle" — SELECT on the telemetry DB plus the system / information_schema metadata reads JDBC/MCP clients need — is defined once and shared by monte_carlo and readonly_user.

How it's enforced and verified

Privileges come from config-level <grants> in the CHI (declarative, applied by the operator) — no SQL-RBAC bootstrap user.
The materialized views are pinned to DEFINER = schema_owner (new sql/0014 migration reparents existing installs in place), so the ingest user needs nothing on the normalized tables and the normalization cascade keeps working once otel is restricted.
Init/readiness probes use EXISTS TABLE on a table the pod's user is granted (privilege-robust, no reliance on system-table row filtering).
verify-deployment.sh now asserts the whole model on every deploy: each user's grants, the security-critical denials, and that the MVs run as schema_owner.

Key decisions

Config <grants>, not SQL RBAC — declarative and in-chart. A throwaway verification spike confirmed config grants actually restrict config-defined users, and that access_management cannot be combined with <grants> — so admin uses GRANT ALL … WITH GRANT OPTION, which already carries the access-management privilege group.
MVs pinned to a stable DEFINER — on this ClickHouse version an MV defaults to DEFINER = its creator, so pre-existing MVs are implicitly owned by otel; reparenting them is required (not just hardening) before otel can be locked down.
otel lock-down is flag-gated (restrictGrants, default off) so it ships inert and is enabled only after external readers have moved to monte_carlo.
admin defaults to loopback-only — the superuser is never on a network path; using it requires pod access.
Append-only model confirmed — no caller mutates rows, so only the schema Job needs anything beyond SELECT/INSERT.

Notes

Each password-backed user has its own ExternalSecret; schema_owner / llm_worker / monte_carlo are always provisioned, and the README install walkthroughs wire all of them.
Public platform docs need a matching update (tracked separately), notably switching the documented agent connection to monte_carlo once that cutover happens.

🤖 Generated with Claude Code

Ran /code-review

Introduce the schema_owner identity that owns the schema and serves as the materialized-view DEFINER, and move the migration Job onto it. - CHI: add schema_owner with config-level grants/query (CREATE DATABASE; full DDL/SELECT/INSERT on otel_traces.*; ALTER on system.*; SYSTEM FLUSH LOGS) and a configurable networksIp. Deliberately no access management. - sql/0014_reparent_mv_definers.sql (new): ALTER ... MODIFY SQL SECURITY DEFINER DEFINER = schema_owner for the three MVs. In-place (no rebuild/gap); existing installs carry an implicit DEFINER=otel that would break once otel is tightened. New file — the create migrations are immutable IF NOT EXISTS (no-op on upgrade). - schema-job: authenticate as schema_owner (was otel). - values: add clickhouse.schemaOwner block (secret, networksIp, externalSecret). - _helpers: factor ExternalSecret rendering into ao-data-platform.externalSecret (external-secret.yaml now ranges over a user list); add the shared ao-data-platform.readerGrants bundle for monte_carlo/readonly_user (later phases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Give the llm-worker its own least-privilege identity instead of sharing otel. - CHI: add llm_worker with config grants/query — SELECT on otel_traces.llm_{batches,inputs,results}, INSERT on otel_traces.llm_{batches,results}. Append-only, matching the worker's actual access pattern; no DDL/mutations. - llm-worker Deployment: authenticate as llm_worker (CH_USER + both secret refs), and rewrite the init readiness probe from a system.databases SELECT to `EXISTS TABLE otel_traces.llm_batches` (privilege-robust under tightened grants; the worker holds a grant on that table). - values: add clickhouse.llmWorkerUser block; external-secret: render its ExternalSecret. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provision the Monte Carlo identity: the read-only data-source service account AND the AO agent's LLM-queue producer (the agent's credential itself is set in the the consuming application, not here). - CHI: add monte_carlo. grants/query = the shared reader bundle (ao-data-platform.readerGrants: SELECT on otel_traces.*, system.tables/parts/ query_log, information_schema.*) plus INSERT on llm_inputs/llm_batches/ conversation_eval_scores. Deliberately NO readonly profile — readonly=2 blocks INSERT, so grants alone enforce the read-mostly shape. - values: add clickhouse.monteCarloUser block; external-secret: render its ExternalSecret. The reader bundle is shared with readonly_user (Phase 6) via the helper so the metadata-read set has a single source of truth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reduce the always-running collector identity to least privilege, gated so it can ship dark (an earlier rollout step) and be flipped on later (a later rollout step). - CHI: when clickhouse.otel.restrictGrants=true, otel gets grants/query = INSERT on otel_traces.otel_traces (+ otel_metrics) only. Default false leaves otel with implicit full access (backward-compatible). The MV cascade runs as schema_owner (the MV DEFINER), so otel needs nothing on the normalized targets. - Collector init probe: SELECT-on-system.databases -> EXISTS TABLE otel_traces.otel_traces (privilege-robust; otel holds a grant on that table). - values: add clickhouse.otel.restrictGrants (default false). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make readonly_user an explicit, self-sufficient reader and stop the verify script depending on the removed default user. - CHI: readonly_user/grants/query = the shared reader bundle (ao-data-platform. readerGrants — same as monte_carlo's reads). Required because going to explicit config grants removes the implicit all-DB read a config user otherwise has, which DataGrip/MCP introspection relies on. readonly=2 profile kept (governs SET, not table access). networks/ip now configurable via readonlyUser.networksIp. - hack/verify-deployment.sh: route the 9 in-pod read queries through an explicit reader credential (prefer readonly_user, fall back to the always-present monte_carlo) instead of the bare/default user; keep the 2 otel auth checks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add an optional superuser escape hatch for DBA / ops tasks the operator never performs (SYSTEM, BACKUP, ad-hoc DDL) and which no other user can do now that the default superuser is removed. - CHI (gated on adminUser.enabled, default off): admin with access_management=1 and GRANT ALL ON *.* WITH GRANT OPTION; configurable networks/ip (intended trusted networks-only per env). Not for service-to-service. - values: add clickhouse.adminUser block (enabled=false). - external-secret: render admin's ExternalSecret only when enabled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- README: add a "ClickHouse user model" section (per-user privilege table, the shared reader bundle, and the readonly_user/admin gates); extend the config table with the new per-user knobs (schemaOwner/llmWorkerUser/monteCarloUser/ adminUser + otel.restrictGrants + readonlyUser.networksIp); update the AWS-prereq secrets line for the additional users. - values: correct the stale otelNetworksIp comment — the the deploying environment does not override it with a VPC CIDR; reachability is enforced at the load balancer and per-caller CH-level scoping is a separate change. Per-user networksIp knobs (default open) were added with each user in the prior phases. Cross-repo rollout handoff is tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Functional verification on k3d surfaced a ClickHouse startup crash (CANNOT_LOAD_CONFIG): "Any other access control settings can't be specified with `grants`" while parsing user 'admin'. ClickHouse rejects combining the legacy <access_management> setting with config-level <grants>. GRANT ALL ON *.* WITH GRANT OPTION already includes the ACCESS MANAGEMENT privilege group (verified live: admin can CREATE USER / GRANT / run SYSTEM), so the setting was redundant. Removed it. (profile + grants remains valid — that's why readonly_user is unaffected.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The chart always provisions otel, schema_owner, llm_worker, and monte_carlo, and each requires its own ExternalSecret (secretStoreRef.name and remoteRef.key). The getting-started walkthroughs only wired otel, so following them verbatim hit a required-value failure at template time. - Local Development: seed schema_owner/llm_worker/monte_carlo keys in the Fake ClusterSecretStore and set their externalSecret.* in the install command. - Deploying to AWS EKS: show an ExternalSecret for each always-provisioned user; note readonly_user/admin are off by default and wired the same way. - Note that verify-deployment.sh runs its ClickHouse data checks as readonly_user, so it needs readonlyUser.enabled=true. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a post-deploy check that each ClickHouse user authenticates and holds the expected grants, and that the security-critical denials are enforced: - schema_owner holds DDL; llm_worker reads/writes only the queue and is denied telemetry reads; monte_carlo reads telemetry and produces to the queue but cannot write telemetry; readonly_user is SELECT-only. - Reports whether otel has been tightened to INSERT-only — its grant shape legitimately varies before/after the cutover, so it is not a hard assertion. - Confirms the materialized views run under schema_owner as DEFINER. - admin is checked only when it is enabled. Queries run as each user over the pod's local port via kubectl exec, so the checks hold while each user's networks/ip include loopback (the default); they must move to the Service endpoint once per-caller network scoping lands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bump to 2.0.0 Code-review fixes for the ClickHouse least-privilege user model (PR #6). verify-deployment.sh (the user-model verification section): - Make denial assertions robust: case-insensitive ACCESS_DENIED/ not-enough-priv grep (F1), tighten readonly_user needle to "GRANT INSERT" + add a runtime write denial (F2), convert monte_carlo telemetry-write denial to a runtime INSERT attempt (F3), add schema_owner no-access-management denials (F4). - Assert otel restriction post-cutover via VERIFY_OTEL_RESTRICTED instead of only warning (F5); quoting-tolerant MV DEFINER grep (F6). - Pass passwords via env CLICKHOUSE_PASSWORD, not --password CLI args (F10). - Full admin needle incl. WITH GRANT OPTION (F13); per-secret missing-secret diagnostics (F14); extra grant/denial coverage for the queue users (F17). Chart: - Tighten schema_owner: enumerate ALTER on the 7 system-log tables the schema Job TTLs instead of ALTER ON system.*; comment why CREATE DATABASE stays global (F7). - Rename the otel user's values keys to the cohesive nested clickhouse.otel.* shape, matching every other user — breaking, no aliases (F11). - Document the readerGrants nindent contract and externalSecret required-message values paths (F15); cross-reference comments + aligned user ordering across the three user-defining files (F16); move the --- separator into the externalSecret helper (F18). Docs: - Fix the user-model table (schema_owner reads, state-dependent otel reads, otel_metrics runtime-created footnote) and add a 1.x -> 2.0.0 upgrade guide (F9, F12). Versioning: - Bump Chart.yaml to 2.0.0 — this is a breaking change to the values contract (F19). Verified: helm lint clean, full helm template renders (6 ExternalSecrets, valid multi-doc YAML), bash -n parses. F8 (EXISTS TABLE probe) dropped — confirmed working in k3d. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

toffer · 2026-06-27T15:52:12Z

Code Review — 19 findings (18 fixed, 1 skipped) · validated in k3d

Reviewed: 2026-06-27 UTC | Reviewers: security, devdocs, architecture, correctness, testing, coordinator | Scope: full PR

#	Severity	Finding	Status
F1	ISSUE	`expect_denied` case-sensitive `ACCESS_DENIED` grep — fragile across CH versions	✅ Case-insensitive, broadened pattern
F2	ISSUE	`readonly_user` forbid needle `"INSERT"` too loose; no runtime check	✅ Tightened to `GRANT INSERT` + runtime write denial
F3	ISSUE	`monte_carlo` telemetry-write denial was grant-text only	✅ Now a runtime `INSERT` attempt
F4	ISSUE	`schema_owner` "no access management" property unverified	✅ Added access-mgmt + CREATE USER denials
F5	ISSUE	otel restrict state reported, never asserted	✅ Asserts via `VERIFY_OTEL_RESTRICTED`
F6	ISSUE	MV DEFINER grep may miss CH identifier quoting	✅ Quoting/case-tolerant grep (hedge)
F7	ISSUE	`schema_owner` `ALTER ON system.*` + global `CREATE DATABASE` over-broad	✅ Enumerated 7 system-log tables; documented why CREATE DATABASE stays global
F8	ISSUE	`EXISTS TABLE` probe may need `system.tables` SELECT	⏭️ Skipped — confirmed non-issue in k3d (works for llm_worker AND INSERT-only otel)
F9	ISSUE	README user-model table misstated schema_owner/otel reads + `otel_metrics`	✅ Table corrected + runtime-created footnote
F10	SUGGESTION	Passwords passed as `--password` CLI args (process list/audit leak)	✅ Via `env CLICKHOUSE_PASSWORD`
F11	SUGGESTION	otel values shape inconsistent with other users	✅ Renamed to nested `clickhouse.otel.*` (breaking, no aliases)
F12	SUGGESTION	No upgrade guide for existing installs	✅ Added 1.x→2.0.0 upgrade section
F13	SUGGESTION	admin needle misses `WITH GRANT OPTION`	✅ Fixed — split into two substring checks (first attempt false-FAILed in k3d: `SHOW GRANTS` interpolates `TO admin`)
F14	SUGGESTION	Undifferentiated missing-secret message	✅ Names the specific missing user
F15	SUGGESTION	Helper `nindent` contract / `required` paths undocumented	✅ Documented
F16	SUGGESTION	User defs split across 3 files, no cross-ref	✅ Cross-ref comments + aligned ordering
F17	SUGGESTION	Queue-user grant/denial coverage gaps	✅ Added checks
F18	NIT	`---` separator placement in range loop	✅ Moved into the ExternalSecret helper
F19	ISSUE	`Chart.yaml` not bumped for a breaking change	✅ Bumped to `2.0.0`

👉 Fix commits: 1afe264 · bc398a1 (k3d-found F13 correction)

k3d validation (2.0.0 chart, full 1.x→2.0.0 upgrade path): helm lint clean; ESO synced all 6 per-user secrets; default superuser removed; 0014 MV-DEFINER reparent confirmed (all 3 MVs DEFINER = schema_owner); all user-model assertions pass pre-cutover and post-cutover (restrictGrants=true + VERIFY_OTEL_RESTRICTED=true); llm-worker + INSERT-only otel collector init EXISTS TABLE probes both pass.

k3d testing of the new user-model verification revealed the F13 fix false-FAILed: the contiguous needle "GRANT ALL ON *.* WITH GRANT OPTION" never matches because ClickHouse renders SHOW GRANTS as `GRANT ALL ON *.* TO admin WITH GRANT OPTION` (the grantee is interpolated mid-string) and expect_grant is fixed-string (grep -qiF). Split into two substring checks — "GRANT ALL ON *.*" and "WITH GRANT OPTION" — which together verify both the grant-all and the delegation capability. Validated end-to-end in k3d against the 2.0.0 chart (1.x->2.0.0 upgrade path): all user-model checks pass pre-cutover and post-cutover (restrictGrants=true, VERIFY_OTEL_RESTRICTED=true), MV DEFINER reparent confirmed, default user removed, and EXISTS TABLE init probes work for both llm_worker and the INSERT-only otel collector (confirming F8 is a non-issue). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drops the inconsistent decorative `...User` suffix so every per-user values block key matches its actual ClickHouse SQL username — self-documenting and removes the naming judgment call (schemaOwner and monteCarloUser were the same kind of thing named two different ways). clickhouse.llmWorkerUser.* -> clickhouse.llmWorker.* clickhouse.monteCarloUser.* -> clickhouse.monteCarlo.* clickhouse.adminUser.* -> clickhouse.admin.* otel / schemaOwner / readonlyUser are unchanged — each already matches its SQL username (otel / schema_owner / readonly_user). Keying off the username is the invariant; renaming the readonly_user SQL user to drop the asymmetry would touch grants + the readonly profile for no real gain. Value-key paths only — SQL usernames and K8s Secret names are unchanged, so verify-deployment.sh is unaffected. Updated all chart templates, the README config table + walkthrough, and the 2.0.0 upgrade rename table. Re-validated in k3d: clean deploy + all user-model checks pass (0 failures). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

toffer and others added 10 commits June 26, 2026 12:59

toffer requested a review from a team as a code owner June 26, 2026 22:48

toffer and others added 2 commits June 27, 2026 09:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace the single ClickHouse user with a least-privilege user model#6

Replace the single ClickHouse user with a least-privilege user model#6
toffer wants to merge 13 commits into
mainfrom
toffermann/clickhouse-least-privilege-user-model

toffer commented Jun 26, 2026 •

edited

Loading

Uh oh!

toffer commented Jun 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

toffer commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

Users

How it's enforced and verified

Key decisions

Notes

Uh oh!

toffer commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

toffer commented Jun 26, 2026 •

edited

Loading

toffer commented Jun 27, 2026 •

edited

Loading