Replace the single ClickHouse user with a least-privilege user model#6
Open
toffer wants to merge 13 commits into
Open
Replace the single ClickHouse user with a least-privilege user model#6toffer wants to merge 13 commits into
toffer wants to merge 13 commits into
Conversation
Introduce the schema_owner identity that owns the schema and serves as the materialized-view DEFINER, and move the migration Job onto it. - CHI: add schema_owner with config-level grants/query (CREATE DATABASE; full DDL/SELECT/INSERT on otel_traces.*; ALTER on system.*; SYSTEM FLUSH LOGS) and a configurable networksIp. Deliberately no access management. - sql/0014_reparent_mv_definers.sql (new): ALTER ... MODIFY SQL SECURITY DEFINER DEFINER = schema_owner for the three MVs. In-place (no rebuild/gap); existing installs carry an implicit DEFINER=otel that would break once otel is tightened. New file — the create migrations are immutable IF NOT EXISTS (no-op on upgrade). - schema-job: authenticate as schema_owner (was otel). - values: add clickhouse.schemaOwner block (secret, networksIp, externalSecret). - _helpers: factor ExternalSecret rendering into ao-data-platform.externalSecret (external-secret.yaml now ranges over a user list); add the shared ao-data-platform.readerGrants bundle for monte_carlo/readonly_user (later phases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Give the llm-worker its own least-privilege identity instead of sharing
otel.
- CHI: add llm_worker with config grants/query — SELECT on
otel_traces.llm_{batches,inputs,results}, INSERT on
otel_traces.llm_{batches,results}. Append-only, matching the worker's
actual access pattern; no DDL/mutations.
- llm-worker Deployment: authenticate as llm_worker (CH_USER + both
secret refs), and rewrite the init readiness probe from a
system.databases SELECT to `EXISTS TABLE otel_traces.llm_batches`
(privilege-robust under tightened grants; the worker holds a grant on
that table).
- values: add clickhouse.llmWorkerUser block; external-secret: render
its ExternalSecret.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Provision the Monte Carlo identity: the read-only data-source service account AND the AO agent's LLM-queue producer (the agent's credential itself is set in the the consuming application, not here). - CHI: add monte_carlo. grants/query = the shared reader bundle (ao-data-platform.readerGrants: SELECT on otel_traces.*, system.tables/parts/ query_log, information_schema.*) plus INSERT on llm_inputs/llm_batches/ conversation_eval_scores. Deliberately NO readonly profile — readonly=2 blocks INSERT, so grants alone enforce the read-mostly shape. - values: add clickhouse.monteCarloUser block; external-secret: render its ExternalSecret. The reader bundle is shared with readonly_user (Phase 6) via the helper so the metadata-read set has a single source of truth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reduce the always-running collector identity to least privilege, gated so it can ship dark (an earlier rollout step) and be flipped on later (a later rollout step). - CHI: when clickhouse.otel.restrictGrants=true, otel gets grants/query = INSERT on otel_traces.otel_traces (+ otel_metrics) only. Default false leaves otel with implicit full access (backward-compatible). The MV cascade runs as schema_owner (the MV DEFINER), so otel needs nothing on the normalized targets. - Collector init probe: SELECT-on-system.databases -> EXISTS TABLE otel_traces.otel_traces (privilege-robust; otel holds a grant on that table). - values: add clickhouse.otel.restrictGrants (default false). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make readonly_user an explicit, self-sufficient reader and stop the verify script depending on the removed default user. - CHI: readonly_user/grants/query = the shared reader bundle (ao-data-platform. readerGrants — same as monte_carlo's reads). Required because going to explicit config grants removes the implicit all-DB read a config user otherwise has, which DataGrip/MCP introspection relies on. readonly=2 profile kept (governs SET, not table access). networks/ip now configurable via readonlyUser.networksIp. - hack/verify-deployment.sh: route the 9 in-pod read queries through an explicit reader credential (prefer readonly_user, fall back to the always-present monte_carlo) instead of the bare/default user; keep the 2 otel auth checks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional superuser escape hatch for DBA / ops tasks the operator never performs (SYSTEM, BACKUP, ad-hoc DDL) and which no other user can do now that the default superuser is removed. - CHI (gated on adminUser.enabled, default off): admin with access_management=1 and GRANT ALL ON *.* WITH GRANT OPTION; configurable networks/ip (intended trusted networks-only per env). Not for service-to-service. - values: add clickhouse.adminUser block (enabled=false). - external-secret: render admin's ExternalSecret only when enabled. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- README: add a "ClickHouse user model" section (per-user privilege table, the shared reader bundle, and the readonly_user/admin gates); extend the config table with the new per-user knobs (schemaOwner/llmWorkerUser/monteCarloUser/ adminUser + otel.restrictGrants + readonlyUser.networksIp); update the AWS-prereq secrets line for the additional users. - values: correct the stale otelNetworksIp comment — the the deploying environment does not override it with a VPC CIDR; reachability is enforced at the load balancer and per-caller CH-level scoping is a separate change. Per-user networksIp knobs (default open) were added with each user in the prior phases. Cross-repo rollout handoff is tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Functional verification on k3d surfaced a ClickHouse startup crash (CANNOT_LOAD_CONFIG): "Any other access control settings can't be specified with `grants`" while parsing user 'admin'. ClickHouse rejects combining the legacy <access_management> setting with config-level <grants>. GRANT ALL ON *.* WITH GRANT OPTION already includes the ACCESS MANAGEMENT privilege group (verified live: admin can CREATE USER / GRANT / run SYSTEM), so the setting was redundant. Removed it. (profile + grants remains valid — that's why readonly_user is unaffected.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chart always provisions otel, schema_owner, llm_worker, and monte_carlo, and each requires its own ExternalSecret (secretStoreRef.name and remoteRef.key). The getting-started walkthroughs only wired otel, so following them verbatim hit a required-value failure at template time. - Local Development: seed schema_owner/llm_worker/monte_carlo keys in the Fake ClusterSecretStore and set their externalSecret.* in the install command. - Deploying to AWS EKS: show an ExternalSecret for each always-provisioned user; note readonly_user/admin are off by default and wired the same way. - Note that verify-deployment.sh runs its ClickHouse data checks as readonly_user, so it needs readonlyUser.enabled=true. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a post-deploy check that each ClickHouse user authenticates and holds the expected grants, and that the security-critical denials are enforced: - schema_owner holds DDL; llm_worker reads/writes only the queue and is denied telemetry reads; monte_carlo reads telemetry and produces to the queue but cannot write telemetry; readonly_user is SELECT-only. - Reports whether otel has been tightened to INSERT-only — its grant shape legitimately varies before/after the cutover, so it is not a hard assertion. - Confirms the materialized views run under schema_owner as DEFINER. - admin is checked only when it is enabled. Queries run as each user over the pod's local port via kubectl exec, so the checks hold while each user's networks/ip include loopback (the default); they must move to the Service endpoint once per-caller network scoping lands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bump to 2.0.0 Code-review fixes for the ClickHouse least-privilege user model (PR #6). verify-deployment.sh (the user-model verification section): - Make denial assertions robust: case-insensitive ACCESS_DENIED/ not-enough-priv grep (F1), tighten readonly_user needle to "GRANT INSERT" + add a runtime write denial (F2), convert monte_carlo telemetry-write denial to a runtime INSERT attempt (F3), add schema_owner no-access-management denials (F4). - Assert otel restriction post-cutover via VERIFY_OTEL_RESTRICTED instead of only warning (F5); quoting-tolerant MV DEFINER grep (F6). - Pass passwords via env CLICKHOUSE_PASSWORD, not --password CLI args (F10). - Full admin needle incl. WITH GRANT OPTION (F13); per-secret missing-secret diagnostics (F14); extra grant/denial coverage for the queue users (F17). Chart: - Tighten schema_owner: enumerate ALTER on the 7 system-log tables the schema Job TTLs instead of ALTER ON system.*; comment why CREATE DATABASE stays global (F7). - Rename the otel user's values keys to the cohesive nested clickhouse.otel.* shape, matching every other user — breaking, no aliases (F11). - Document the readerGrants nindent contract and externalSecret required-message values paths (F15); cross-reference comments + aligned user ordering across the three user-defining files (F16); move the --- separator into the externalSecret helper (F18). Docs: - Fix the user-model table (schema_owner reads, state-dependent otel reads, otel_metrics runtime-created footnote) and add a 1.x -> 2.0.0 upgrade guide (F9, F12). Versioning: - Bump Chart.yaml to 2.0.0 — this is a breaking change to the values contract (F19). Verified: helm lint clean, full helm template renders (6 ExternalSecrets, valid multi-doc YAML), bash -n parses. F8 (EXISTS TABLE probe) dropped — confirmed working in k3d. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
Code Review — 19 findings (18 fixed, 1 skipped) · validated in k3dReviewed: 2026-06-27 UTC | Reviewers: security, devdocs, architecture, correctness, testing, coordinator | Scope: full PR
👉 Fix commits: 1afe264 · bc398a1 (k3d-found F13 correction) k3d validation (2.0.0 chart, full 1.x→2.0.0 upgrade path): |
k3d testing of the new user-model verification revealed the F13 fix false-FAILed: the contiguous needle "GRANT ALL ON *.* WITH GRANT OPTION" never matches because ClickHouse renders SHOW GRANTS as `GRANT ALL ON *.* TO admin WITH GRANT OPTION` (the grantee is interpolated mid-string) and expect_grant is fixed-string (grep -qiF). Split into two substring checks — "GRANT ALL ON *.*" and "WITH GRANT OPTION" — which together verify both the grant-all and the delegation capability. Validated end-to-end in k3d against the 2.0.0 chart (1.x->2.0.0 upgrade path): all user-model checks pass pre-cutover and post-cutover (restrictGrants=true, VERIFY_OTEL_RESTRICTED=true), MV DEFINER reparent confirmed, default user removed, and EXISTS TABLE init probes work for both llm_worker and the INSERT-only otel collector (confirming F8 is a non-issue). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drops the inconsistent decorative `...User` suffix so every per-user values block key matches its actual ClickHouse SQL username — self-documenting and removes the naming judgment call (schemaOwner and monteCarloUser were the same kind of thing named two different ways). clickhouse.llmWorkerUser.* -> clickhouse.llmWorker.* clickhouse.monteCarloUser.* -> clickhouse.monteCarlo.* clickhouse.adminUser.* -> clickhouse.admin.* otel / schemaOwner / readonlyUser are unchanged — each already matches its SQL username (otel / schema_owner / readonly_user). Keying off the username is the invariant; renaming the readonly_user SQL user to drop the asymmetry would touch grants + the readonly profile for no real gain. Value-key paths only — SQL usernames and K8s Secret names are unchanged, so verify-deployment.sh is unaffected. Updated all chart templates, the README config table + walkthrough, and the 2.0.0 upgrade rename table. Re-validated in k3d: clean deploy + all user-model checks pass (0 failures). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this does
Replaces the single, broadly-privileged
otelClickHouse user with a least-privilege model — one user per access path — enforced declaratively via ClickHouse config-level grants in theClickHouseInstallation. The stockdefaultsuperuser stays removed, and each component authenticates as a user scoped to exactly what it does.Users
schema_ownerSYSTEM FLUSH LOGS) and is the materialized-viewDEFINER.otelINSERT-only on the source tables viaclickhouse.otel.restrictGrants.llm_workermonte_carloreadonly_userreadonly=2) for human / MCP / JDBC clients (e.g. DataGrip).adminA shared "reader bundle" —
SELECTon the telemetry DB plus thesystem/information_schemametadata reads JDBC/MCP clients need — is defined once and shared bymonte_carloandreadonly_user.How it's enforced and verified
<grants>in the CHI (declarative, applied by the operator) — no SQL-RBAC bootstrap user.DEFINER = schema_owner(newsql/0014migration reparents existing installs in place), so the ingest user needs nothing on the normalized tables and the normalization cascade keeps working onceotelis restricted.EXISTS TABLEon a table the pod's user is granted (privilege-robust, no reliance onsystem-table row filtering).verify-deployment.shnow asserts the whole model on every deploy: each user's grants, the security-critical denials, and that the MVs run asschema_owner.Key decisions
<grants>, not SQL RBAC — declarative and in-chart. A throwaway verification spike confirmed config grants actually restrict config-defined users, and thataccess_managementcannot be combined with<grants>— soadminusesGRANT ALL … WITH GRANT OPTION, which already carries the access-management privilege group.DEFINER— on this ClickHouse version an MV defaults toDEFINER = its creator, so pre-existing MVs are implicitly owned byotel; reparenting them is required (not just hardening) beforeotelcan be locked down.otellock-down is flag-gated (restrictGrants, default off) so it ships inert and is enabled only after external readers have moved tomonte_carlo.admindefaults to loopback-only — the superuser is never on a network path; using it requires pod access.SELECT/INSERT.Notes
schema_owner/llm_worker/monte_carloare always provisioned, and the README install walkthroughs wire all of them.monte_carloonce that cutover happens.🤖 Generated with Claude Code
/code-review