Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
📝 Walkthrough

Adds a ClickHouse anomaly-detection migration, new AI_ANALYST ClickHouse credentials across envs/docker/setup, a startup role-sync call, test utilities and end-to-end anomaly detection tests, and demo-data seeding for anomaly patterns.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application Startup
    participant Bootstrap as ClickHouse Bootstrap
    participant Migr as Migration Runner
    participant CH as ClickHouse DB
    participant Tests as Vitest Runner
    App->>Bootstrap: register()/startup()
    Bootstrap->>CH: provisionInitialUser()
    CH-->>Bootstrap: user created
    Bootstrap->>Bootstrap: read env (AI_ANALYST_CLICKHOUSE_USER, AI_ANALYST_CLICKHOUSE_PASSWORD)
    Bootstrap->>CH: syncDatabaseRoles() -> ALTER USER `ai_analyst_user` IDENTIFIED WITH sha256_password
    CH-->>Bootstrap: credentials updated
    Migr->>CH: run 007_anomaly_detection.sql (tables, views, role, user, grants)
    CH-->>Migr: schema and roles created
    Tests->>CH: seedAnomalyTestPattern() / inserts
    CH-->>Tests: data inserted
    Tests->>CH: query v_cwv_anomalies / processed_anomalies
    CH-->>Tests: anomaly results
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 12
♻️ Duplicate comments (2)
apps/monitor-app/.env.test (1)
10-11: Same key-ordering concern as `.env.example`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/.env.test` around lines 10 - 11, The two environment keys AI_ANALYST_CLICKHOUSE_USER and AI_ANALYST_CLICKHOUSE_PASSWORD are ordered differently than in .env.example; reorder these entries so their sequence matches the canonical ordering in .env.example (place the username and password keys in the same order as the example) to eliminate the duplicate key-ordering discrepancy.
apps/monitor-app/.env.ci (1)
10-11: Same key-ordering concern as `.env.example`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/.env.ci` around lines 10 - 11, The two ClickHouse env vars in .env.ci (AI_ANALYST_CLICKHOUSE_USER and AI_ANALYST_CLICKHOUSE_PASSWORD) are not in the same key order as .env.example; update .env.ci so the keys appear in the same order and grouping as .env.example (matching AI_ANALYST_CLICKHOUSE_* placement) to avoid duplicate-ordering drift and make diffs consistent.
🧹 Nitpick comments (4)
apps/monitor-app/.env.example (1)
10-11: Consider alphabetical key ordering per dotenv-linter conventions.
`dotenv-linter` expects keys sorted alphabetically: `AI_ANALYST_CLICKHOUSE_PASSWORD` before `AI_ANALYST_CLICKHOUSE_USER`, and both before `CLICKHOUSE_DB`. The same pattern appears in `.env.test` and `.env.ci` — fixing all three together would silence the warnings consistently.

♻️ Proposed ordering fix
```diff
+AI_ANALYST_CLICKHOUSE_PASSWORD=ai_analyst_password
+AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
 CLICKHOUSE_HOST=clickhouse
 CLICKHOUSE_PORT=8123
 CLICKHOUSE_DB=cwv_monitor
 CLICKHOUSE_USER=default
 CLICKHOUSE_PASSWORD=secret
-AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
-AI_ANALYST_CLICKHOUSE_PASSWORD=ai_analyst_password
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/.env.example` around lines 10 - 11, Reorder the environment keys to satisfy dotenv-linter alphabetical ordering: place AI_ANALYST_CLICKHOUSE_PASSWORD before AI_ANALYST_CLICKHOUSE_USER and ensure both appear before CLICKHOUSE_DB in apps/monitor-app/.env.example; apply the same alphabetical reorder to the matching entries in apps/monitor-app/.env.test and apps/monitor-app/.env.ci so the three files stay consistent and lint warnings are resolved.
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (2)
48-48: Variance formula is susceptible to catastrophic cancellation for large metric values

`E[x²] - (E[x])²` is numerically unstable when the two terms are close in magnitude (common for LCP/TTFB in milliseconds with tight distributions). While `max2(..., 0)` guards against a negative result, the computed stddev can still be artificially near-zero due to precision loss, causing z-scores to spike falsely. ClickHouse's `varSamp`/`varPop` aggregate functions use a numerically stable one-pass algorithm and would be more robust — though switching requires storing the variance state instead of sum-of-squares, which would be a schema change to `cwv_stats_hourly`.

As a lower-cost improvement, document the known limitation inline so future maintainers understand why occasional false positives may occur on high-magnitude metrics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql` at line 48, The computed b_stddev expression (sqrt(max2((sumMerge(sum_squares) / countMerge(count_value)) - pow(b_avg, 2), 0)) AS b_stddev) uses E[x²] - (E[x])² which is numerically unstable for large metric values; add an inline SQL comment next to this expression explaining the limitation and why it may produce near-zero stddevs and false z-score spikes, and reference that a more robust fix would be to use ClickHouse's varSamp/varPop (which require changing the cwv_stats_hourly schema to store variance state) so future maintainers understand trade-offs and potential false positives.
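The cancellation effect described above can be reproduced outside ClickHouse. A minimal Node sketch (values exaggerated to make the effect provable; this mirrors the sum-of-squares formula, not the actual migration SQL):

```javascript
// Sketch (not from the migration): the one-pass E[x^2] - (E[x])^2 formula
// cancels catastrophically when both terms are huge and nearly equal --
// the tight-distribution case the comment describes for ms-scale metrics.
const xs = [1e9, 1e9 + 0.001]; // true population variance = (0.001 / 2)^2 = 2.5e-7

// Mirrors sumMerge(sum_squares)/countMerge(count_value) - pow(avg, 2).
function variancePopNaive(values) {
  let sum = 0, sumSq = 0;
  for (const x of values) { sum += x; sumSq += x * x; }
  const mean = sum / values.length;
  return Math.max(sumSq / values.length - mean * mean, 0); // the max2(..., 0) guard
}

// Two-pass formula, numerically stable like ClickHouse's varPop.
function variancePopStable(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.reduce((a, x) => a + (x - mean) ** 2, 0) / values.length;
}

console.log(variancePopNaive(xs));  // nowhere near 2.5e-7: the signal cancelled away
console.log(variancePopStable(xs)); // ≈ 2.5e-7
```

The naive result is quantized to the floating-point spacing near 10¹⁸, so the true variance of 2.5e-7 is unrecoverable; the two-pass form keeps full precision.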
12-14: `ORDER BY` key order differs from the `GROUP BY` order used in queries

The table's sort key is `(project_id, hour, metric_name, route, device_type)`, but both CTEs in `v_cwv_anomalies` group by `(project_id, route, device_type, metric_name, hour)`. For `AggregatingMergeTree`, background merges are most efficient when the `ORDER BY` columns appear in the same sequence as the grouping columns used during reads. Reordering the key to match query access patterns would allow ClickHouse to merge parts more effectively.

♻️ Suggested fix
```diff
-ORDER BY (project_id, hour, metric_name, route, device_type)
+ORDER BY (project_id, route, device_type, metric_name, hour)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql` around lines 12 - 14, The table's AggregatingMergeTree ORDER BY tuple (project_id, hour, metric_name, route, device_type) does not match the GROUP BY ordering used in v_cwv_anomalies CTEs (project_id, route, device_type, metric_name, hour), which hurts merge efficiency; update the table definition's ORDER BY to the same sequence used by the queries (e.g., ORDER BY (project_id, route, device_type, metric_name, hour)) so background merges and reads align (keep PARTITION BY toYYYYMM(hour) unchanged). Reference the AggregatingMergeTree table ORDER BY clause in the migration and the v_cwv_anomalies CTEs' GROUP BY to ensure exact column order alignment.
apps/monitor-app/scripts/seed-demo-data.mjs (1)
587-632: `seedAnomalyTestPattern` shares the module-level `rng` — results are non-deterministic in isolation

The function relies on the module-level `rng` instance whose state is already advanced by module initialisation and any prior `buildEvents` calls. If this function is invoked standalone (as tests do), the PRNG position is unpredictable, which could cause `metric_value` samples to drift outside the intended `good`/`poor` rating bands and silently produce incorrect test fixtures.

Consider accepting an optional `seed` parameter and constructing a local RNG, or seeding it from a fixed constant before the loops.

♻️ Suggested approach
```diff
-export async function seedAnomalyTestPattern(client, projectId) {
+export async function seedAnomalyTestPattern(client, projectId, seed = 9999) {
+  const localRng = createRng(seed);
   const now = new Date();
   const currentHourMark = new Date(now.setMinutes(0, 0, 0));
   ...
-      metric_value: 2000 + (rng() * 300),
+      metric_value: 2000 + (localRng() * 300),
   ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/scripts/seed-demo-data.mjs` around lines 587 - 632, seedAnomalyTestPattern currently uses the module-level rng so outputs vary depending on prior RNG use; make it deterministic by adding an optional seed parameter (e.g., seed) and creating a local PRNG inside seedAnomalyTestPattern (or reseeding a local RNG from that seed) to use for metric_value generation instead of the module-level rng; update references to rng inside the function to use the new local RNG and default the seed to a fixed constant when none is provided so tests can call it in isolation and get reproducible metric_value bands.
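The local-RNG approach suggested above can be sketched as follows; `createRng` here is a stand-in mulberry32 implementation, since the script's actual seeded-PRNG factory is not shown in this review:

```javascript
// Stand-in seeded PRNG (mulberry32) -- assumed shape, for illustration only.
function createRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// A function that owns its RNG is reproducible no matter who called first.
function sampleMetricValues(count, seed = 9999) {
  const localRng = createRng(seed); // fresh state on every call
  return Array.from({ length: count }, () => 2000 + localRng() * 300);
}

const first = sampleMetricValues(5);
const second = sampleMetricValues(5);
console.log(JSON.stringify(first) === JSON.stringify(second)); // true: no shared state
```

Because the generator is constructed inside the function, no amount of prior module-level RNG use can shift the sequence, and every sample stays inside the intended 2000–2300 band.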
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Around line 100-103: The migration currently creates ai_analyst_user with a
hardcoded password 'ai_analyst_password'; instead modify the CREATE
USER/IDENTIFIED clause to avoid committing credentials by creating
ai_analyst_user WITHOUT a password (e.g., CREATE USER ... NO PASSWORD) or use a
non-committed strong random placeholder, then keep the GRANT r_ai_analyst and
ALTER USER ai_analyst_user DEFAULT ROLE r_ai_analyst as-is; ensure
syncDatabaseRoles() is relied on to set/rotate the real password at startup and
add a brief comment in the migration referencing syncDatabaseRoles() so
reviewers know the credential is managed outside source control.
- Around line 35-43: The current_hour CTE is including the previous completed
hour because its WHERE uses "hour >= current_hour_mark - INTERVAL 1 HOUR";
change the filter to only include the in-progress hour by restricting hour to
the current hour (use current_hour_mark as the lower bound and exclude earlier
hours, e.g., hour >= current_hour_mark AND hour < current_hour_mark + INTERVAL 1
HOUR, or simply hour = current_hour_mark) so avg_val and n are computed solely
from the current running hour in the current_hour CTE.
- Around line 78-89: processed_anomalies is missing the device_type column which
breaks identity parity with v_cwv_anomalies (anomaly_id is derived using
device_type); add a device_type column (e.g., LowCardinality(String)) to the
processed_anomalies table and include it in the table key/order definition used
by ReplacingMergeTree (so ORDER BY and identity grouping include device_type
alongside project_id and anomaly_id) to preserve the device-specific anomaly
identity when inserting and querying anomalies.
In `@apps/monitor-app/scripts/seed-demo-data.mjs`:
- Line 589: The call that constructs currentHourMark uses
Date.prototype.setMinutes with a spurious fourth argument; remove the extra
argument and call setMinutes with three parameters (minutes, seconds,
milliseconds) so the line that assigns currentHourMark (using now.setMinutes)
passes only 0 for minutes, 0 for seconds, 0 for milliseconds and no fourth
value, ensuring the code correctly reflects the Date API for the variable
currentHourMark.
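For context on the `setMinutes` fix above, a small sketch with hypothetical values (using `setUTCMinutes` so the printed result is timezone-stable):

```javascript
// Date.prototype.setMinutes takes at most three arguments
// (minutes, seconds, milliseconds); a fourth is silently ignored, which is
// why the spurious argument is misleading rather than fatal.
const now = new Date(Date.UTC(2024, 4, 1, 10, 17, 42, 123)); // hypothetical wall clock
const millis = now.setUTCMinutes(0, 0, 0); // mutates `now`, returns epoch millis
const currentHourMark = new Date(millis);  // top of the current hour

console.log(currentHourMark.toISOString()); // "2024-05-01T10:00:00.000Z"
```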
In `@apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts`:
- Around line 8-11: The early-return guard using aiUser and aiPass is
misleadingly indented so the subsequent await sql`ALTER USER
${sql.identifier(aiUser)} IDENTIFIED WITH sha256_password BY
${aiPass}`.command() looks like it’s inside the if; update the code to make
intent unambiguous by either (A) adding braces to the if (if (!aiUser ||
!aiPass) { return; }) so the return is clearly isolated, or (B) outdenting the
await line to the same indentation as the if so it’s clearly outside the guard;
reference aiUser, aiPass and the await sql`ALTER USER ...`.command() call when
making the change.
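Option (A) above can be sketched like this; `runCommand` is a hypothetical stand-in for the real ``sql`ALTER USER ...`.command()`` call:

```javascript
// Sketch only: braces make the guard's scope unambiguous regardless of
// indentation, so the ALTER USER step is clearly outside the early return.
async function syncDatabaseRolesSketch(aiUser, aiPass, runCommand) {
  if (!aiUser || !aiPass) {
    return; // clearly isolated early return
  }
  // Unambiguously outside the guard: runs only when both credentials exist.
  await runCommand(aiUser, aiPass);
}
```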
In `@apps/monitor-app/src/env.ts`:
- Around line 36-37: The username default makes the aiUser guard in
syncDatabaseRoles ineffective; remove the hardcoded default and allow the value
to be absent so the existing guard in bootstrap.ts (the if (!aiUser || !aiPass)
return; check used by syncDatabaseRoles) can work. Specifically, change
AI_ANALYST_CLICKHOUSE_USER in env.ts to not call .default("ai_analyst_user")
(make it optional, e.g. z.string().min(1).optional() or z.string().optional())
so env.AI_ANALYST_CLICKHOUSE_USER can be falsy and the bootstrap.ts guard works
as intended.
In `@apps/monitor-app/src/instrumentation.ts`:
- Around line 5-6: The call to syncDatabaseRoles() during startup can throw if
the ClickHouse user does not exist; either add the IF EXISTS modifier to the
ALTER USER statement inside syncDatabaseRoles() in bootstrap.ts so ALTER USER
becomes ALTER USER IF EXISTS ... (and keep any existing error handling), or wrap
the call site in instrumentation.ts (where provisionInitialUser(); await
syncDatabaseRoles(); is invoked during register()) with a try/catch that logs a
non-fatal warning and continues startup on failure; update references to
syncDatabaseRoles(), provisionInitialUser(), and register() accordingly so the
startup won’t crash when the ClickHouse user is absent.
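The try/catch variant above can be sketched as follows; the `registerSketch` shape is illustrative, not the real `instrumentation.ts` module:

```javascript
// Sketch: treat role-sync failure as a non-fatal warning instead of
// crashing startup. syncDatabaseRoles is passed in as a stand-in.
async function registerSketch(syncDatabaseRoles, log = console) {
  try {
    await syncDatabaseRoles();
    return "synced";
  } catch (err) {
    // e.g. the ClickHouse user does not exist yet; log and keep booting.
    log.warn(`ClickHouse role sync skipped: ${err.message}`);
    return "skipped";
  }
}

registerSketch(async () => { throw new Error("no such user"); }, { warn() {} })
  .then((outcome) => console.log(outcome)); // "skipped" -- startup continues
```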
In `@apps/monitor-app/src/test/anomaly-detection.test.ts`:
- Around line 20-22: The tests mutate process.env.CLICKHOUSE_HOST and
CLICKHOUSE_PORT and never restore them; capture the original values before you
overwrite them (e.g., store origClickhouseHost and origClickhousePort) and
restore them in the test teardown (afterAll or afterEach) by setting process.env
back to the original values or deleting the keys if they were undefined; update
both places where you set process.env (the block that sets
CLICKHOUSE_HOST/CLICKHOUSE_PORT and the other occurrence) to use this
save-and-restore pattern so other tests aren’t affected.
- Around line 109-117: The test inserts a row with route "/gap-test" via
directClient.insert but later asserts stats for '/checkout', so update the
inserted route to match the asserted route (or change the assertions to
'/gap-test') so the test actually validates the inserted record; specifically
edit the directClient.insert call in anomaly-detection.test (the object with
project_id, session_id, route, metric_name, metric_value, recorded_at) to use
route: '/checkout' (and make the same correction for the similar
insert/assertion pair referenced around lines 124-129) so the route in the
inserted value and the route used in the assertions are identical.
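The save-and-restore pattern from the first comment above can be sketched as follows (helper names are illustrative; in Vitest the override would go in `beforeAll` and the restore in `afterAll`):

```javascript
// Sketch of the env save-and-restore pattern.
const savedEnv = {};

function overrideEnv(overrides) {
  for (const [key, value] of Object.entries(overrides)) {
    savedEnv[key] = process.env[key]; // may be undefined if the key was absent
    process.env[key] = value;
  }
}

function restoreEnv() {
  for (const [key, original] of Object.entries(savedEnv)) {
    if (original === undefined) delete process.env[key]; // key did not exist before
    else process.env[key] = original;
  }
}

overrideEnv({ CLICKHOUSE_HOST: "localhost", CLICKHOUSE_PORT: "9999" });
// ...assertions against the overridden environment run here...
restoreEnv(); // later suites see the original values (or absence) again
```

Deleting keys that were originally absent matters: setting them to the string `"undefined"` would leak a truthy value into other suites.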
In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 146-149: optimizeAnomalies currently issues OPTIMIZE immediately
and can run before mv_cwv_stats_hourly has ingested rows; modify
optimizeAnomalies to mirror optimizeAggregates by polling (up to ~10s, short
interval) the materialized view mv_cwv_stats_hourly until it reports the
expected row count (or >0) before calling sqlClient`OPTIMIZE TABLE
cwv_stats_hourly FINAL`. Locate the optimizeAnomalies function and add the
readiness loop/timeout check (same pattern as in optimizeAggregates) to ensure
the mv is populated on slow CI runners prior to optimizing.
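The readiness loop described above can be sketched generically; `countRows` and `optimize` are hypothetical stand-ins for the real `sqlClient` calls against `mv_cwv_stats_hourly` / `cwv_stats_hourly`:

```javascript
// Sketch: poll until the materialized view reports rows (up to ~10 s),
// then run the optimize step.
async function optimizeWhenReady(countRows, optimize, { timeoutMs = 10_000, intervalMs = 200 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await countRows()) > 0) {
      await optimize(); // rows have landed; OPTIMIZE ... FINAL is now meaningful
      return true;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out on a slow CI runner; caller can fail loudly
}
```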
In `@docker/docker-compose.yml`:
- Around line 7-8: The shared anchor &clickhouse-env currently requires
AI_ANALYST_CLICKHOUSE_USER and AI_ANALYST_CLICKHOUSE_PASSWORD with the :?
operator which prevents services that merge <<: *clickhouse-env (notably the
migrations and seed-demo services) from starting unless those vars exist; to
fix, remove the required-variable entries from the &clickhouse-env anchor and
instead add the two required env entries with the :? operator only to the
monitor-app service's environment block (alternatively make them optional in the
anchor if you prefer minimal change), ensuring migrations and seed-demo no
longer depend on those credentials.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- apps/monitor-app/.env.ci
- apps/monitor-app/.env.example
- apps/monitor-app/.env.test
- apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
- apps/monitor-app/package.json
- apps/monitor-app/scripts/seed-demo-data.mjs
- apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts
- apps/monitor-app/src/env.ts
- apps/monitor-app/src/instrumentation.ts
- apps/monitor-app/src/test/anomaly-detection.test.ts
- apps/monitor-app/src/test/clickhouse-test-utils.ts
- apps/monitor-app/src/test/performance-guardrails.test.ts
- apps/monitor-app/vitest.anomaly.config.ts
- docker/docker-compose.dev.yml
- docker/docker-compose.yml
- docker/monitor-app.prod.Dockerfile
- setup.sh
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check_pull_request
🧰 Additional context used
🧬 Code graph analysis (3)
apps/monitor-app/src/env.ts (1)
- apps/monitor-app/scripts/seed-demo-data.mjs (1)
  - process (6-12)

apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (2)
- apps/monitor-app/src/env.ts (1)
  - env (27-72)
- apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
  - sql (44-53)

apps/monitor-app/src/instrumentation.ts (2)
- apps/monitor-app/src/lib/provision-initial-user.ts (1)
  - provisionInitialUser (5-23)
- apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (1)
  - syncDatabaseRoles (4-12)
🪛 dotenv-linter (4.0.0)
apps/monitor-app/.env.ci
[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key
(UnorderedKey)
[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key
(UnorderedKey)
apps/monitor-app/.env.example
[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key
(UnorderedKey)
[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key
(UnorderedKey)
apps/monitor-app/.env.test
[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key
(UnorderedKey)
[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key
(UnorderedKey)
🔇 Additional comments (9)
apps/monitor-app/src/test/performance-guardrails.test.ts (1)
246-248: LGTM — signature update is correct.

`customStart: null` and `customEnd: null` align the test call with the updated `getDashboardData` signature. No other impact.
apps/monitor-app/package.json (1)
17-18: LGTM — new `test:anomaly` script looks correct.

Consistent with the pattern used by `test:perf` and matches the new config/test files introduced in this PR.
docker/monitor-app.prod.Dockerfile (1)
40-41: LGTM — build-time placeholders are consistent with the existing pattern.

The `analyst-placeholder` values follow the same convention as `build-placeholder`/`build-time-placeholder-will-be-replaced-at-runtime` used for other sensitive vars above, and the existing comment on line 31 makes the intent clear.
docker/docker-compose.dev.yml (1)
9-10: LGTM – dev-mode defaults are clearly scoped to this compose file and don't affect production.
apps/monitor-app/vitest.anomaly.config.ts (1)
1-12: LGTM – clean extension of the existing integration config; 240 s timeout is appropriate for container-based anomaly tests.
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)
168-168: LGTM – `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"` is required for the migration's `CREATE ROLE`/`CREATE USER` statements to succeed inside the test container.
setup.sh (1)
212-214: LGTM – `AI_ANALYST_CLICKHOUSE_PASSWORD` is generated with the same `generate_secret` (openssl rand) helper as the other secrets, and the hardcoded username `ai_analyst_user` matches the identity created in the migration.
apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (1)
9-11: The code is safe. The `sql` tagged template is provided by `waddler` (v0.1.1), which automatically converts all `${...}` interpolations into parameterized ClickHouse query values using the `{name:Type}` syntax. Unlike raw string concatenation, bare `${aiPass}` is parameterized and sent separately, preventing SQL injection even if the password contains special characters like `'`. The use of `sql.identifier(aiUser)` for the identifier and `${aiPass}` for the value is the correct pattern.

Likely an incorrect or invalid review comment.
apps/monitor-app/src/test/anomaly-detection.test.ts (1)
41-50: Strong deterministic assertion coverage here.

This test has clear and stable checks (`z_score` threshold + deterministic `anomaly_id` format) and is a good anchor for the suite.
Actionable comments posted: 1
🧹 Nitpick comments (5)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (3)
44-46: Baseline excludes the hour immediately before current — intentional gap worth documenting.

`gap_hour = current_hour_mark - INTERVAL 1 HOUR` causes the baseline to exclude the most recent completed hour. This is a reasonable choice (avoids partial-hour data right after rollover), but it's a subtle design decision. A brief SQL comment explaining why the 1-hour gap exists would help future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql` around lines 44 - 46, Add a short SQL comment explaining why gap_hour = current_hour_mark - INTERVAL 1 HOUR is used so the baseline excludes the most recent completed hour (avoids partial-hour/rollover bias); place the comment near the gap_hour definition or immediately above the avgMergeIf expressions (log_avg_curr, log_avg_base, log_stddev_base) referencing gap_hour, current_hour_mark and baseline_start so future maintainers understand the intentional 1-hour exclusion.
54-54: Epsilon guard of `0.00001` can produce misleadingly extreme z-scores.

When `log_stddev_base = 0` (all baseline values identical), dividing by `0.00001` yields z-scores on the order of 10⁵ for even small deviations. If downstream logic uses z-score thresholds (e.g., `|z| > 3`), this works but obscures the fact that the baseline had zero variance. Consider using a larger epsilon (e.g., `0.01`) or adding a `HAVING log_stddev_base > 0` filter to only flag anomalies where the baseline had meaningful variance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql` at line 54, The current z_score expression uses an extremely small epsilon (0.00001) when log_stddev_base is zero, producing misleadingly huge z_score values; update the expression that defines z_score (currently using log_avg_curr, log_avg_base, log_stddev_base) to use a more reasonable epsilon (e.g., 0.01) in the IF guard OR add a HAVING clause (e.g., HAVING log_stddev_base > 0) so rows with zero baseline variance are excluded from anomaly scoring; modify the z_score calculation and/or the query's filter to reference the same symbols (z_score, log_stddev_base, log_avg_curr, log_avg_base) so downstream thresholding isn't skewed by tiny epsilon values.
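A quick numeric illustration of the epsilon concern (the `max`-based guard shown here is one common form of the pattern; the migration may use an `if` guard instead):

```javascript
// Illustrative numbers only: with a zero-variance baseline, the size of
// the epsilon alone decides whether a negligible deviation looks extreme.
function zScore(logAvgCurr, logAvgBase, logStddevBase, epsilon) {
  return (logAvgCurr - logAvgBase) / Math.max(logStddevBase, epsilon);
}

console.log(zScore(5.01, 5.0, 0, 0.00001)); // ≈ 1000: flagged as a huge anomaly
console.log(zScore(5.01, 5.0, 0, 0.01));    // ≈ 1: below any |z| > 3 threshold
```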
89-90: `no_password` creation is handled correctly during startup — but network restrictions remain a valid hardening measure.

`syncDatabaseRoles()` runs as part of Next.js's `instrumentation.register()` startup hook and immediately sets the password via `ALTER USER ... IDENTIFIED WITH sha256_password`. Since `AI_ANALYST_CLICKHOUSE_PASSWORD` is required (not optional) and this function executes early in the initialization sequence, the account is secured before the application begins handling external requests.

That said, network access restrictions via ClickHouse's `<networks>` configuration would provide additional defense-in-depth. Consider documenting this as a hardening recommendation for production deployments where the `ai_analyst_user` account should only be accessible from trusted internal services.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql` around lines 89 - 90, The migration creates ai_analyst_user with IDENTIFIED WITH no_password, but syncDatabaseRoles() (called from instrumentation.register()) immediately sets a sha256_password using the required env var AI_ANALYST_CLICKHOUSE_PASSWORD; update project docs or deployment notes to recommend adding ClickHouse <networks> restrictions to limit access to ai_analyst_user from trusted internal hosts as a defense-in-depth measure and mention that syncDatabaseRoles() secures the account at startup via ALTER USER ... IDENTIFIED WITH sha256_password.apps/monitor-app/scripts/seed-demo-data.mjs (2)
587-633: Shared module-level `rng` may produce non-deterministic results across callers.

`seedAnomalyTestPattern` uses the module-level `rng`, whose internal state depends on how many prior calls have been made (e.g., by `buildEvents` or other seed functions). If this function is called in isolation (e.g., from tests), the RNG sequence will differ from when it's called after a full seed run.

For a test-seeding utility where reproducibility matters, consider creating a local RNG instance:
Suggested improvement
```diff
 export async function seedAnomalyTestPattern(client, projectId) {
+  const localRng = createRng(12345);
   const now = new Date();
   ...
-      metric_value: 2000 + (rng() * 300),
+      metric_value: 2000 + (localRng() * 300),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/scripts/seed-demo-data.mjs` around lines 587 - 633, seedAnomalyTestPattern currently uses the module-level rng which yields non-deterministic sequences depending on prior calls; change seedAnomalyTestPattern to use a local, deterministic RNG by either (a) accepting an optional rng parameter (e.g., seedAnomalyTestPattern(client, projectId, rng)) and use that instead of the module-level rng, or (b) instantiate a new seeded RNG at the start of the function (seeded from a provided seed or from a deterministic value like projectId + currentHourMark) and replace all uses of the module-level rng in this function with that local generator; ensure callers (tests and top-level seed runner) pass a fixed seed or RNG when reproducibility is required and adjust any references to buildEvents if it should share or accept the same seeded RNG.
614-616: Anomaly event timestamps may extend into the future.

`currentHourMark` is the top of the current hour, and events are placed at `currentHourMark + i * 60_000` (up to +29 min). If the current wall-clock time is, say, 10 minutes past the hour, about 20 of these 30 events will have timestamps in the future. This is probably fine for test seeding, but worth noting in case the anomaly detection view or downstream logic filters on `recorded_at <= now()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitor-app/scripts/seed-demo-data.mjs` around lines 614 - 616, The loop that sets recordedAt in the for loop (using currentHourMark + i * 60_000) can produce future timestamps; change the timestamp calculation to cap events at now() so no recordedAt is later than the current wall-clock time—compute an eventDate = new Date(Math.min(currentHourMark.getTime() + i * 60_000, Date.now())), then call formatDateTime64Utc(eventDate) when building recordedAt; update the code in the same loop where sessionId and recordedAt are created (the for loop using currentHourMark, recordedAt, and randomUUID()).
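The suggested `Math.min` cap can be sketched as follows (wall-clock values are hypothetical):

```javascript
// Sketch: no seeded event timestamp lands after "now".
function cappedEventTime(hourMarkMs, minuteIndex, nowMs) {
  return new Date(Math.min(hourMarkMs + minuteIndex * 60_000, nowMs));
}

const hourMark = Date.UTC(2024, 4, 1, 10, 0, 0);   // top of the hour
const wallClock = Date.UTC(2024, 4, 1, 10, 10, 0); // 10 minutes past

console.log(cappedEventTime(hourMark, 5, wallClock).toISOString());  // 10:05, untouched
console.log(cappedEventTime(hourMark, 25, wallClock).toISOString()); // capped to 10:10
```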
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Around line 38-40: The anomaly_id generation uses lower(hex(MD5(concat(...))))
with concat(toString(project_id), route, metric_name, device_type,
toString(current_hour_mark)) which concatenates fields without separators and
can cause hash collisions; fix by inserting a fixed delimiter (e.g., '|' or
'\x00') between each field in the concat call so each field boundary is
preserved (update the concat used inside MD5 and keep the lower(hex(...)) AS
anomaly_id expression unchanged).
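The collision risk is easy to see without touching ClickHouse: equal concatenations hash equally, so it is enough to show two distinct field pairs producing the same string. Field values below are illustrative:

```javascript
// Without a delimiter, different (route, metric_name) pairs can
// concatenate to the same string, so MD5 of that string collides too.
const concatNoDelim = (route, metric) => route + metric;
const concatDelim = (route, metric) => route + "|" + metric;

console.log(concatNoDelim("home", "lcp") === concatNoDelim("hom", "elcp")); // true: collision
console.log(concatDelim("home", "lcp") === concatDelim("hom", "elcp"));     // false: boundary kept
```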
---
Nitpick comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Around line 44-46: Add a short SQL comment explaining why gap_hour =
current_hour_mark - INTERVAL 1 HOUR is used so the baseline excludes the most
recent completed hour (avoids partial-hour/rollover bias); place the comment
near the gap_hour definition or immediately above the avgMergeIf expressions
(log_avg_curr, log_avg_base, log_stddev_base) referencing gap_hour,
current_hour_mark and baseline_start so future maintainers understand the
intentional 1-hour exclusion.
- Line 54: The current z_score expression uses an extremely small epsilon
(0.00001) when log_stddev_base is zero, producing misleadingly huge z_score
values; update the expression that defines z_score (currently using
log_avg_curr, log_avg_base, log_stddev_base) to use a more reasonable epsilon
(e.g., 0.01) in the IF guard OR add a HAVING clause (e.g., HAVING
log_stddev_base > 0) so rows with zero baseline variance are excluded from
anomaly scoring; modify the z_score calculation and/or the query's filter to
reference the same symbols (z_score, log_stddev_base, log_avg_curr,
log_avg_base) so downstream thresholding isn't skewed by tiny epsilon values.
- Around line 89-90: The migration creates ai_analyst_user with IDENTIFIED WITH
no_password, but syncDatabaseRoles() (called from instrumentation.register())
immediately sets a sha256_password using the required env var
AI_ANALYST_CLICKHOUSE_PASSWORD; update project docs or deployment notes to
recommend adding ClickHouse <networks> restrictions to limit access to
ai_analyst_user from trusted internal hosts as a defense-in-depth measure and
mention that syncDatabaseRoles() secures the account at startup via ALTER USER
... IDENTIFIED WITH sha256_password.
In `@apps/monitor-app/scripts/seed-demo-data.mjs`:
- Around line 587-633: seedAnomalyTestPattern currently uses the module-level
rng which yields non-deterministic sequences depending on prior calls; change
seedAnomalyTestPattern to use a local, deterministic RNG by either (a) accepting
an optional rng parameter (e.g., seedAnomalyTestPattern(client, projectId, rng))
and use that instead of the module-level rng, or (b) instantiate a new seeded
RNG at the start of the function (seeded from a provided seed or from a
deterministic value like projectId + currentHourMark) and replace all uses of
the module-level rng in this function with that local generator; ensure callers
(tests and top-level seed runner) pass a fixed seed or RNG when reproducibility
is required and adjust any references to buildEvents if it should share or
accept the same seeded RNG.
- Around line 614-616: The loop that sets recordedAt in the for loop (using
currentHourMark + i * 60_000) can produce future timestamps; change the
timestamp calculation to cap events at now() so no recordedAt is later than the
current wall-clock time—compute an eventDate = new
Date(Math.min(currentHourMark.getTime() + i * 60_000, Date.now())), then call
formatDateTime64Utc(eventDate) when building recordedAt; update the code in the
same loop where sessionId and recordedAt are created (the for loop using
currentHourMark, recordedAt, and randomUUID()).
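The epsilon issue flagged above (line 54) can be modeled in isolation. The column names match the view, but the function itself is an illustrative sketch, with the `HAVING log_stddev_base > 0` option represented as returning `null`:

```javascript
// z-score guard: exclude zero-variance baselines rather than dividing
// by a tiny epsilon that inflates the score.
function zScore(logAvgCurr, logAvgBase, logStddevBase) {
  if (logStddevBase === 0) return null; // models HAVING log_stddev_base > 0
  return (logAvgCurr - logAvgBase) / logStddevBase;
}

// With epsilon = 0.00001, a 0.1 log-unit shift would score 10000;
// excluding the row avoids that misleading spike:
console.log(zScore(1.1, 1.0, 0));    // null: not scored
console.log(zScore(1.5, 1.0, 0.25)); // 2
```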
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- .github/workflows/check.yml
- apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
- apps/monitor-app/scripts/seed-demo-data.mjs
- apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts
- apps/monitor-app/src/test/anomaly-detection.test.ts
- apps/monitor-app/src/test/clickhouse-test-utils.ts
- docker/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (4)
- docker/docker-compose.yml
- apps/monitor-app/src/test/clickhouse-test-utils.ts
- apps/monitor-app/src/test/anomaly-detection.test.ts
- apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts
📜 Review details
🔇 Additional comments (5)
.github/workflows/check.yml (1)
96-100: Good addition — anomaly test step is wired correctly. The new step follows the existing monitor test pattern, is properly gated, and runs with CI mode enabled.
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (4)
1-15: LGTM — `cwv_stats_hourly` schema is well-structured. Good use of `AggregatingMergeTree` with state columns, monthly partitioning, and a 30-day TTL for automated data lifecycle management. The ORDER BY key is comprehensive and aligns with the materialized view's GROUP BY.
17-30: LGTM — Materialized view correctly populates aggregate states with log-transform. The `log1p` transform is a solid choice for handling the right-skewed distribution typical of web performance metrics.
60-73: `processed_anomalies` now includes `device_type` — looks correct. The schema properly mirrors the identity dimensions from `v_cwv_anomalies`. `ReplacingMergeTree(updated_at)` with `ORDER BY (project_id, anomaly_id)` provides clean upsert semantics.
32-58: `varSampStable` is a documented ClickHouse function and is available in the repository's pinned version (25.8). No action needed on this function choice, but verify all environments consistently use the specified ClickHouse version.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Around line 70-73: The ReplacingMergeTree version column uses second-precision
DateTime which can cause nondeterministic replacements; update the
processed_anomalies table by changing the updated_at column declaration from
DateTime DEFAULT now() to DateTime64(3) DEFAULT now64(3) and keep
ReplacingMergeTree(updated_at) so it uses sub-second precision; similarly, plan
a follow-up to change other ReplacingMergeTree tables (projects, users, session,
account, verification) in migrations 001_init.sql and 002_auth.sql to use
DateTime64(3)/now64(3) for consistency.
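Why second precision is a problem for `ReplacingMergeTree` versioning can be shown numerically: two writes inside the same second tie at `DateTime` precision, so which row survives a merge is arbitrary, while `DateTime64(3)` keeps them ordered. The timestamps below are illustrative:

```javascript
// Two updates 300 ms apart: equal at seconds precision, ordered at ms.
const firstWrite = new Date("2024-05-01T10:00:00.100Z");
const secondWrite = new Date("2024-05-01T10:00:00.400Z");

const asDateTime = (d) => Math.floor(d.getTime() / 1000); // DateTime (seconds)
const asDateTime64 = (d) => d.getTime();                  // DateTime64(3) (ms)

console.log(asDateTime(firstWrite) === asDateTime(secondWrite));   // true: version tie
console.log(asDateTime64(firstWrite) < asDateTime64(secondWrite)); // true: later write wins
```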
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check_pull_request
🔇 Additional comments (3)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (3)
1-30: Schema + MV aggregation layout looks solid. The `AggregatingMergeTree` design and hourly rollup state functions are well-structured for anomaly detection workloads, and Line 15 keeps storage bounded.
32-59: Anomaly windowing and collision-safe ID construction look good.Using delimiters in
anomaly_id(Line 39) and excluding the immediate prior hour from baseline (Lines 35-46) is a good improvement for signal stability.
75-93: Profile/role/user provisioning is clear and idempotent. The setup is easy to reason about and migration-safe (`IF NOT EXISTS` + explicit grants/default role/profile).
04efaf2 to eb4ebb8
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Around line 17-30: The materialized view mv_cwv_stats_hourly is created
without populating existing rows so historical cwv_events won't be reflected in
cwv_stats_hourly and v_cwv_anomalies (which requires baseline_n >= 100) will
remain inactive; fix by adding an explicit backfill immediately after the CREATE
MATERIALIZED VIEW: run an INSERT INTO cwv_stats_hourly selecting project_id,
route, device_type, metric_name, toStartOfHour(recorded_at) AS hour,
avgState(log1p(metric_value)) AS avg_state,
varSampStableState(log1p(metric_value)) AS var_state, countState() AS
count_state FROM cwv_events GROUP BY project_id, hour, route, device_type,
metric_name so the hourly aggregates are populated for historical data.
- Around line 89-91: The schema and startup hook must enforce and fail fast on
missing ai analyst password: update the environment schema for
AI_ANALYST_CLICKHOUSE_PASSWORD (the z.string() validator) to require at least
one character (e.g. .min(1)) so empty values are invalid, and change
syncDatabaseRoles() so it does not silently return when aiUser or aiPass are
missing — instead log a clear error and throw (or cause startup to fail) so
credential rotation cannot be skipped; reference AI_ANALYST_CLICKHOUSE_PASSWORD
and the syncDatabaseRoles() function to locate the changes.
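The fail-fast behavior this prompt asks for can be sketched as follows; `syncDatabaseRoles` and the env var names come from the review, but the body is an illustrative model, not the app's actual implementation:

```javascript
// Fail fast on missing AI analyst credentials instead of returning silently.
function syncDatabaseRoles(env) {
  const aiUser = env.AI_ANALYST_CLICKHOUSE_USER;
  const aiPass = env.AI_ANALYST_CLICKHOUSE_PASSWORD;
  if (!aiUser || !aiPass) {
    // Throwing here makes startup fail loudly, so a missed credential
    // rotation cannot be skipped by accident:
    throw new Error(
      "AI_ANALYST_CLICKHOUSE_USER and AI_ANALYST_CLICKHOUSE_PASSWORD must be set",
    );
  }
  // ...the ALTER USER ... IDENTIFIED WITH sha256_password call would run here...
  return { user: aiUser };
}

let startupFailed = false;
try {
  syncDatabaseRoles({ AI_ANALYST_CLICKHOUSE_USER: "ai_analyst_user" }); // no password
} catch {
  startupFailed = true;
}
console.log(startupFailed); // true
```

Pairing this with a `.min(1)` on the zod validator rejects empty strings at the env-parsing layer as well.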
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check_pull_request
🔇 Additional comments (1)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (1)
38-40: `anomaly_id` hashing is now collision-safe across field boundaries. Using explicit separators in `concat(...)` removes ambiguous concatenation cases and is a solid fix.
eb4ebb8 to 7870b65
7870b65 to cf80433
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql`:
- Line 10: The declared column type count_state is currently
AggregateFunction(count, UInt64) which mismatches the no-argument countState()
used elsewhere; change the column type to AggregateFunction(count) to match
countState() output (or alternatively update the countState() calls to
countState(<UInt64_column>) if you intended a typed count), and ensure
subsequent usages like countMergeIf(count_state, ...) in the anomalies view
operate on the corrected AggregateFunction(count) state.
In `@apps/monitor-app/scripts/seed-demo-data.mjs`:
- Around line 589-626: The bug is that now.setMinutes(0,0,0) mutates now so
minutesPastHour later is always 0; fix by snapshotting minutesPastHour before
you floor the hour or by creating a copy when making currentHourMark. For
example, call const minutesPastHour = now.getMinutes() before touching now, or
build currentHourMark from a copy (const currentHourMark = new
Date(now.getTime()); currentHourMark.setMinutes(0,0,0,0)) so setMinutes is
applied to the copy and minutesPastHour and intervalMs calculations use the
original now; update references to currentHourMark/now/intervalMs accordingly
(symbols: now, currentHourMark, minutesPastHour, intervalMs, setMinutes).
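The mutation bug is easy to reproduce in isolation: `Date.prototype.setMinutes` mutates the instance, so any `getMinutes()` read after flooring the hour returns 0. The fixed timestamp and the UTC accessors below are illustrative choices to keep the example timezone-independent:

```javascript
// Buggy order: flooring mutates `now`, losing the minutes-past-hour value.
const now = new Date("2024-05-01T10:25:30Z");
now.setUTCMinutes(0, 0, 0);
const minutesPastHourBuggy = now.getUTCMinutes(); // always 0

// Fixed order: snapshot the minutes first, then floor a copy.
const original = new Date("2024-05-01T10:25:30Z");
const minutesPastHour = original.getUTCMinutes(); // 25: read before mutating
const currentHourMark = new Date(original.getTime());
currentHourMark.setUTCMinutes(0, 0, 0); // only the copy is floored

console.log(minutesPastHourBuggy); // 0
console.log(minutesPastHour);      // 25
```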
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
- apps/monitor-app/scripts/seed-demo-data.mjs
- apps/monitor-app/src/env.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/monitor-app/src/env.ts
📜 Review details
🔇 Additional comments (3)
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (3)
17-30: LGTM — MV definition is consistent with the table schema (modulo the `count_state` type noted above). The backfill `INSERT` at lines 32–44 uses the same column expressions as the MV, ensuring historical data is seeded into `cwv_stats_hourly` before anomaly detection goes live. The 7-day backfill window aligns with `baseline_start`.
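The rollup the MV and backfill compute can be modeled outside ClickHouse to sanity-check the bucketing; the event shapes and values here are illustrative:

```javascript
// Bucket events by toStartOfHour and accumulate log1p(metric_value),
// mirroring the MV/backfill aggregation in plain JS.
const events = [
  { recorded_at: "2024-05-01T10:05:00Z", metric_value: 1200 },
  { recorded_at: "2024-05-01T10:40:00Z", metric_value: 800 },
  { recorded_at: "2024-05-01T11:02:00Z", metric_value: 950 },
];

// toStartOfHour for ISO-8601 UTC strings:
const toStartOfHour = (iso) => iso.slice(0, 13) + ":00:00Z";

const hourly = new Map();
for (const e of events) {
  const hour = toStartOfHour(e.recorded_at);
  const bucket = hourly.get(hour) ?? { logSum: 0, n: 0 };
  bucket.logSum += Math.log1p(e.metric_value); // same log1p transform as the MV
  bucket.n += 1;
  hourly.set(hour, bucket);
}

console.log(hourly.get("2024-05-01T10:00:00Z").n); // 2
console.log(hourly.size);                          // 2 distinct hours
```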
46-72: LGTM — anomaly detection view logic is sound. The z-score formula correctly uses log-transformed values, the gap hour exclusion between the current and baseline windows avoids double-counting partial hours, and the `IF(log_stddev_base = 0, 0.00001, log_stddev_base)` guard prevents division by zero. The `\0`-separated `anomaly_id` hash (lines 52–54) and the `HAVING sample_size >= 20 AND baseline_n >= 100` quality gate are all correct.
74-107: LGTM — schema and access-control setup looks correct. `processed_anomalies` now carries `device_type` (aligning the `ORDER BY` key with the `v_cwv_anomalies` anomaly identity), uses `DateTime64(3)` for sub-second `ReplacingMergeTree` versioning, and `ai_analyst_user` is created with `no_password` so there is no committed credential. Role grants cover the minimum surface needed.
cf80433 to a32ca20
Summary by CodeRabbit
New Features
Tests
Chores