
feat: add anomaly detection pipeline#120

Open
Levosilimo wants to merge 9 commits into main from cwv-115-anomaly-pipeline

Conversation

Levosilimo (Collaborator) commented Mar 1, 2026

Summary by CodeRabbit

  • New Features

    • Hourly anomaly detection worker with automated alerts, AI summaries, and investigation links.
    • Notifications to Slack and Microsoft Teams.
    • ClickHouse anomaly detection pipeline and data model.
  • Tests

    • New integration/end-to-end tests for anomaly detection, AI bridge, notifications and repositories.
    • CI: conditional step to run anomaly detection tests when relevant changes occur.
  • Documentation

    • AI Agent guide and architecture docs updated for anomaly workflows.
  • Chores

    • Local/dev/prod configs, Docker compose and example env updated; demo seed data and test setups added.

vercel bot commented Mar 1, 2026

The latest updates on your projects.

Project                      | Status | Actions          | Updated (UTC)
next-cwv-monitor-client-app  | Ready  | Preview, Comment | Mar 4, 2026 8:59am
next-cwv-monitor-monitor-app | Ready  | Preview, Comment | Mar 4, 2026 8:59am


coderabbitai bot (Contributor) commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an anomaly detection subsystem: new anomaly-worker service, ClickHouse schema/migrations and AI analyst access, AI Bridge, multi-channel notifications, repository APIs, Docker/dev tooling, extensive integration tests, and a conditional CI step to run anomaly tests.

Changes

Cohort / File(s) Summary
Anomaly Worker App
apps/anomaly-worker/.env.example, apps/anomaly-worker/package.json, apps/anomaly-worker/src/env.ts, apps/anomaly-worker/src/index.ts, apps/anomaly-worker/tsconfig.json, apps/anomaly-worker/.gitignore
New Node.js anomaly-worker app: env schema, runtime entrypoint (cron hourly, graceful shutdown), build/dev scripts, and .env ignored.
ClickHouse Schema & Migrations
apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql, apps/monitor-app/clickhouse/catalog.yml
Adds hourly aggregating table, materialized view, anomaly scoring view, processed_anomalies table, ai_analyst profile/role/user, and catalog definitions/queries.
ClickHouse Client & Bootstrap
apps/monitor-app/src/app/server/lib/clickhouse/client.ts, apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts
Adds dual client support (standard + AI analyst), lazy sql/aiSql proxies, test-aware connection resolution, and syncDatabaseRoles to set ai analyst password.
Data Repositories & Schema Types
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts, apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts, apps/monitor-app/src/app/server/lib/clickhouse/repositories/__tests__/anomalies-repository.integration.test.ts, apps/monitor-app/src/app/server/lib/clickhouse/schema.ts
New anomaly repositories (querying v_cwv_anomalies, unprocessed set), processed_anomalies insert/claim logic, and new TS types (AnomalyRow, Insertable/ProcessedAnomalyRow) with integration tests.
AI Bridge
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts, apps/monitor-app/src/app/server/domain/ai-bridge/__tests__/ai-bridge-service.integration.test.ts
New AiBridgeService exposing executeSql, getSchemaCatalog, getAnomalyContext, getSummary, and helpers; reads catalog and composes anomaly context; covered by integration tests.
Notification System
apps/monitor-app/src/app/server/domain/notifications/types.ts, apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts, apps/monitor-app/src/app/server/domain/notifications/service.ts, apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
Adds notification types, dispatcher (Slack/Teams payload builders + delivery), NotificationsService (batching, claim+notify, persist processed_anomalies), and integration tests mocking webhooks/repos.
Monitor App Updates
apps/monitor-app/src/env.ts, apps/monitor-app/.env.example, apps/monitor-app/.env.ci, apps/monitor-app/.env.test, apps/monitor-app/package.json, apps/monitor-app/scripts/seed-demo-data.mjs, apps/monitor-app/src/instrumentation.ts, apps/monitor-app/src/test/performance-guardrails.test.ts
Adds AI analyst env vars and webhook placeholders, extends env runtime/test logic, adds seedAnomalyTestPattern, dependency updates, guards provisionInitialUser/syncDatabaseRoles to Node runtime, and minor test adjustments.
Tests & Test Infrastructure
apps/monitor-app/src/test/global-setup.ts, apps/monitor-app/src/test/clickhouse-test-utils.ts, apps/monitor-app/src/test/anomaly-detection.integration.test.ts, apps/monitor-app/vitest.integration.config.ts, apps/monitor-app/vitest.integration.setup.ts
Global ClickHouse test setup/teardown, optimizeAnomalies helper, deterministic anomaly integration tests, vitest globalSetup and alias additions, and test env wiring.
Scripts & Seed Data
apps/monitor-app/scripts/seed-demo-data.mjs
Adds seedAnomalyTestPattern to generate anomaly-pattern CWV events for tests/demo and run OPTIMIZE.
Docker & Compose
docker/anomaly-worker.Dockerfile, docker/docker-compose.yml, docker/docker-compose.dev.yml, docker/docker-compose.build.yml, docker/monitor-app.dev.Dockerfile, docker/monitor-app.prod.Dockerfile, setup.sh
Adds multi-stage Dockerfile for anomaly-worker, new cwv-worker service (anomaly profile) in compose files, setup service for dev installs, env propagation for ai analyst creds and webhooks, and setup.sh prompts.
CI
.github/workflows/check.yml
Adds conditional "Run monitor anomaly detection tests" CI step gated by monitor_changed to run anomaly tests.
Docs
ARCHITECTURE.md, apps/monitor-app/AI_AGENT_GUIDE.md
Documents Anomaly Worker, worker subgraph, data flow, AI Agent integration, security model, and usage patterns.

Sequence Diagram(s)

sequenceDiagram
    actor CronScheduler as Cron Scheduler
    participant Worker as Anomaly Worker
    participant NotifSvc as NotificationsService
    participant AnomaliesRepo as Anomalies Repository
    participant AiBridge as AI Bridge Service
    participant ProjectsRepo as Projects Repository
    participant Dispatcher as Notification Dispatcher
    participant ClickHouse as ClickHouse
    participant Webhook as Slack/Teams

    CronScheduler->>Worker: hourly trigger (30th minute)
    Worker->>NotifSvc: notifyNewAnomalies()
    alt webhooks configured
        NotifSvc->>AnomaliesRepo: getUnprocessedAnomalies()
        AnomaliesRepo->>ClickHouse: query v_cwv_anomalies excluding processed
        ClickHouse-->>AnomaliesRepo: anomalies[]
        AnomaliesRepo-->>NotifSvc: anomalies[]
        loop per anomaly
            NotifSvc->>ProjectsRepo: getProjectById(projectId)
            ProjectsRepo->>ClickHouse: select project
            ClickHouse-->>ProjectsRepo: project
            ProjectsRepo-->>NotifSvc: project

            NotifSvc->>AiBridge: getSummary / getAnomalyContext
            AiBridge->>ClickHouse: contextual SQL
            ClickHouse-->>AiBridge: context data
            AiBridge-->>NotifSvc: ai summary

            NotifSvc->>Dispatcher: dispatch(notification)
            Dispatcher->>Webhook: POST payload
            Webhook-->>Dispatcher: 200/err
            Dispatcher-->>NotifSvc: delivery result

            NotifSvc->>AnomaliesRepo: insertProcessedAnomaly(status='notified')
            AnomaliesRepo->>ClickHouse: insert processed_anomalies
            ClickHouse-->>AnomaliesRepo: ok
        end
    else no webhooks
        NotifSvc-->>Worker: skip (no targets)
    end
    Worker-->>CronScheduler: cycle complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • kasin-it
  • iipanda

Poem

🐇 I hop at thirty past each hour,
I scan for z-scores with gentle power,
Slack and Teams get tidy cards,
Baselines, trends, and helpful parts,
A rabbit's cheer for alerts that flower.

🚥 Pre-merge checks: 2 passed
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add anomaly detection pipeline' clearly and concisely summarizes the main change across the entire changeset, which introduces a comprehensive anomaly detection system including database tables, worker service, notification dispatch, and integration tests.




coderabbitai bot (Contributor) left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/monitor-app/package.json (1)

5-18: ⚠️ Potential issue | 🔴 Critical

Add test:anomaly script to match CI invocation.

The new workflow step calls pnpm --filter cwv-monitor-app test:anomaly, but this script is not defined here, so CI will fail on monitor changes.

💡 Proposed fix
   "scripts": {
     "dev": "next dev -p 3000",
     "build": "eslint && next build",
     "start": "next start",
     "lint": "eslint",
     "storybook": "storybook dev -p 6006",
     "build-storybook": "storybook build",
     "clickhouse:migrate": "node ./scripts/run-clickhouse-migrate.mjs",
     "clickhouse:migrate:dev": "dotenvx run -- node ./scripts/run-clickhouse-migrate.mjs",
     "seed:demo": "node ./scripts/seed-demo-data.mjs",
     "seed:cwv-events": "node ./scripts/seed-demo-data.mjs --custom-events-only",
     "test:integration": "vitest run --config vitest.integration.config.ts",
-    "test:perf": "vitest run ./src/test/performance-guardrails.test.ts --config vitest.performance.config.ts"
+    "test:perf": "vitest run ./src/test/performance-guardrails.test.ts --config vitest.performance.config.ts",
+    "test:anomaly": "vitest run ./src/test/anomaly-detection.integration.test.ts --config vitest.integration.config.ts"
   },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/package.json` around lines 5 - 18, Add a "test:anomaly" npm
script to the package.json scripts section so CI's pnpm --filter cwv-monitor-app
test:anomaly succeeds; mirror the pattern used by existing test scripts (e.g.,
"test:integration" / "test:perf") and point it to the anomaly test entry and
config used by your repo (for example using vitest with the anomaly test file
and/or anomaly-specific vitest config). Ensure the new script name is exactly
"test:anomaly" so it matches the CI invocation.
🧹 Nitpick comments (8)
apps/monitor-app/.env.test (1)

10-11: Optional: clean dotenv-linter warnings in test env file.

You can remove the extra blank line and reorder keys in this ClickHouse block to satisfy UnorderedKey and ExtraBlankLine.

♻️ Suggested cleanup
 CLICKHOUSE_HOST=127.0.0.1
 CLICKHOUSE_PORT=18123
-CLICKHOUSE_DB=cwv_monitor_test
-CLICKHOUSE_USER=default
-CLICKHOUSE_PASSWORD=secret
-AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
 AI_ANALYST_CLICKHOUSE_PASSWORD=ai_analyst_password
+AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
+CLICKHOUSE_DB=cwv_monitor_test
+CLICKHOUSE_PASSWORD=secret
+CLICKHOUSE_USER=default
@@
 INITIAL_USER_PASSWORD=password1234
 INITIAL_USER_NAME=Initial User
-
 # Logging
 LOG_LEVEL=debug

Also applies to: 22-22

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/.env.test` around lines 10 - 11, Remove the extra blank line
in the ClickHouse env block and reorder the keys so they follow the project's
canonical ordering (place AI_ANALYST_CLICKHOUSE_PASSWORD directly after
AI_ANALYST_CLICKHOUSE_USER or vice-versa to match other .env entries); also
apply the same fix to the other ClickHouse entry referenced (the occurrence at
the other block) so dotenv-linter warnings UnorderedKey and ExtraBlankLine are
resolved.
apps/monitor-app/.env.ci (1)

10-11: Optional: reorder keys to keep dotenv-linter quiet.

If dotenv-linter is enforced in CI, reorder these keys to remove UnorderedKey warnings.

♻️ Suggested ordering tweak
 CLICKHOUSE_HOST=127.0.0.1
 CLICKHOUSE_PORT=18123
-CLICKHOUSE_DB=cwv_monitor_test
-CLICKHOUSE_USER=default
-CLICKHOUSE_PASSWORD=secret
-AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
 AI_ANALYST_CLICKHOUSE_PASSWORD=ai_analyst_password
+AI_ANALYST_CLICKHOUSE_USER=ai_analyst_user
+CLICKHOUSE_DB=cwv_monitor_test
+CLICKHOUSE_PASSWORD=secret
+CLICKHOUSE_USER=default
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/.env.ci` around lines 10 - 11, Reorder the two dotenv keys
so they follow the project's dotenv-linter ordering (e.g., alphabetically) to
avoid UnorderedKey warnings: place AI_ANALYST_CLICKHOUSE_PASSWORD and
AI_ANALYST_CLICKHOUSE_USER in the correct sequence (swap their lines) so the
keys are sorted consistently with the rest of the file.
apps/anomaly-worker/package.json (1)

11-12: Keep dev env-loading behavior consistent with start.

start preloads dotenv/config, but dev doesn’t. That can cause inconsistent runtime validation and local failures.

💡 Suggested script alignment
-    "dev": "tsx src/index.ts | pino-pretty",
+    "dev": "cross-env NODE_OPTIONS=\"-r dotenv/config\" tsx src/index.ts | pino-pretty",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/anomaly-worker/package.json` around lines 11 - 12, The dev script in
package.json is missing the dotenv preload present in start, causing
inconsistent env loading; update the "dev" script (the package.json "dev" entry)
to preload dotenv the same way as "start" (e.g., include cross-env
NODE_OPTIONS="-r dotenv/config" before tsx src/index.ts and keep the pino-pretty
pipe) so both start and dev environments load .env consistently.
apps/monitor-app/src/test/anomaly-detection.integration.test.ts (1)

7-7: Inconsistent import path for seed utility.

This file uses a relative import path while other test files use the scripts alias. For consistency across the test suite, consider using the alias.

♻️ Proposed fix
-import { seedAnomalyTestPattern } from "../../scripts/seed-demo-data.mjs";
+import { seedAnomalyTestPattern } from "scripts/seed-demo-data.mjs";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/anomaly-detection.integration.test.ts` at line 7,
Replace the relative import of seedAnomalyTestPattern with the project
alias-based import used by other tests: update the import statement that
references seedAnomalyTestPattern (from "../../scripts/seed-demo-data.mjs") to
use the scripts alias (e.g., import from "scripts/seed-demo-data.mjs" or
"scripts/seed-demo-data" consistent with the codebase import style) so the test
uses the same module resolution as the rest of the suite.
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (3)

99-109: Timestamp fields should use Date | string for consistency with existing types.

The notified_at and updated_at fields use Date only, but other timestamp fields in this file (e.g., created_at, updated_at in ProjectRow, recorded_at in CwvEventRow) use Date | string. ClickHouse often returns timestamps as strings, so this pattern ensures proper handling.

Additionally, device_type should use DeviceType as noted above.

♻️ Suggested change
 export type InsertableProcessedAnomalyRow = {
   anomaly_id: string;
   project_id: string;
   metric_name: string;
   route: string;
-  device_type: string;
+  device_type: DeviceType;
   last_z_score: number;
-  notified_at?: Date;
+  notified_at?: Date | string;
   status?: "new" | "notified" | "acknowledged" | "resolved";
-  updated_at?: Date;
+  updated_at?: Date | string;
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/lib/clickhouse/schema.ts` around lines 99 -
109, Change the InsertableProcessedAnomalyRow type so its timestamp fields match
the project's convention and its device type uses the shared enum: update
notified_at and updated_at from Date to Date | string, and change device_type
from string to DeviceType; modify the InsertableProcessedAnomalyRow definition
accordingly to reference DeviceType and Date | string for those fields.

111-115: Update timestamp types to Date | string for consistency.

Same pattern applies here - the required notified_at and updated_at fields should use Date | string to match existing conventions.

♻️ Suggested change
 export type ProcessedAnomalyRow = InsertableProcessedAnomalyRow & {
-  notified_at: Date;
+  notified_at: Date | string;
   status: "new" | "notified" | "acknowledged" | "resolved";
-  updated_at: Date;
+  updated_at: Date | string;
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/lib/clickhouse/schema.ts` around lines 111 -
115, The ProcessedAnomalyRow type currently declares notified_at and updated_at
as Date but should allow Date | string for consistency; update the
ProcessedAnomalyRow declaration (and its exported type) so that notified_at:
Date | string and updated_at: Date | string, leaving status and other fields
unchanged (refer to the ProcessedAnomalyRow type and
InsertableProcessedAnomalyRow symbol to locate the type).

85-97: Consider using DeviceType for device_type field for consistency.

The device_type field is typed as string, but other row types in this file (e.g., CwvEventRow, CustomEventRow, CwvDailyAggregateRow) use the imported DeviceType type. Using the same type would provide better type safety and consistency across the schema.

♻️ Suggested change
 export type AnomalyRow = {
   anomaly_id: string;
   project_id: string;
   route: string;
   metric_name: string;
-  device_type: string;
+  device_type: DeviceType;
   detection_time: Date | string;
   current_avg_raw: number;
   baseline_avg_raw: number;
   z_score: number;
   sample_size: number;
   baseline_n: number;
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/lib/clickhouse/schema.ts` around lines 85 -
97, AnomalyRow currently types device_type as string—change it to use the
existing DeviceType type for consistency and stronger typing by updating the
field to device_type: DeviceType in the AnomalyRow type definition; if
DeviceType is not already imported in this file, add the appropriate import for
DeviceType (same source used by CwvEventRow/CustomEventRow/CwvDailyAggregateRow)
so the change compiles.
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)

46-54: Consider using a more robust path resolution for the catalog file.

process.cwd() may not reliably resolve to the expected directory in all deployment contexts (e.g., containerized environments, serverless). Consider using __dirname or an environment variable to configure the path, or bundling the catalog content at build time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts` around lines 46
- 54, The getSchemaCatalog function currently builds catalogPath with
process.cwd(), which can be unreliable; change it to resolve the file via a
stable base (e.g., derive from __dirname of this module or read a configured
environment variable like CATALOG_PATH) and fall back to a packaged/bundled
asset if present; update the logic in getSchemaCatalog to check
process.env.CATALOG_PATH first, then resolve relative to __dirname (e.g.,
path.resolve(__dirname, '...', 'clickhouse', 'catalog.yml')), and ensure error
handling/logging for both missing env and missing file remains intact so the
function still returns the default string when unreadable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/anomaly-worker/.env`:
- Line 10: Remove the trailing whitespace after the environment variable value
for INITIAL_USER_PASSWORD in the .env file; edit the line containing
"INITIAL_USER_PASSWORD=password " (the INITIAL_USER_PASSWORD entry) so it ends
with no extra spaces or tabs (i.e., "INITIAL_USER_PASSWORD=password") to satisfy
dotenv-linter.
- Around line 5-11: Remove all real credential values from the committed .env:
replace values for CLICKHOUSE_PASSWORD, AI_ANALYST_CLICKHOUSE_PASSWORD,
BETTER_AUTH_SECRET, INITIAL_USER_PASSWORD (and any other secrets) with
non-sensitive placeholders and keep only default placeholders for
INITIAL_USER_EMAIL and INITIAL_USER_NAME if needed; add .env to .gitignore so it
is not tracked; create or update a .env.example with the placeholder keys and
example values; and document in README or deployment docs that real secrets
should be provided via environment variables or a secret manager rather than
committing them.

In `@apps/anomaly-worker/.env.example`:
- Around line 1-15: The .env.example is missing required bootstrap variables
that are strictly validated in src/env.ts; add the following entries to
.env.example: AI_ANALYST_CLICKHOUSE_PASSWORD (placeholder, non-empty),
BETTER_AUTH_SECRET (placeholder, min-length matching src/env.ts),
INITIAL_USER_EMAIL (placeholder in valid email format), INITIAL_USER_NAME
(placeholder, non-empty), and INITIAL_USER_PASSWORD (placeholder, min-length
matching src/env.ts). Ensure the placeholder values conform to the validation
rules enforced in src/env.ts (email format for INITIAL_USER_EMAIL and minimum
lengths for secrets/passwords) so developers copying .env.example won’t fail
validation at startup.

In `@apps/anomaly-worker/src/index.ts`:
- Around line 20-29: Replace the immediate-exit SIGTERM handler and add
overlap/shutdown guards so scheduled cycles can't run concurrently and shutdown
waits for the current cycle: introduce module-level flags (e.g., isRunning and
isShuttingDown) and update cron.schedule to call a wrapper that returns early if
isRunning or isShuttingDown; modify runDetectionCycle to set isRunning=true at
start and isRunning=false in a finally block (or promise resolution path);
change the process.on('SIGTERM') handler to set isShuttingDown=true, stop the
cron job (or prevent new wrapper-invocations), wait for the current
runDetectionCycle to finish with a bounded timeout, log appropriate messages via
logger.info/logger.error, then exit cleanly; use the existing symbols
cron.schedule, runDetectionCycle, logger, and process.on('SIGTERM') to locate
and apply the changes.

In `@apps/monitor-app/AI_AGENT_GUIDE.md`:
- Around line 85-88: Update the "AI Domain Logic" reference in AI_AGENT_GUIDE.md
to point to the ai-bridge implementation instead of the ai one: locate the "AI
Domain Logic" entry (the line that currently references service.ts) and change
its referenced directory from ai to ai-bridge so it correctly points at the
actual implementation file (service.ts) under the ai-bridge module.
- Line 62: The documentation import is pointing to the wrong module: update the
import that brings in aiBridge so it references the actual service module (use
the ai-bridge service module name) instead of the old ai/service path; modify
the import line that currently imports aiBridge to import from the ai-bridge
service module so it matches the real export and builds correctly.

In `@apps/monitor-app/clickhouse/catalog.yml`:
- Around line 66-72: The catalog entry for processed_anomalies is missing
columns that are actually written; update the processed_anomalies columns block
to include the repository fields (add project_id, metric_name, route,
device_type, last_z_score, updated_at) alongside the existing anomaly_id,
status, and notified_at and assign appropriate ClickHouse types (e.g., String,
DateTime, Float64 or Decimal for scores, Enum8 for status) so the catalog
matches the actual writes and prevents invalid generated SQL.

In `@apps/monitor-app/scripts/seed-demo-data.mjs`:
- Around line 588-592: The bug is that now is mutated when creating
currentHourMark (using now.setMinutes(...)) which causes later computed
recordedAtDate values to always fall before currentHourMark and trigger the
clamp branch; fix by creating currentHourMark from a copy of now (e.g., new
Date(now) or Date constructor) instead of calling now.setMinutes on the
original, keep minutesPastHour calculated from now.getMinutes(), and ensure any
subsequent recordedAtDate math uses the unmutated now value so the intended
interval distribution is preserved (update references to now, minutesPastHour,
currentHourMark, and recordedAtDate accordingly).

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts`:
- Around line 40-42: The catch block currently logs the full SQL via
logger.error({ err: error, query: String(query) }, ...); change this to avoid
recording the entire query by replacing String(query) with either a truncated
snippet (e.g., first N chars), a stable query identifier, or a hash/CRC of the
query string; update the logger.error call in the same catch block (where
logger.error and the query variable are referenced) to include only the chosen
safe representation plus contextual info (error and query id/hash) so sensitive
parameter values are not written to logs.

In `@apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts`:
- Around line 9-28: The send method currently swallows transport errors by using
Promise.allSettled and always resolving; change the contract of
send(notification: NotificationPayload) to return a delivery result (e.g.,
Promise<DeliveryResult> or a boolean) that reports per-transport success/failure
and does not hide failures: call sendToSlack and sendToTeams, await their
promises, collect each result or catch each error, and populate a DeliveryResult
{ slack?: { success: boolean; error?: string }, teams?: { success: boolean;
error?: string } }; if you prefer to fail fast, use Promise.all so the method
rejects on first transport error; otherwise use Promise.allSettled but convert
settlements into the DeliveryResult and return it (and ensure callers of send
are updated to handle the new return type). Ensure symbols referenced: send,
sendToSlack, sendToTeams, NotificationPayload, and update any callers to consume
the DeliveryResult.

In `@apps/monitor-app/src/app/server/domain/notifications/service.ts`:
- Around line 25-26: The current code fan-out builds unbounded Promises by
mapping anomalies to this.processAndNotify and awaiting Promise.allSettled,
which can overload AI/webhook dependencies; change it to a bounded concurrency
approach (e.g., use a fixed concurrency limiter or process in batches/sequential
loop) so only N concurrent this.processAndNotify calls run at once. Locate the
anomalies.map => this.processAndNotify and the Promise.allSettled usage and
replace with a controlled executor (batch loop or a p-limit style limiter) or a
simple for-await sequential loop to limit concurrent executions and then await
results (using Promise.allSettled per batch if needed).

In `@apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts`:
- Around line 10-12: Replace the single ALTER-only statement in
syncDatabaseRoles with a two-step create-then-alter to ensure the AI user exists
and then set credentials: execute a CREATE USER IF NOT EXISTS using
sql.identifier(aiUser) and aiPass, followed by an unconditional ALTER USER
sql.identifier(aiUser) IDENTIFIED WITH sha256_password BY aiPass so credentials
are always applied; update the SQL calls that currently use the snippet ALTER
USER IF EXISTS ${sql.identifier(aiUser)} IDENTIFIED WITH sha256_password BY
${aiPass} to perform these two statements in sequence.

In
`@apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts`:
- Around line 25-27: The catch blocks in AnomaliesRepository that currently call
logger.error("Failed to get new anomalies") (and the similar messages around the
other two catches) drop the thrown error; update those logger.error calls in
anomalies-repository.ts to include the caught error object/stack so the log
contains the exception details before rethrowing. Locate the catch handlers
inside the AnomaliesRepository methods (the one that logs "Failed to get new
anomalies" and the other two catch blocks referenced) and pass the error into
logger.error (or include it as structured metadata) so the original error is
preserved in the logs, then rethrow as currently done.

In `@apps/monitor-app/src/instrumentation.ts`:
- Around line 7-8: Wrap the call to syncDatabaseRoles() in a try/catch inside
the startup sequence (where provisionInitialUser() and syncDatabaseRoles() are
called) so failures from the external ClickHouse command in register() do not
block app startup; on error log a clear message including the exception (using
the module's logger or console.error) and allow startup to continue (or set a
non-fatal health flag) rather than letting the exception propagate. Ensure you
reference and protect the await syncDatabaseRoles() invocation and keep
provisionInitialUser() behavior unchanged.

In `@apps/monitor-app/src/test/anomaly-detection.integration.test.ts`:
- Around line 85-98: The insert into processed_anomalies in the test is missing
the required device_type field per InsertableProcessedAnomalyRow; update the
values object used in directClient.insert (the one with anomaly_id:
detected.anomaly_id, project_id: PROJECT_ID, metric_name: "LCP", route:
"/checkout", last_z_score: detected.z_score, status: "notified") to include
device_type: "desktop" so the seeded row matches the expected type.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 143-155: The optimizeAnomalies function currently may run OPTIMIZE
even if cwv_stats_hourly never populated; modify optimizeAnomalies to track the
last retrieved count (declare count/raw outside the loop), after the deadline
check if count === 0 and throw a descriptive error (including the table name and
that the deadline expired) instead of proceeding to await sqlClient`OPTIMIZE
TABLE cwv_stats_hourly FINAL`.command(); this prevents silent success when there
are zero source rows and makes test failures explicit.

In `@apps/monitor-app/src/test/global-setup.ts`:
- Around line 9-70: The setup() function can leak the started GenericContainer
if any awaited step (runClickHouseMigrations, adminClient.query, etc.) throws.
Wrap the body after `container = instance` in a try/catch/finally so that on any
error the container is stopped/cleaned up and the error is rethrown.
Specifically: after assigning container in setup(), put the remaining logic in a
try block, call container.stop() (or the existing teardown logic) in the
catch/finally, close adminClient if it was created, and rethrow the original
error so tests still fail.
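The cleanup-and-rethrow shape can be sketched like this, with the container start and migration steps injected as hypothetical stand-ins for GenericContainer/runClickHouseMigrations:

```typescript
interface Stoppable {
  stop(): Promise<void>;
}

// On any failure after the container starts, stop it (best effort) and
// rethrow the original error so the test run still fails.
async function setup(
  start: () => Promise<Stoppable>,
  runMigrations: () => Promise<void>,
): Promise<Stoppable> {
  const container = await start();
  try {
    await runMigrations();
    return container;
  } catch (error) {
    await container.stop().catch(() => {
      // best-effort cleanup; preserve the original error
    });
    throw error;
  }
}
```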

In `@apps/monitor-app/vitest.integration.setup.ts`:
- Around line 19-27: The block that reads .vitest-ch-config.json (configPath /
config) writes config.host and config.port directly into
process.env.TEST_CH_HOST / TEST_CH_PORT without validating types. Update the try
block to validate the parsed config shape (config is an object, config.host is a
non-empty string, config.port is a number or numeric string) and only assign
process.env.TEST_CH_HOST and process.env.TEST_CH_PORT when validation passes;
otherwise log a clear warning and skip mutating process.env (or set safe
defaults) to avoid poisoning the test environment.
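A minimal validator for that shape; the env object is passed in (rather than touching process.env directly) so the sketch stays testable, and the function name is an assumption:

```typescript
// Validate the parsed .vitest-ch-config.json shape before assigning env vars.
// Returns true only when both host and port pass validation and were assigned.
function applyChConfig(
  config: unknown,
  env: Record<string, string | undefined>,
): boolean {
  if (typeof config !== "object" || config === null) return false;
  const { host, port } = config as { host?: unknown; port?: unknown };
  const hostOk = typeof host === "string" && host.length > 0;
  const portOk =
    typeof port === "number" ||
    (typeof port === "string" && /^\d+$/.test(port));
  if (!hostOk || !portOk) {
    console.warn("Ignoring malformed .vitest-ch-config.json:", config);
    return false; // skip mutating the environment
  }
  env.TEST_CH_HOST = host as string;
  env.TEST_CH_PORT = String(port);
  return true;
}
```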

In `@ARCHITECTURE.md`:
- Around line 40-43: The diagram currently points the "marks as notified" arrow
(and the other notification-state arrow) at the Raw node. Update the Poller
arrows that represent notification-state mutations so they target the
processed_anomalies state store instead of Raw (i.e., replace Raw with
processed_anomalies for those arrows), verify that Poller -->|"fetches
project info"| still targets the correct node (and whether it should also be
processed_anomalies), and fix the duplicate arrow entries (the ones around
lines 42 and 93-94) so the diagram labels and targets match the documented
processed_anomalies state.

In `@docker/docker-compose.yml`:
- Around line 116-121: Remove the insecure hardcoded fallbacks from the
cwv-worker environment: drop the default values for BETTER_AUTH_SECRET,
INITIAL_USER_EMAIL, INITIAL_USER_PASSWORD, INITIAL_USER_NAME, SLACK_WEBHOOK_URL
and TEAMS_WEBHOOK_URL in the docker-compose service definition (replace
${VAR:-default} with plain ${VAR} so the variables must be provided).
Alternatively, set them to empty/null and add startup validation that exits if
BETTER_AUTH_SECRET or INITIAL_USER_PASSWORD (and the webhook URLs, if required)
are missing. Update deployment docs to require these env vars, and avoid
shipping secrets like "dev-secret-not-for-production-123" or placeholder URLs.
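The startup-validation alternative can be sketched as a small fail-fast helper run before the worker boots (the helper name is an assumption; the variable names come from the compose file above):

```typescript
// Fail fast at startup when required secrets are missing or blank,
// instead of falling back to baked-in compose defaults.
function assertRequiredEnv(
  env: Record<string, string | undefined>,
  required: string[],
): void {
  const missing = required.filter((key) => !env[key] || env[key]!.trim() === "");
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`,
    );
  }
}

// Example usage at worker startup (process.exit on failure):
// assertRequiredEnv(process.env, ["BETTER_AUTH_SECRET", "INITIAL_USER_PASSWORD"]);
```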

---

Outside diff comments:
In `@apps/monitor-app/package.json`:
- Around line 5-18: Add a "test:anomaly" npm script to the scripts section so
CI's pnpm --filter cwv-monitor-app test:anomaly succeeds. Mirror the pattern
used by the existing test scripts (e.g., "test:integration" / "test:perf") and
point it at the anomaly test entry and config used by the repo (for example,
vitest with the anomaly test file and/or an anomaly-specific vitest config).
The script name must be exactly "test:anomaly" to match the CI invocation.

---

Nitpick comments:
In `@apps/anomaly-worker/package.json`:
- Around line 11-12: The "dev" script is missing the dotenv preload present in
"start", causing inconsistent env loading. Update the "dev" entry to preload
dotenv the same way as "start" (e.g., include cross-env NODE_OPTIONS="-r
dotenv/config" before tsx src/index.ts and keep the pino-pretty pipe) so both
start and dev environments load .env consistently.

In `@apps/monitor-app/.env.ci`:
- Around line 10-11: Reorder the two dotenv keys to follow the project's
dotenv-linter ordering (e.g., alphabetically) and avoid UnorderedKey warnings:
swap the lines for AI_ANALYST_CLICKHOUSE_PASSWORD and
AI_ANALYST_CLICKHOUSE_USER so the keys are sorted consistently with the rest of
the file.

In `@apps/monitor-app/.env.test`:
- Around line 10-11: Remove the extra blank line in the ClickHouse env block and
reorder the keys to match the project's canonical ordering (place
AI_ANALYST_CLICKHOUSE_PASSWORD directly after AI_ANALYST_CLICKHOUSE_USER, or
vice versa, to match the other .env entries). Apply the same fix to the other
ClickHouse block so the dotenv-linter UnorderedKey and ExtraBlankLine warnings
are resolved.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts`:
- Around line 46-54: getSchemaCatalog currently builds catalogPath with
process.cwd(), which can be unreliable. Resolve the file via a stable base
instead: check process.env.CATALOG_PATH first, then resolve relative to this
module's __dirname (e.g., path.resolve(__dirname, '...', 'clickhouse',
'catalog.yml')), falling back to a packaged/bundled asset if present. Keep the
existing error handling/logging for both a missing env var and a missing file
so the function still returns the default string when the catalog is
unreadable.
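The resolution order can be sketched as follows; the env var and module directory are passed in as parameters to keep the sketch testable, and the relative path segments are illustrative placeholders, not the repo's real layout:

```typescript
import { readFileSync } from "node:fs";
import path from "node:path";

// Resolve the catalog via CATALOG_PATH first, then module-relative,
// returning the default string when the file is unreadable.
function getSchemaCatalog(
  envCatalogPath: string | undefined,
  moduleDir: string,
  fallback = "-- schema catalog unavailable --",
): string {
  const candidate =
    envCatalogPath ??
    path.resolve(moduleDir, "..", "clickhouse", "catalog.yml"); // segments are illustrative
  try {
    return readFileSync(candidate, "utf8");
  } catch {
    // Keep the existing behavior: default string when unreadable.
    return fallback;
  }
}
```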

In `@apps/monitor-app/src/app/server/lib/clickhouse/schema.ts`:
- Around line 99-109: Update InsertableProcessedAnomalyRow so its timestamp
fields match the project's convention and its device type uses the shared enum:
change notified_at and updated_at from Date to Date | string, and change
device_type from string to DeviceType.
- Around line 111-115: ProcessedAnomalyRow declares notified_at and updated_at
as Date but should allow Date | string for consistency; update the declaration
(and its exported type) so that notified_at: Date | string and updated_at:
Date | string, leaving status and the other fields unchanged.
- Around line 85-97: AnomalyRow currently types device_type as string; change
it to the existing DeviceType type for consistency and stronger typing. If
DeviceType is not already imported in this file, add the import from the same
source used by CwvEventRow/CustomEventRow/CwvDailyAggregateRow so the change
compiles.
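The three type adjustments above, sketched together with abbreviated field sets (DeviceType's members and the omitted fields are assumptions; only the fields named in the comments are shown):

```typescript
// Hypothetical shared enum; the real one lives alongside CwvEventRow et al.
type DeviceType = "desktop" | "mobile" | "tablet";

type AnomalyRow = {
  anomaly_id: string;
  device_type: DeviceType; // was: string
};

type InsertableProcessedAnomalyRow = {
  anomaly_id: string;
  device_type: DeviceType; // was: string
  notified_at: Date | string; // was: Date
  updated_at: Date | string; // was: Date
  status: string;
};

// The widened timestamps accept both ClickHouse's string timestamps
// and JS Date objects:
const fromDb: InsertableProcessedAnomalyRow = {
  anomaly_id: "a1",
  device_type: "desktop",
  notified_at: "2026-03-01 12:00:00.000",
  updated_at: new Date(),
  status: "new",
};
```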

In `@apps/monitor-app/src/test/anomaly-detection.integration.test.ts`:
- Line 7: Replace the relative import of seedAnomalyTestPattern (from
"../../scripts/seed-demo-data.mjs") with the project's alias-based import used
by other tests (e.g., "scripts/seed-demo-data.mjs" or "scripts/seed-demo-data",
consistent with the codebase import style) so the test uses the same module
resolution as the rest of the suite.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86009a7 and e3c083a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (44)
  • .github/workflows/check.yml
  • ARCHITECTURE.md
  • apps/anomaly-worker/.env
  • apps/anomaly-worker/.env.example
  • apps/anomaly-worker/package.json
  • apps/anomaly-worker/src/env.ts
  • apps/anomaly-worker/src/index.ts
  • apps/anomaly-worker/tsconfig.json
  • apps/monitor-app/.env.ci
  • apps/monitor-app/.env.example
  • apps/monitor-app/.env.test
  • apps/monitor-app/AI_AGENT_GUIDE.md
  • apps/monitor-app/clickhouse/catalog.yml
  • apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
  • apps/monitor-app/package.json
  • apps/monitor-app/scripts/seed-demo-data.mjs
  • apps/monitor-app/src/app/server/domain/ai-bridge/__tests__/ai-bridge-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/ai-bridge/service.ts
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts
  • apps/monitor-app/src/app/server/domain/notifications/service.ts
  • apps/monitor-app/src/app/server/domain/notifications/types.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/client.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/repositories/__tests__/anomalies-repository.integration.test.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/schema.ts
  • apps/monitor-app/src/env.ts
  • apps/monitor-app/src/instrumentation.ts
  • apps/monitor-app/src/test/anomaly-detection.integration.test.ts
  • apps/monitor-app/src/test/clickhouse-test-utils.ts
  • apps/monitor-app/src/test/global-setup.ts
  • apps/monitor-app/src/test/performance-guardrails.test.ts
  • apps/monitor-app/vitest.integration.config.ts
  • apps/monitor-app/vitest.integration.setup.ts
  • docker/anomaly-worker.Dockerfile
  • docker/docker-compose.build.yml
  • docker/docker-compose.dev.yml
  • docker/docker-compose.yml
  • docker/monitor-app.dev.Dockerfile
  • docker/monitor-app.prod.Dockerfile
  • setup.sh
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: check_pull_request
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-02-27T10:22:20.131Z
Learnt from: olekszczepanowski
Repo: Blazity/next-cwv-monitor PR: 118
File: apps/monitor-app/src/app/api/chat/[projectId]/route.ts:4-14
Timestamp: 2026-02-27T10:22:20.131Z
Learning: In the next-cwv-monitor codebase (apps/monitor-app), the proxy.ts middleware handles authentication by redirecting unauthenticated users to /login. Routes not in PUBLIC_ROUTES array (["/login", "/api/auth", "/api/health", "/api/ingest"]) are automatically protected by this authentication layer.

Applied to files:

  • ARCHITECTURE.md
📚 Learning: 2026-02-27T10:21:06.393Z
Learnt from: olekszczepanowski
Repo: Blazity/next-cwv-monitor PR: 118
File: apps/monitor-app/src/app/server/lib/agent/tools/execute-clickhouse.ts:23-33
Timestamp: 2026-02-27T10:21:06.393Z
Learning: In the next-cwv-monitor project, the agent ClickHouse client has database-level permissions restricted to read-only operations, providing security at the database layer rather than requiring application-level SQL validation for mutation prevention.

Applied to files:

  • apps/monitor-app/.env.ci
  • apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
  • apps/monitor-app/clickhouse/catalog.yml
  • setup.sh
  • apps/monitor-app/.env.test
  • apps/monitor-app/.env.example
🧬 Code graph analysis (13)
apps/anomaly-worker/src/env.ts (1)
apps/monitor-app/src/env.ts (1)
  • env (43-95)
apps/monitor-app/src/test/global-setup.ts (3)
apps/monitor-app/src/test/clickhouse-test-utils.ts (4)
  • CLICKHOUSE_IMAGE (9-9)
  • HTTP_PORT (10-10)
  • waitForClickHouse (32-54)
  • runClickHouseMigrations (67-89)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • port (12-12)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (3)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • AnomalyRow (85-97)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
  • aiSql (120-120)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/instrumentation.ts (3)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/lib/provision-initial-user.ts (1)
  • provisionInitialUser (5-23)
apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (1)
  • syncDatabaseRoles (4-13)
apps/monitor-app/src/app/server/domain/notifications/service.ts (7)
apps/monitor-app/src/env.ts (1)
  • env (43-95)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getUnprocessedAnomalies (31-61)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)
  • aiBridge (89-89)
apps/monitor-app/src/app/server/domain/notifications/types.ts (1)
  • NotificationPayload (8-19)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (115-118)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)
  • insertProcessedAnomaly (5-33)
apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (2)
apps/monitor-app/src/env.ts (1)
  • env (43-95)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
  • sql (118-118)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/__tests__/anomalies-repository.integration.test.ts (7)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (2)
  • sql (118-118)
  • getDirectClient (122-122)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getUnprocessedAnomalies (31-61)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)
  • insertProcessedAnomaly (5-33)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • createProject (10-30)
apps/monitor-app/src/test/global-setup.ts (1)
  • setup (9-70)
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)
  • setupClickHouseContainer (161-271)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • seedAnomalyTestPattern (587-646)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (3)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • InsertableProcessedAnomalyRow (99-109)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
  • aiSql (120-120)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts (4)
apps/monitor-app/src/env.ts (1)
  • env (43-95)
apps/monitor-app/src/app/server/domain/notifications/service.ts (2)
  • notifyNewAnomalies (12-27)
  • notifyNewAnomalies (96-96)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (2)
  • AnomalyRow (85-97)
  • ProjectRow (4-11)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (115-118)
apps/anomaly-worker/.env (1)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • password (15-15)
apps/monitor-app/vitest.integration.setup.ts (1)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/env.ts (1)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/test/clickhouse-test-utils.ts (2)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • port (12-12)
🪛 dotenv-linter (4.0.0)
apps/anomaly-worker/.env.example

[warning] 5-5: [UnorderedKey] The CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_PORT key

(UnorderedKey)


[warning] 6-6: [UnorderedKey] The CLICKHOUSE_DB key should go before the CLICKHOUSE_HOST key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The AUTH_BASE_URL key should go before the LOG_LEVEL key

(UnorderedKey)

apps/monitor-app/.env.ci

[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key

(UnorderedKey)

apps/anomaly-worker/.env

[warning] 5-5: [UnorderedKey] The CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_PORT key

(UnorderedKey)


[warning] 6-6: [UnorderedKey] The CLICKHOUSE_DB key should go before the CLICKHOUSE_HOST key

(UnorderedKey)


[warning] 7-7: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 8-8: [UnorderedKey] The BETTER_AUTH_SECRET key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 10-10: [TrailingWhitespace] Trailing whitespace detected

(TrailingWhitespace)


[warning] 11-11: [UnorderedKey] The INITIAL_USER_NAME key should go before the INITIAL_USER_PASSWORD key

(UnorderedKey)


[warning] 15-15: [UnorderedKey] The AUTH_BASE_URL key should go before the BETTER_AUTH_SECRET key

(UnorderedKey)

apps/monitor-app/.env.test

[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key

(UnorderedKey)


[warning] 22-22: [ExtraBlankLine] Extra blank line detected

(ExtraBlankLine)

apps/monitor-app/.env.example

[warning] 10-10: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_USER key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the AI_ANALYST_CLICKHOUSE_USER key

(UnorderedKey)

🔇 Additional comments (29)
apps/monitor-app/src/test/performance-guardrails.test.ts (1)

237-252: LGTM!

The addition of customStart: null and customEnd: null correctly updates the test to match the expanded getDashboardData interface while preserving the existing range: "7d" behavior. This explicitly signals that no custom date overrides are used in this performance test.

docker/monitor-app.prod.Dockerfile (1)

40-41: Looks good: build-time env parity is maintained.

Adding placeholder AI analyst ClickHouse vars here is consistent with the rest of this Dockerfile’s build-time env strategy.

apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)

49-49: Nice improvement: explicit projections over wildcard selection.

This tightens the result shape and avoids accidental schema coupling from `*` / `p.*` wildcard selection.

Also applies to: 88-88

docker/monitor-app.dev.Dockerfile (1)

3-10: Dev compose setup properly ensures dependency availability.

While the Dockerfile itself does not install dependencies, the dev environment is correctly orchestrated via docker-compose.dev.yml, which includes a setup service that runs pnpm install before the monitor-app service starts. The monitor-app service depends on setup with condition: service_completed_successfully and mounts the workspace via volumes: - ../:/workspace, ensuring installed dependencies are available at container startup.

apps/anomaly-worker/tsconfig.json (1)

11-14: This suggestion is based on outdated TypeScript information. Since TypeScript 4.1, compilerOptions.baseUrl is not required when using compilerOptions.paths. According to the official TypeScript documentation and handbook, paths are resolved relative to the tsconfig.json file itself when baseUrl is not set. The current configuration is correct and follows modern TypeScript best practices—baseUrl should not be added just to set the directory for path resolution.

Likely an incorrect or invalid review comment.

apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql (1)

103-104: This concern is not valid. The password for ai_analyst_user is set at application startup via syncDatabaseRoles() in the bootstrap phase (before any queries execute), which runs at app initialization in instrumentation.ts. The migration creates the user without a password to keep it environment-independent and deterministic; the AI_ANALYST_CLICKHOUSE_PASSWORD is mandatory and applied immediately when the app starts via ALTER USER ... IDENTIFIED WITH sha256_password.

Likely an incorrect or invalid review comment.

docker/docker-compose.build.yml (1)

23-27: Nice addition of cwv-worker build target.

This is consistent with the existing build override pattern and keeps local image workflows aligned.

apps/anomaly-worker/src/index.ts (1)

10-10: Path alias will resolve correctly in tsx runtime.

The @monitor-app/... alias is properly configured in apps/anomaly-worker/tsconfig.json and resolves to the existing file at apps/monitor-app/src/app/server/domain/notifications/service.ts. Since tsx (^4.19.2) fully respects tsconfig path aliases, the dynamic import will work as expected.

Likely an incorrect or invalid review comment.

apps/monitor-app/src/app/server/lib/clickhouse/repositories/__tests__/anomalies-repository.integration.test.ts (3)

1-13: LGTM: Clean test setup with dynamic imports.

The approach of dynamically importing modules after container setup ensures the ClickHouse client is properly configured before use. The module-level variable declarations with proper typing provide good type safety.


15-40: LGTM: Proper test lifecycle management.

The hooks correctly handle container lifecycle, dynamic imports, and table cleanup for test isolation. The 120s timeout is appropriate for container setup operations.


42-78: LGTM: Comprehensive workflow test.

The test correctly validates the end-to-end anomaly detection and processing workflow: seeding data, detecting anomalies, marking as processed, and verifying removal from the unprocessed set. The explicit undefined check with error throw serves as both a type guard and a meaningful test failure.

apps/monitor-app/src/app/server/domain/ai-bridge/__tests__/ai-bridge-service.integration.test.ts (2)

1-37: LGTM: Well-structured test setup for AI Bridge integration.

The test correctly sets up a ClickHouse container, creates a project, seeds anomaly data, and imports the AI Bridge service dynamically. The 120s timeout is appropriate for the container and data seeding operations.


39-85: LGTM: Comprehensive AI Bridge capability tests.

The tests cover essential AI Bridge functions: schema catalog retrieval, both SQL execution modes (template literals and parameterized strings), anomaly context aggregation, and importantly, the security check ensuring unsafe SQL operations are rejected.

apps/monitor-app/vitest.integration.config.ts (2)

10-10: LGTM: Scripts alias addition.

Adding the scripts alias enables clean imports of seed utilities (like seedAnomalyTestPattern) from integration tests.


24-24: LGTM: Global setup configuration.

The globalSetup entry correctly references the global setup module that manages the shared ClickHouse test container lifecycle.

apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)

1-33: LGTM: Clean repository implementation.

The repository correctly implements the insert operation with:

  • Proper use of the aiSql tag for parameterized queries
  • Sensible default for status ('new')
  • Server-side timestamp via now64(3)
  • Appropriate error handling with logging before re-throwing

The implementation aligns well with the InsertableProcessedAnomalyRow type definition.

docker/anomaly-worker.Dockerfile (3)

47-61: LGTM: Production-ready runner stage.

The runner stage follows security best practices:

  • Non-root user (cwvworker) for running the application
  • Proper file ownership with --chown
  • Minimal copied artifacts (only dist, package.json, and node_modules)
  • NODE_ENV=production set appropriately

27-27: No action needed. A comprehensive .dockerignore file is already present and properly configured at the repository root, excluding all unnecessary files (.git, node_modules, build outputs, tests, IDE files, etc.) from the Docker build context. The COPY . . instruction is safe.


1-1: Remove this comment. Node.js 24 (Krypton) has been available since May 2025 and is now Active LTS as of March 2026. The node:24-alpine Docker image is actively maintained and production-ready, with support extending to April 2028.

Likely an incorrect or invalid review comment.

apps/monitor-app/src/app/server/domain/notifications/types.ts (1)

1-24: LGTM: Well-structured notification types.

The type definitions are clear and well-suited for the notification system:

  • NotificationAction with literal 'button' type allows for future extensibility
  • NotificationPayload structure aligns well with Slack/Teams message formats
  • NotificationConfig correctly marks webhook URLs as optional, supporting partial configuration

The [key: string]: unknown index signature in metadata provides flexibility for extending the payload without breaking type safety.

apps/monitor-app/src/test/anomaly-detection.integration.test.ts (1)

40-181: LGTM: Comprehensive anomaly detection test coverage.

The test suite effectively validates key anomaly detection behaviors:

  • Deterministic anomaly_id generation (with proper hex format validation)
  • Sufficient metadata for LLM reasoning
  • Integration with processed_anomalies for state tracking
  • 1-hour gap enforcement between baseline and current windows
  • AI-driven path breakdown and metric correlation queries

The tests provide good coverage of both the detection logic and the data access patterns the AI agent will use.

docker/docker-compose.dev.yml (1)

13-87: Startup orchestration looks solid.

The one-shot setup step plus service_completed_successfully dependencies should make local boot more deterministic and avoid repeated install/migration race conditions.

Also applies to: 94-145

apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts (1)

25-85: Good coverage of core notification branches.

These cases validate the most important control-flow paths for anomaly notification behavior.

apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)

31-54: Parameterized SQL usage looks good.

The conditional projectId filter is composed through aiSql fragments, keeping query construction safe and readable.

docker/docker-compose.yml (1)

88-89: Good call requiring AI analyst credentials for monitor-app.

Making these mandatory at compose level prevents silent misconfiguration at runtime.

apps/monitor-app/src/app/server/lib/clickhouse/client.ts (2)

42-44: Verify intended behavior when options.user is set but options.password is missing.

When options?.user is truthy but options.password is undefined, passwordRaw becomes undefined, resulting in a passwordless connection. If this is intentional (e.g., for users without passwords), consider adding a brief comment. Otherwise, this could silently connect without credentials when a password was expected.

Also applies to: 57-58
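
One defensive option for the case flagged above, sketched with assumed option names (not the client's actual API): require an explicit password whenever a user is supplied, so a passwordless connection is always intentional.

```typescript
// Throw when a user is supplied without a password; intentional
// passwordless users must pass password: "" explicitly.
function resolveCredentials(options?: { user?: string; password?: string }): {
  user: string;
  password: string;
} {
  if (options?.user && options.password === undefined) {
    throw new Error(`User "${options.user}" provided without a password`);
  }
  return {
    user: options?.user ?? "default",
    password: options?.password ?? "",
  };
}
```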


95-123: LGTM!

The proxy-based lazy initialization pattern is well-implemented. The proxy correctly handles both tagged template literal calls and regular function calls, and the method binding in the get trap ensures proper this context.

apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (2)

56-86: LGTM!

The getAnomalyContext method correctly handles the null project case, uses parameterized queries for SQL injection prevention, and returns a well-structured context object.


92-94: Remove unused export getDynamicAiAnalysis.

This function is never imported or used anywhere in the codebase and should be removed as dead code.

Likely an incorrect or invalid review comment.

Comment on lines +5 to +11
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_DB=cwv_monitor
AI_ANALYST_CLICKHOUSE_PASSWORD=ai_analyst_password
BETTER_AUTH_SECRET=some_long_random_string_for_auth_security
INITIAL_USER_EMAIL=user@example.com
INITIAL_USER_PASSWORD=password
INITIAL_USER_NAME=User

⚠️ Potential issue | 🔴 Critical

Do not commit credential values in a tracked .env file.

Values like CLICKHOUSE_PASSWORD, AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET, and INITIAL_USER_PASSWORD should not live in a committed runtime env file. Keep only placeholders in .env.example, and ensure .env is gitignored.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/anomaly-worker/.env` around lines 5-11: Remove all real credential
values from the committed .env: replace the values of CLICKHOUSE_PASSWORD,
AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET, and INITIAL_USER_PASSWORD
(and any other secrets) with non-sensitive placeholders, keeping only default
placeholders for INITIAL_USER_EMAIL and INITIAL_USER_NAME if needed. Add .env
to .gitignore so it is no longer tracked, create or update .env.example with
the placeholder keys and example values, and document in the README or
deployment docs that real secrets must be provided via environment variables or
a secret manager rather than committed.

Comment on lines +1 to +15
# ClickHouse connection details
CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_DB=cwv_monitor

# Application
LOG_LEVEL=info
NODE_ENV=development
AUTH_BASE_URL=http://localhost:3000

# Anomaly Detection Webhooks (Optional)
SLACK_WEBHOOK_URL=
TEAMS_WEBHOOK_URL=

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files | grep -E "apps/anomaly-worker/\.env"

Repository: Blazity/next-cwv-monitor

Length of output: 125


🏁 Script executed:

if [ -f "apps/anomaly-worker/.env" ]; then echo "=== .env exists ==="; wc -l apps/anomaly-worker/.env; else echo ".env not found"; fi
if [ -f "apps/anomaly-worker/.env.example" ]; then echo "=== .env.example exists ==="; wc -l apps/anomaly-worker/.env.example; else echo ".env.example not found"; fi

Repository: Blazity/next-cwv-monitor

Length of output: 179


🏁 Script executed:

if [ -f "apps/anomaly-worker/.env" ] && [ -f "apps/anomaly-worker/.env.example" ]; then
  echo "=== Content of .env ==="; cat -n apps/anomaly-worker/.env
  echo
  echo "=== Content of .env.example ==="; cat -n apps/anomaly-worker/.env.example
fi

Repository: Blazity/next-cwv-monitor

Length of output: 1156


🏁 Script executed:

#!/bin/bash
set -euo pipefail

if [ ! -f "apps/anomaly-worker/.env" ] || [ ! -f "apps/anomaly-worker/.env.example" ]; then
  echo "One or both files not found"
  exit 1
fi

echo "Keys in apps/anomaly-worker/.env but missing in apps/anomaly-worker/.env.example:"
comm -23 \
  <(grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env | cut -d= -f1 | sort) \
  <(grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env.example | cut -d= -f1 | sort)

echo
echo "Keys in .env.example but missing in .env:"
comm -13 \
  <(grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env | cut -d= -f1 | sort) \
  <(grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env.example | cut -d= -f1 | sort)

Repository: Blazity/next-cwv-monitor

Length of output: 210


🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail

echo "=== Keys in .env ==="
grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env | cut -d= -f1 | sort

echo
echo "=== Keys in .env.example ==="
grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env.example | cut -d= -f1 | sort

echo
echo "=== Missing from .env.example (in .env but not in example) ==="
grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env | cut -d= -f1 | sort > /tmp/env_keys.txt
grep -E '^[A-Z0-9_]+=' apps/anomaly-worker/.env.example | cut -d= -f1 | sort > /tmp/env_example_keys.txt
comm -23 /tmp/env_keys.txt /tmp/env_example_keys.txt
```

Repository: Blazity/next-cwv-monitor

Length of output: 666


🏁 Script executed:

```bash
# Check if missing variables are required at runtime or have defaults in the code
cd apps/anomaly-worker && (
  echo "=== Checking references to missing variables ==="
  for var in AI_ANALYST_CLICKHOUSE_PASSWORD BETTER_AUTH_SECRET INITIAL_USER_EMAIL INITIAL_USER_NAME INITIAL_USER_PASSWORD; do
    echo "Searching for $var..."
    rg "$var" --type ts --type js --type json 2>/dev/null | head -5 || echo "  Not found in code"
  done
)
```

Repository: Blazity/next-cwv-monitor

Length of output: 942


.env.example is missing required bootstrap variables.

The template omits 5 variables present in .env: AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET, INITIAL_USER_EMAIL, INITIAL_USER_NAME, and INITIAL_USER_PASSWORD. These are validated as required in src/env.ts with strict validation rules (min length, email format). Developers copying only .env.example during fresh setup will encounter validation errors at runtime.

Required additions to .env.example
```diff
 # ClickHouse connection details
 CLICKHOUSE_HOST=localhost
 CLICKHOUSE_PORT=8123
 CLICKHOUSE_USER=default
 CLICKHOUSE_PASSWORD=secret
 CLICKHOUSE_DB=cwv_monitor
+AI_ANALYST_CLICKHOUSE_PASSWORD=
+BETTER_AUTH_SECRET=
+INITIAL_USER_EMAIL=
+INITIAL_USER_PASSWORD=
+INITIAL_USER_NAME=

 # Application
 LOG_LEVEL=info
 NODE_ENV=development
 AUTH_BASE_URL=http://localhost:3000
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting `.env.example` after adding the missing keys):

```env
# ClickHouse connection details
CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_DB=cwv_monitor
AI_ANALYST_CLICKHOUSE_PASSWORD=
BETTER_AUTH_SECRET=
INITIAL_USER_EMAIL=
INITIAL_USER_PASSWORD=
INITIAL_USER_NAME=

# Application
LOG_LEVEL=info
NODE_ENV=development
AUTH_BASE_URL=http://localhost:3000

# Anomaly Detection Webhooks (Optional)
SLACK_WEBHOOK_URL=
TEAMS_WEBHOOK_URL=
```
🧰 Tools
🪛 dotenv-linter (4.0.0)

[warning] 5-5: [UnorderedKey] The CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_PORT key

(UnorderedKey)


[warning] 6-6: [UnorderedKey] The CLICKHOUSE_DB key should go before the CLICKHOUSE_HOST key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The AUTH_BASE_URL key should go before the LOG_LEVEL key

(UnorderedKey)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/anomaly-worker/.env.example` around lines 1 - 15, The .env.example is
missing required bootstrap variables that are strictly validated in src/env.ts;
add the following entries to .env.example: AI_ANALYST_CLICKHOUSE_PASSWORD
(placeholder, non-empty), BETTER_AUTH_SECRET (placeholder, min-length matching
src/env.ts), INITIAL_USER_EMAIL (placeholder in valid email format),
INITIAL_USER_NAME (placeholder, non-empty), and INITIAL_USER_PASSWORD
(placeholder, min-length matching src/env.ts). Ensure the placeholder values
conform to the validation rules enforced in src/env.ts (email format for
INITIAL_USER_EMAIL and minimum lengths for secrets/passwords) so developers
copying .env.example won’t fail validation at startup.

To use the Monitor App domain in your AI application:

```typescript
import { aiBridge } from "@monitor-app/app/server/domain/ai/service";
```

⚠️ Potential issue | 🟡 Minor

Incorrect import path in documentation.

The import path references @monitor-app/app/server/domain/ai/service, but based on the PR changes, the actual service is located at @monitor-app/app/server/domain/ai-bridge/service. This should be updated to match the actual file structure.

📝 Proposed fix
```diff
-import { aiBridge } from "@monitor-app/app/server/domain/ai/service";
+import { aiBridge } from "@monitor-app/app/server/domain/ai-bridge/service";
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/AI_AGENT_GUIDE.md` at line 62, The documentation import is
pointing to the wrong module: update the import that brings in aiBridge so it
references the actual service module (use the ai-bridge service module name)
instead of the old ai/service path; modify the import line that currently
imports aiBridge to import from the ai-bridge service module so it matches the
real export and builds correctly.

Comment on lines +19 to +27
const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
if (fs.existsSync(configPath)) {
try {
const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
process.env.TEST_CH_HOST = config.host;
process.env.TEST_CH_PORT = String(config.port);
} catch (error) {
console.error("Failed to read .vitest-ch-config.json", error);
}

⚠️ Potential issue | 🟡 Minor

Validate config shape before mutating process.env.

config.host/config.port are used without type checks; malformed JSON can silently poison test env values.

✅ Suggested defensive parsing
```diff
 const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
 if (fs.existsSync(configPath)) {
   try {
-    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
-    process.env.TEST_CH_HOST = config.host;
-    process.env.TEST_CH_PORT = String(config.port);
+    const config = JSON.parse(fs.readFileSync(configPath, "utf8")) as {
+      host?: unknown;
+      port?: unknown;
+    };
+
+    if (typeof config.host === "string" && config.host.length > 0) {
+      process.env.TEST_CH_HOST = config.host;
+    }
+    if (typeof config.port === "number" || typeof config.port === "string") {
+      process.env.TEST_CH_PORT = String(config.port);
+    }
   } catch (error) {
     console.error("Failed to read .vitest-ch-config.json", error);
   }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/vitest.integration.setup.ts` around lines 19 - 27, The
current block that reads .vitest-ch-config.json (configPath / config) writes
config.host and config.port directly into process.env.TEST_CH_HOST /
TEST_CH_PORT without validating types; update the try block to validate the
parsed config shape (ensure config is an object, config.host is a non-empty
string, and config.port is a number or numeric string) and only assign
process.env.TEST_CH_HOST and process.env.TEST_CH_PORT when validations pass,
otherwise log a clear warning and skip mutating process.env (or set safe
defaults) to avoid poisoning the test environment.

Comment on lines +40 to +43
Poller -->|"polls every hour"| Anom
Poller -->|"fetches project info"| Raw
Poller -->|"marks as notified"| Raw
Poller -->|"sends notifications"| Notifier

⚠️ Potential issue | 🟡 Minor

Diagram target for notification state is inconsistent with the text.

Line 42 shows “marks as notified” going to Raw, but Line 93 documents processed_anomalies as the state store. Please align the diagram with the documented table.

Also applies to: 93-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ARCHITECTURE.md` around lines 40 - 43, The diagram currently points the
"marks as notified" (and the other notification-state arrow) to the Raw node;
update the diagram so the Poller arrows that represent notification state
mutations ("marks as notified") target the processed_anomalies state store
instead of Raw (i.e., replace Raw with processed_anomalies for those arrows),
and ensure Poller -->|"fetches project info"| still targets the correct node if
it should also be processed_anomalies; update any duplicate arrow entries (the
ones referenced around lines 42 and 93-94) so diagram labels and targets match
the documented processed_anomalies state.

Comment on lines +116 to +121
BETTER_AUTH_SECRET: ${BETTER_AUTH_SECRET:-dev-secret-not-for-production-123}
INITIAL_USER_EMAIL: ${INITIAL_USER_EMAIL:-user@example.com}
INITIAL_USER_PASSWORD: ${INITIAL_USER_PASSWORD:-password}
INITIAL_USER_NAME: ${INITIAL_USER_NAME:-User}
SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL:-https://localhost/placeholder}
TEAMS_WEBHOOK_URL: ${TEAMS_WEBHOOK_URL:-https://localhost/placeholder}

⚠️ Potential issue | 🟠 Major

Remove insecure production defaults from cwv-worker environment.

Hardcoded fallback secrets/passwords and placeholder webhook URLs are risky in production-like deployments. Require explicit secure values instead.

🔐 Suggested hardening
```diff
-      BETTER_AUTH_SECRET: ${BETTER_AUTH_SECRET:-dev-secret-not-for-production-123}
-      INITIAL_USER_EMAIL: ${INITIAL_USER_EMAIL:-user@example.com}
-      INITIAL_USER_PASSWORD: ${INITIAL_USER_PASSWORD:-password}
-      INITIAL_USER_NAME: ${INITIAL_USER_NAME:-User}
-      SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL:-https://localhost/placeholder}
-      TEAMS_WEBHOOK_URL: ${TEAMS_WEBHOOK_URL:-https://localhost/placeholder}
+      BETTER_AUTH_SECRET: ${BETTER_AUTH_SECRET:?BETTER_AUTH_SECRET is required}
+      INITIAL_USER_EMAIL: ${INITIAL_USER_EMAIL:?INITIAL_USER_EMAIL is required}
+      INITIAL_USER_PASSWORD: ${INITIAL_USER_PASSWORD:?INITIAL_USER_PASSWORD is required}
+      INITIAL_USER_NAME: ${INITIAL_USER_NAME:-User}
+      SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL:?SLACK_WEBHOOK_URL is required}
+      TEAMS_WEBHOOK_URL: ${TEAMS_WEBHOOK_URL:?TEAMS_WEBHOOK_URL is required}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/docker-compose.yml` around lines 116 - 121, Remove the insecure
hardcoded fallbacks from the cwv-worker environment by eliminating the default
values for BETTER_AUTH_SECRET, INITIAL_USER_EMAIL, INITIAL_USER_PASSWORD,
INITIAL_USER_NAME, SLACK_WEBHOOK_URL and TEAMS_WEBHOOK_URL in the docker-compose
service definition (replace ${VAR:-default} with plain ${VAR} so the variables
must be provided), or alternatively set them to empty/null and add startup
validation that exits if BETTER_AUTH_SECRET or INITIAL_USER_PASSWORD (and
webhook URLs if required) are missing; update any deployment/docs to require
these env vars and avoid shipping secrets like
"dev-secret-not-for-production-123" or placeholder URLs.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
apps/anomaly-worker/.env (1)

5-11: ⚠️ Potential issue | 🔴 Critical

Do not keep credential values in a tracked .env file.

Line 5, Line 7, Line 8, and Line 10 contain runtime secret-like values. Keep secrets out of VCS, use placeholders in .env.example, and ensure .env is gitignored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/anomaly-worker/.env` around lines 5 - 11, Remove the hardcoded secret
values from the tracked .env (replace values for CLICKHOUSE_PASSWORD,
AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET, INITIAL_USER_PASSWORD and
any other secret-like keys with placeholder text), add .env to .gitignore so
runtime secrets are not committed, create/update a .env.example containing the
placeholder keys and non-sensitive defaults (e.g., CLICKHOUSE_DB,
INITIAL_USER_EMAIL, INITIAL_USER_NAME), and if any real secrets were previously
committed rotate/revoke them in your infrastructure/secret store.
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)

154-165: ⚠️ Potential issue | 🟠 Major

optimizeAnomalies timeout guard misses non-finite counts.

On Line 160, checking only lastCount === 0 can miss parse anomalies (NaN) and still run OPTIMIZE, which weakens failure signaling.

🛠️ Proposed fix
```diff
-    const raw = rows[0]?.cnt;
-    lastCount = typeof raw === "number" ? raw : Number(raw || 0);
+    const raw = rows[0]?.cnt;
+    lastCount = typeof raw === "number" ? raw : Number(raw);
@@
-  if (lastCount === 0) {
+  if (!Number.isFinite(lastCount) || lastCount <= 0) {
     throw new Error(
       `[optimizeAnomalies] Deadline expired: Table '${table}' is still empty (0 rows). ` +
       `Optimization aborted to prevent silent test failure.`
     );
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts` around lines 154 - 165,
the timeout guard in optimizeAnomalies uses lastCount === 0 to detect failure
but misses non-finite values (NaN/Infinity) produced when parsing raw; update
the check so that after computing lastCount from raw, any non-finite or
non-positive value is treated as failure, e.g. replace the if (lastCount === 0)
block with a guard like if (!Number.isFinite(lastCount) || lastCount <= 0) {
throw new Error(...same message...) } so optimizeAnomalies detects and aborts
on parse anomalies as well as zero rows.
🧹 Nitpick comments (3)
apps/monitor-app/src/test/global-setup.ts (1)

25-27: Extract config path into a single constant.

The same config filename is recomputed in multiple places; centralizing it reduces drift risk.

♻️ Small refactor
```diff
 let container: StartedTestContainer | undefined;
+const VITEST_CH_CONFIG_PATH = path.resolve(process.cwd(), ".vitest-ch-config.json");
@@
-    const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
-    fs.writeFileSync(configPath, JSON.stringify({ host, port }));
+    fs.writeFileSync(VITEST_CH_CONFIG_PATH, JSON.stringify({ host, port }));
@@
-    const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
-    if (fs.existsSync(configPath)) {
-      fs.unlinkSync(configPath);
+    if (fs.existsSync(VITEST_CH_CONFIG_PATH)) {
+      fs.unlinkSync(VITEST_CH_CONFIG_PATH);
     }
@@
-  const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
-  if (fs.existsSync(configPath)) {
-    fs.unlinkSync(configPath);
+  if (fs.existsSync(VITEST_CH_CONFIG_PATH)) {
+    fs.unlinkSync(VITEST_CH_CONFIG_PATH);
   }
```

Also applies to: 47-50, 57-60

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/global-setup.ts` around lines 25 - 27, Extract the
repeated ".vitest-ch-config.json" resolution into a single constant (e.g.,
CONFIG_PATH or configPath) at the top of global-setup.ts and use that constant
everywhere you currently recompute the path (the places that call
path.resolve(process.cwd(), ".vitest-ch-config.json") and fs.writeFileSync /
fs.readFileSync or similar around the lines shown). Update the existing uses
such as the configPath variable in the write operation and the other occurrences
referenced (the blocks at 47-50 and 57-60) to reference the new constant so the
filename/path is defined in one place.
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)

43-44: Decouple password override from user override.

On Line 43 and Line 57, password selection is gated by options?.user. This makes password-only overrides impossible and adds unnecessary coupling.

♻️ Proposed refactor
```diff
-    const passwordRaw = options?.user ? options.password : env.CLICKHOUSE_PASSWORD;
+    const passwordRaw = options?.password ?? env.CLICKHOUSE_PASSWORD;
@@
-  const passwordRaw = options?.user ? options.password : env.CLICKHOUSE_PASSWORD;
+  const passwordRaw = options?.password ?? env.CLICKHOUSE_PASSWORD;
```

Also applies to: 57-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/lib/clickhouse/client.ts` around lines 43 -
44, The password selection wrongly depends on options?.user, preventing
password-only overrides; update the logic where passwordRaw (and the second
occurrence near lines 57-58) is computed to use options?.password when present,
falling back to env.CLICKHOUSE_PASSWORD, independent of options?.user, then keep
the encodeURIComponent(passwordRaw) step (variable name password) unchanged so
password overrides work without requiring a user override.
apps/monitor-app/src/env.ts (1)

10-10: Tighten test command detection.

Line 10 uses lifecycleEvent.includes("test"), which can match unrelated script names. Prefer exact/prefix checks to avoid accidental test-mode behavior.

💡 Suggested tweak
```diff
-const isTest = process.env.NODE_ENV === "test" || lifecycleEvent.includes("test");
+const isTest =
+  process.env.NODE_ENV === "test" ||
+  lifecycleEvent === "test" ||
+  lifecycleEvent.startsWith("test:");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/env.ts` at line 10, The test-mode detection in the
isTest constant is too broad because lifecycleEvent.includes("test") can match
unrelated names; update the check in the isTest initialization to test for exact
or prefix matches instead (e.g., lifecycleEvent === "test" or
lifecycleEvent.startsWith("test:") or use a regex like /^test($|:)/) while still
keeping the NODE_ENV === "test" check; locate the const isTest declaration and
replace the includes-based condition with the exact/prefix match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts`:
- Around line 46-49: getSchemaCatalog currently builds the catalog path with
path.join(process.cwd(), "clickhouse", "catalog.yml") which is brittle; replace
that with a path resolved relative to this module (e.g. use
path.resolve(__dirname, '..', '..', '..', '..', 'clickhouse', 'catalog.yml') or
an equivalent import.meta.url-based resolution) inside the async
getSchemaCatalog function so the code always reads
apps/monitor-app/clickhouse/catalog.yml regardless of cwd; update the path.join
usage and keep the existing fs.readFile call and error handling.
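The module-relative resolution suggested above can be sketched as follows. `resolveFromModule` and the segment chain are illustrative assumptions, not the service's actual code; the real number of `..` hops depends on where the compiled module lives relative to `clickhouse/catalog.yml`.

```typescript
import path from "node:path";
import { fileURLToPath } from "node:url";

// Hedged sketch: resolve a data file relative to the current module instead
// of process.cwd(). Call as resolveFromModule(import.meta.url, "..", ...).
function resolveFromModule(moduleUrl: string, ...segments: string[]): string {
  const moduleDir = path.dirname(fileURLToPath(moduleUrl));
  return path.resolve(moduleDir, ...segments);
}

// Example (hop count is an assumption, adjust to the real layout):
// const catalogPath = resolveFromModule(
//   import.meta.url, "..", "..", "..", "..", "clickhouse", "catalog.yml",
// );
```

Because the path is anchored to the module URL rather than the working directory, the lookup stays stable no matter where the process is launched from.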

In `@apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts`:
- Around line 76-80: Add a 5s fetch timeout to the webhook POSTs by including
signal: AbortSignal.timeout(5_000) in the fetch options for the existing fetch
calls (the one assigning response = await fetch(url, { method: 'POST',
headers..., body... }) and the second POST used for the other webhook in the
same file). Update both locations (e.g., in the Slack/Teams notification
functions within dispatcher.ts) to pass the signal property in the options so
the requests abort after 5 seconds.
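The timeout described above can be sketched as follows; `postWithTimeout` and its parameters are illustrative assumptions rather than the dispatcher's actual names.

```typescript
// Hedged sketch: a webhook POST that aborts after a fixed timeout.
// Function and parameter names are assumptions for illustration.
async function postWithTimeout(url: string, payload: unknown, timeoutMs = 5_000) {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    // AbortSignal.timeout() aborts the request with a TimeoutError once
    // timeoutMs elapses, so a slow webhook cannot hang the worker.
    signal: AbortSignal.timeout(timeoutMs),
  });
}
```

`AbortSignal.timeout()` requires Node 17.3+ (built-in `fetch` requires Node 18+), which fits this repo's toolchain.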

In `@apps/monitor-app/src/env.ts`:
- Around line 66-69: The skipValidation flag currently includes isTest which
disables all env schema checks during tests; update skipValidation (the
expression containing process.env.SKIP_VALIDATION, isBuildCommand, and isTest)
to remove isTest so tests do not globally skip validation, and if you need to
bypass validation in specific test suites provide an explicit override (e.g.,
set SKIP_VALIDATION in test setup) rather than relying on isTest; modify the
skipValidation expression and adjust any test helpers to set SKIP_VALIDATION
when intentional.
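A minimal sketch of the explicit opt-in described above; `isBuildCommand` is reconstructed here as an assumption, since the real flag is defined elsewhere in `env.ts`.

```typescript
// Hedged sketch: skip env-schema validation only on an explicit opt-in or
// during builds, never implicitly for every test run.
const isBuildCommand = process.env.npm_lifecycle_event === "build"; // assumption
const skipValidation =
  process.env.SKIP_VALIDATION === "true" || isBuildCommand;

// Test suites that genuinely need to bypass validation would set
// SKIP_VALIDATION=true in their own setup file instead of relying on isTest.
```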

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 209-215: The test utility is passing the wrong env var name so
migrations ignore the intended DB; update the calls to runClickHouseMigrations
to use the CH_MIGRATIONS_DATABASE key (not CH_MIGRATIONS_DB) so the migration
script (run-clickhouse-migrate.mjs which reads CH_MIGRATIONS_DATABASE) receives
the test database name; change both occurrences around the
runClickHouseMigrations invocations that set CH_MIGRATIONS_DB to instead set
CH_MIGRATIONS_DATABASE with the same value (uniqueDbName).

In `@apps/monitor-app/src/test/global-setup.ts`:
- Around line 43-53: The catch block in global-setup.ts currently performs
cleanup (await container.stop(), fs.unlinkSync(...)) directly and then throws
the caught error, but if any cleanup step throws it will replace the original
setup/migration error; modify the catch to capture the original error (e.g.,
const originalError = error), then perform cleanup inside its own try/catch so
cleanup failures are logged/handled (or suppressed) without overwriting
originalError, and finally rethrow originalError; specifically wrap calls to
container.stop() and fs.unlinkSync(configPath) in a nested try/catch, reference
the existing container variable and the configPath/fs.unlinkSync usage, and
ensure the final throw uses the preserved original error.
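The error-preserving pattern above can be sketched generically; `runWithPreservedError`, `setup`, and `cleanup` are illustrative stand-ins for the container start and the `container.stop()` / `fs.unlinkSync` calls.

```typescript
// Hedged sketch: run cleanup after a failed setup without letting a cleanup
// failure mask the original error.
async function runWithPreservedError<T>(
  setup: () => Promise<T>,
  cleanup: () => Promise<void>,
): Promise<T> {
  try {
    return await setup();
  } catch (originalError) {
    try {
      await cleanup();
    } catch (cleanupError) {
      // Log and suppress: the caller should see the setup failure, not this.
      console.error("Cleanup failed after setup error", cleanupError);
    }
    throw originalError;
  }
}
```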

---

Duplicate comments:
In `@apps/anomaly-worker/.env`:
- Around line 5-11: Remove the hardcoded secret values from the tracked .env
(replace values for CLICKHOUSE_PASSWORD, AI_ANALYST_CLICKHOUSE_PASSWORD,
BETTER_AUTH_SECRET, INITIAL_USER_PASSWORD and any other secret-like keys with
placeholder text), add .env to .gitignore so runtime secrets are not committed,
create/update a .env.example containing the placeholder keys and non-sensitive
defaults (e.g., CLICKHOUSE_DB, INITIAL_USER_EMAIL, INITIAL_USER_NAME), and if
any real secrets were previously committed rotate/revoke them in your
infrastructure/secret store.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 154-165: The timeout guard in optimizeAnomalies uses lastCount ===
0 to detect failure but misses non-finite values (NaN/Infinity) produced when
parsing raw; update the check so that after computing lastCount from raw, any
non-finite or non-positive value is treated as failure, e.g. replace the if
(lastCount === 0) block with a guard like if (!Number.isFinite(lastCount) ||
lastCount <= 0) { throw new Error(...same message...) } so optimizeAnomalies
detects and aborts on parse anomalies as well as zero rows.

---

Nitpick comments:
In `@apps/monitor-app/src/app/server/lib/clickhouse/client.ts`:
- Around line 43-44: The password selection wrongly depends on options?.user,
preventing password-only overrides; update the logic where passwordRaw (and the
second occurrence near lines 57-58) is computed to use options?.password when
present, falling back to env.CLICKHOUSE_PASSWORD, independent of options?.user,
then keep the encodeURIComponent(passwordRaw) step (variable name password)
unchanged so password overrides work without requiring a user override.

In `@apps/monitor-app/src/env.ts`:
- Line 10: The test-mode detection in the isTest constant is too broad because
lifecycleEvent.includes("test") can match unrelated names; update the check in
the isTest initialization to test for exact or prefix matches instead (e.g.,
lifecycleEvent === "test" or lifecycleEvent.startsWith("test:") or use a regex
like /^test($|:)/) while still keeping the NODE_ENV === "test" check; locate the
const isTest declaration and replace the includes-based condition with the
exact/prefix match.

In `@apps/monitor-app/src/test/global-setup.ts`:
- Around line 25-27: Extract the repeated ".vitest-ch-config.json" resolution
into a single constant (e.g., CONFIG_PATH or configPath) at the top of
global-setup.ts and use that constant everywhere you currently recompute the
path (the places that call path.resolve(process.cwd(), ".vitest-ch-config.json")
and fs.writeFileSync / fs.readFileSync or similar around the lines shown).
Update the existing uses such as the configPath variable in the write operation
and the other occurrences referenced (the blocks at 47-50 and 57-60) to
reference the new constant so the filename/path is defined in one place.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3c083a and 23683d0.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (18)
  • apps/anomaly-worker/.env
  • apps/anomaly-worker/.env.example
  • apps/anomaly-worker/.gitignore
  • apps/anomaly-worker/src/index.ts
  • apps/monitor-app/clickhouse/catalog.yml
  • apps/monitor-app/package.json
  • apps/monitor-app/scripts/seed-demo-data.mjs
  • apps/monitor-app/src/app/server/domain/ai-bridge/service.ts
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts
  • apps/monitor-app/src/app/server/domain/notifications/service.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/client.ts
  • apps/monitor-app/src/env.ts
  • apps/monitor-app/src/instrumentation.ts
  • apps/monitor-app/src/test/clickhouse-test-utils.ts
  • apps/monitor-app/src/test/global-setup.ts
  • docker/docker-compose.dev.yml
  • docker/monitor-app.dev.Dockerfile
✅ Files skipped from review due to trivial changes (1)
  • apps/anomaly-worker/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/monitor-app/package.json
  • apps/monitor-app/scripts/seed-demo-data.mjs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: check_pull_request
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-27T10:21:06.393Z
Learnt from: olekszczepanowski
Repo: Blazity/next-cwv-monitor PR: 118
File: apps/monitor-app/src/app/server/lib/agent/tools/execute-clickhouse.ts:23-33
Timestamp: 2026-02-27T10:21:06.393Z
Learning: In the next-cwv-monitor project, the agent ClickHouse client has database-level permissions restricted to read-only operations, providing security at the database layer rather than requiring application-level SQL validation for mutation prevention.

Applied to files:

  • apps/monitor-app/clickhouse/catalog.yml
🧬 Code graph analysis (9)
apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts (3)
apps/monitor-app/src/app/server/domain/notifications/service.ts (2)
  • notifyNewAnomalies (15-49)
  • notifyNewAnomalies (118-118)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • AnomalyRow (85-97)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (134-137)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (6)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (2)
  • AnomalyRow (85-97)
  • ProjectRow (4-11)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (2)
  • aiSql (120-120)
  • getDirectAiClient (123-123)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getAnomalyById (63-86)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
apps/monitor-app/src/app/server/domain/notifications/service.ts (8)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getUnprocessedAnomalies (31-61)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • AnomalyRow (85-97)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)
  • aiBridge (89-89)
apps/monitor-app/src/app/server/domain/notifications/types.ts (1)
  • NotificationPayload (8-19)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (134-137)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)
  • insertProcessedAnomaly (5-33)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (3)
apps/monitor-app/src/app/server/domain/notifications/types.ts (2)
  • NotificationConfig (21-24)
  • NotificationPayload (8-19)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • port (12-12)
apps/monitor-app/src/env.ts (1)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/instrumentation.ts (3)
apps/monitor-app/scripts/seed-demo-data.mjs (1)
  • process (6-12)
apps/monitor-app/src/lib/provision-initial-user.ts (1)
  • provisionInitialUser (5-23)
apps/monitor-app/src/app/server/lib/clickhouse/bootstrap.ts (1)
  • syncDatabaseRoles (4-13)
apps/anomaly-worker/.env (1)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • password (15-15)
🪛 dotenv-linter (4.0.0)
apps/anomaly-worker/.env.example

[warning] 5-5: [UnorderedKey] The CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_PORT key

(UnorderedKey)


[warning] 6-6: [UnorderedKey] The CLICKHOUSE_DB key should go before the CLICKHOUSE_HOST key

(UnorderedKey)


[warning] 7-7: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 8-8: [UnorderedKey] The BETTER_AUTH_SECRET key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The INITIAL_USER_NAME key should go before the INITIAL_USER_PASSWORD key

(UnorderedKey)


[warning] 16-16: [UnorderedKey] The AUTH_BASE_URL key should go before the LOG_LEVEL key

(UnorderedKey)

apps/anomaly-worker/.env

[warning] 5-5: [UnorderedKey] The CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_PORT key

(UnorderedKey)


[warning] 6-6: [UnorderedKey] The CLICKHOUSE_DB key should go before the CLICKHOUSE_HOST key

(UnorderedKey)


[warning] 7-7: [UnorderedKey] The AI_ANALYST_CLICKHOUSE_PASSWORD key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 8-8: [UnorderedKey] The BETTER_AUTH_SECRET key should go before the CLICKHOUSE_DB key

(UnorderedKey)


[warning] 11-11: [UnorderedKey] The INITIAL_USER_NAME key should go before the INITIAL_USER_PASSWORD key

(UnorderedKey)


[warning] 15-15: [UnorderedKey] The AUTH_BASE_URL key should go before the BETTER_AUTH_SECRET key

(UnorderedKey)

🔇 Additional comments (11)
apps/monitor-app/src/instrumentation.ts (1)

3-13: Startup resilience improvement looks good.

The Node-only guard plus the try/catch around syncDatabaseRoles() cleanly prevents non-Node side effects and avoids blocking startup on role-sync failures.

apps/monitor-app/src/test/global-setup.ts (2)

8-42: Setup flow is robust and CI-friendly.

Container startup, readiness wait, config emission, and migration bootstrap are well sequenced for deterministic integration tests.


56-66: Teardown is clean and idempotent.

Good cleanup ordering and defensive checks for both file and container state.

docker/docker-compose.dev.yml (1)

63-69: Good dev startup sequencing.

The dependency chain on setup completion (Line 67, Line 85, Line 114, Line 143) makes the local flow more deterministic and avoids repeated per-service installs.

Also applies to: 85-87, 114-115, 138-144

docker/monitor-app.dev.Dockerfile (1)

3-3: Looks consistent with the new workspace-based dev flow.

WORKDIR and the filtered dev command align with the compose changes and reduce duplicated install logic.

Also applies to: 10-10

apps/anomaly-worker/.env.example (1)

7-11: Template coverage looks good.

Required bootstrap keys and optional webhook keys are present, which makes local setup clearer and reduces missing-variable churn.

Also applies to: 19-20

apps/anomaly-worker/src/index.ts (1)

7-57: Overlap prevention and graceful shutdown flow look correct.

The run guard, shutdown flag, task stop, wait loop, and forced-timeout fallback form a solid worker lifecycle for scheduled jobs.
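
The lifecycle described above reduces to two flags plus a bounded wait; a minimal sketch (names and the 50 ms poll interval are illustrative, not taken from the worker's code):

```typescript
let running = false;
let shuttingDown = false;

// Skip a scheduled run if one is already in flight or shutdown has begun.
async function runOnce(job: () => Promise<void>): Promise<boolean> {
  if (running || shuttingDown) return false;
  running = true;
  try {
    await job();
  } finally {
    running = false;
  }
  return true;
}

// Wait for an in-flight run to finish, but never past the deadline.
async function shutdown(maxWaitMs: number): Promise<void> {
  shuttingDown = true;
  const deadline = Date.now() + maxWaitMs;
  while (running && Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, 50));
  }
}
```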

apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts (1)

33-85: Good integration-path coverage for notification flow.

The three scenarios validate gating, happy path dispatch, and empty-result no-op behavior with appropriate interaction checks.

apps/monitor-app/clickhouse/catalog.yml (1)

65-77: processed_anomalies catalog section is now aligned with anomaly notification writes.

The documented columns and types now reflect the notification-processing data shape, reducing schema-context drift for AI-assisted queries.

apps/monitor-app/src/app/server/domain/notifications/service.ts (1)

11-49: Batch-based concurrency control is a good safeguard.

The fixed-size batching and delay strategy provides a safer load profile for AI and webhook dependencies during anomaly spikes.
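
The batching-plus-delay strategy can be expressed as a small helper; a sketch under assumed defaults (the real service's batch size and delay are not asserted here):

```typescript
// Each batch runs concurrently; batches run sequentially with a pause
// between them to smooth load on AI and webhook dependencies.
async function processInBatches<T>(
  items: T[],
  handler: (item: T) => Promise<void>,
  batchSize = 5,
  delayMs = 200,
): Promise<void> {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    await Promise.allSettled(batch.map(handler)); // one failure doesn't sink the batch
    if (i + batchSize < items.length) {
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}
```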

apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)

40-42: Good call keeping SQL error logs query-free.

Logging only the error object here reduces sensitive-query exposure risk in production logs.

Comment on lines +46 to +49
async getSchemaCatalog(): Promise<string> {
try {
const catalogPath = path.join(process.cwd(), "clickhouse", "catalog.yml");
return await fs.readFile(catalogPath, "utf8");


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and contains the code snippet
if [ -f "apps/monitor-app/src/app/server/domain/ai-bridge/service.ts" ]; then
  echo "=== File Found ==="
  wc -l "apps/monitor-app/src/app/server/domain/ai-bridge/service.ts"
  echo ""
  echo "=== Lines 46-49 context (40-55) ==="
  sed -n '40,55p' "apps/monitor-app/src/app/server/domain/ai-bridge/service.ts" | cat -n
else
  echo "File not found"
  find . -name "service.ts" -path "*/ai-bridge/*" 2>/dev/null | head -20
fi

Repository: Blazity/next-cwv-monitor

Length of output: 756


🏁 Script executed:

# Check if catalog.yml exists at claimed location
echo "=== Checking catalog.yml locations ==="
if [ -f "apps/monitor-app/clickhouse/catalog.yml" ]; then
  echo "FOUND: apps/monitor-app/clickhouse/catalog.yml"
  wc -l "apps/monitor-app/clickhouse/catalog.yml"
else
  echo "MISSING: apps/monitor-app/clickhouse/catalog.yml"
fi

if [ -f "clickhouse/catalog.yml" ]; then
  echo "FOUND: clickhouse/catalog.yml"
else
  echo "MISSING: clickhouse/catalog.yml"
fi

echo ""
echo "=== Searching for all catalog.yml files ==="
fd --type f "catalog.yml" 2>/dev/null || find . -name "catalog.yml" -type f 2>/dev/null

Repository: Blazity/next-cwv-monitor

Length of output: 314


🏁 Script executed:

# Check the module type and imports in the service file
if [ -f "apps/monitor-app/src/app/server/domain/ai-bridge/service.ts" ]; then
  echo "=== Checking imports and module type ==="
  head -20 "apps/monitor-app/src/app/server/domain/ai-bridge/service.ts"
fi

Repository: Blazity/next-cwv-monitor

Length of output: 1065


Fix brittle process.cwd()-based catalog lookup; file is located under app directory.

At line 48, path.join(process.cwd(), "clickhouse", "catalog.yml") assumes the process starts from the apps/monitor-app directory. The actual catalog resides at apps/monitor-app/clickhouse/catalog.yml, so any invocation from the repository root or another cwd will fail to locate the file. The catch block masks this with a fallback, but the catalog remains unavailable.

🔧 Recommended fix
 import fs from "node:fs/promises";
 import path from "node:path";
+import { fileURLToPath } from "node:url";
 import { AnomalyRow, ProjectRow } from "@/app/server/lib/clickhouse/schema";
 import { env } from "@/env";
 import { aiSql, getDirectAiClient } from "@/app/server/lib/clickhouse/client";
 import { getAnomalyById } from "@/app/server/lib/clickhouse/repositories/anomalies-repository";
 import { getProjectById } from "@/app/server/lib/clickhouse/repositories/projects-repository";
 import { logger } from "@/app/server/lib/logger";

 export class AiBridgeService {
+  private readonly catalogPath = path.resolve(
+    path.dirname(fileURLToPath(import.meta.url)),
+    "../../../../../clickhouse/catalog.yml",
+  );
+
   async getSchemaCatalog(): Promise<string> {
     try {
-      const catalogPath = path.join(process.cwd(), "clickhouse", "catalog.yml");
-      return await fs.readFile(catalogPath, "utf8");
+      return await fs.readFile(this.catalogPath, "utf8");
     } catch (error) {
       logger.error({ err: error }, "Failed to read schema catalog");
       return "# Schema catalog unavailable";
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts` around lines 46
- 49, getSchemaCatalog currently builds the catalog path with
path.join(process.cwd(), "clickhouse", "catalog.yml") which is brittle; replace
that with a path resolved relative to this module (e.g. use
path.resolve(__dirname, '..', '..', '..', '..', 'clickhouse', 'catalog.yml') or
an equivalent import.meta.url-based resolution) inside the async
getSchemaCatalog function so the code always reads
apps/monitor-app/clickhouse/catalog.yml regardless of cwd; update the path.join
usage and keep the existing fs.readFile call and error handling.

Comment on lines +66 to +69
skipValidation:
process.env.SKIP_VALIDATION === "true" ||
isBuildCommand ||
isTest,

⚠️ Potential issue | 🟠 Major

Avoid globally disabling env validation in test mode.

On Lines 66-69, including isTest in skipValidation suppresses all schema checks during test runs, which can hide missing required vars and produce late, harder-to-debug runtime failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/env.ts` around lines 66 - 69, The skipValidation flag
currently includes isTest which disables all env schema checks during tests;
update skipValidation (the expression containing process.env.SKIP_VALIDATION,
isBuildCommand, and isTest) to remove isTest so tests do not globally skip
validation, and if you need to bypass validation in specific test suites provide
an explicit override (e.g., set SKIP_VALIDATION in test setup) rather than
relying on isTest; modify the skipValidation expression and adjust any test
helpers to set SKIP_VALIDATION when intentional.
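Expressed as a small predicate, the suggested expression would look roughly like this (isBuildCommand stands in for the app's own build detection; the actual env.ts shape is not shown in this hunk):

```typescript
// Tests no longer skip validation implicitly; an explicit
// SKIP_VALIDATION=true opt-in in the test setup remains available.
function shouldSkipValidation(
  env: Record<string, string | undefined>,
  isBuildCommand: boolean,
): boolean {
  return env.SKIP_VALIDATION === "true" || isBuildCommand;
}
```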

Comment on lines +209 to +215
await runClickHouseMigrations({
CH_MIGRATIONS_HOST: host,
CH_MIGRATIONS_PORT: String(port),
CH_MIGRATIONS_DB: uniqueDbName,
CH_MIGRATIONS_USER: cfg.username,
CH_MIGRATIONS_PASSWORD: cfg.password,
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Migration env keys referenced in script ==="
rg -n "CH_MIGRATIONS_(DB|DATABASE|HOST|PORT|USER|PASSWORD)" apps/monitor-app/scripts/run-clickhouse-migrate.mjs -C2

echo
echo "=== Keys passed from test utility ==="
rg -n "CH_MIGRATIONS_(DB|DATABASE|HOST|PORT|USER|PASSWORD)" apps/monitor-app/src/test/clickhouse-test-utils.ts -C2

echo
echo "=== Keys used in compose ==="
rg -n "CH_MIGRATIONS_(DB|DATABASE|HOST|PORT|USER|PASSWORD)" docker/docker-compose.dev.yml -C2

Repository: Blazity/next-cwv-monitor

Length of output: 1728


Use CH_MIGRATIONS_DATABASE instead of CH_MIGRATIONS_DB in test utility calls.

The migration script (line 13 of run-clickhouse-migrate.mjs) reads CH_MIGRATIONS_DATABASE, but lines 212 and 272 pass CH_MIGRATIONS_DB. This causes the test database name to be ignored, with migrations running against the default database instead.

Update both calls (lines 209–215 and 269–275) to use CH_MIGRATIONS_DATABASE as the key.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts` around lines 209 - 215,
The test utility is passing the wrong env var name so migrations ignore the
intended DB; update the calls to runClickHouseMigrations to use the
CH_MIGRATIONS_DATABASE key (not CH_MIGRATIONS_DB) so the migration script
(run-clickhouse-migrate.mjs which reads CH_MIGRATIONS_DATABASE) receives the
test database name; change both occurrences around the runClickHouseMigrations
invocations that set CH_MIGRATIONS_DB to instead set CH_MIGRATIONS_DATABASE with
the same value (uniqueDbName).

Comment on lines +43 to +53
} catch (error) {
await container.stop();
container = undefined;

const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
if (fs.existsSync(configPath)) {
fs.unlinkSync(configPath);
}

throw error;
}

⚠️ Potential issue | 🟠 Major

Preserve the original setup error if cleanup fails.

If cleanup throws at Line 44, it replaces the real setup/migration failure, which makes triage harder and can hide the root cause.

🔧 Suggested hardening
   } catch (error) {
-    await container.stop();
-    container = undefined;
-    
     const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
-    if (fs.existsSync(configPath)) {
-      fs.unlinkSync(configPath);
-    }
-
-    throw error;
+    const originalError = error;
+    try {
+      if (container) {
+        try {
+          await container.stop();
+        } catch {
+          // best-effort cleanup; preserve original setup error
+        }
+      }
+    } finally {
+      container = undefined;
+      if (fs.existsSync(configPath)) {
+        fs.unlinkSync(configPath);
+      }
+    }
+    throw originalError;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (error) {
await container.stop();
container = undefined;
const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
if (fs.existsSync(configPath)) {
fs.unlinkSync(configPath);
}
throw error;
}
} catch (error) {
const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
const originalError = error;
try {
if (container) {
try {
await container.stop();
} catch {
// best-effort cleanup; preserve original setup error
}
}
} finally {
container = undefined;
if (fs.existsSync(configPath)) {
fs.unlinkSync(configPath);
}
}
throw originalError;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/global-setup.ts` around lines 43 - 53, The catch
block in global-setup.ts currently performs cleanup (await container.stop(),
fs.unlinkSync(...)) directly and then throws the caught error, but if any
cleanup step throws it will replace the original setup/migration error; modify
the catch to capture the original error (e.g., const originalError = error),
then perform cleanup inside its own try/catch so cleanup failures are
logged/handled (or suppressed) without overwriting originalError, and finally
rethrow originalError; specifically wrap calls to container.stop() and
fs.unlinkSync(configPath) in a nested try/catch, reference the existing
container variable and the configPath/fs.unlinkSync usage, and ensure the final
throw uses the preserved original error.
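
The pattern the prompt describes generalizes to a small helper (names are illustrative, not from the PR): run best-effort cleanup, then always rethrow the original failure:

```typescript
// Best-effort cleanup that can never mask the error that triggered it.
async function failWithCleanup(
  original: unknown,
  cleanup: () => Promise<void>,
): Promise<never> {
  try {
    await cleanup();
  } catch {
    // swallow: cleanup problems are secondary to the original failure
  }
  throw original;
}
```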

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (4)
apps/anomaly-worker/.env.example (1)

7-11: ⚠️ Potential issue | 🟠 Major

Use validation-safe placeholder values for required bootstrap vars.

These keys are present now, but empty defaults still make fresh bootstrap fail if .env.example is copied directly. Please provide non-empty, validation-safe placeholders.

Suggested patch
-AI_ANALYST_CLICKHOUSE_PASSWORD=
-BETTER_AUTH_SECRET=
-INITIAL_USER_EMAIL=
-INITIAL_USER_PASSWORD=
-INITIAL_USER_NAME=
+AI_ANALYST_CLICKHOUSE_PASSWORD=change-me-clickhouse-password
+BETTER_AUTH_SECRET=replace-with-at-least-32-characters-secret
+INITIAL_USER_EMAIL=admin@example.com
+INITIAL_USER_PASSWORD=ChangeMe123!
+INITIAL_USER_NAME=Admin
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/anomaly-worker/.env.example` around lines 7 - 11, The .env.example file
contains empty bootstrap variables which cause fresh setups to fail; replace the
empty values for AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET,
INITIAL_USER_EMAIL, INITIAL_USER_PASSWORD, and INITIAL_USER_NAME with
validation-safe placeholder strings (e.g., non-empty, obviously temporary values
such as "changeme" for secrets and "user@example.com" for email, and a
placeholder password like "ChangeMe123!" and name like "Example User") so that
copying .env.example produces valid, non-empty defaults that pass basic
validation checks without exposing real credentials.
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)

209-215: ⚠️ Potential issue | 🟠 Major

Use CH_MIGRATIONS_DATABASE for migration DB selection.

Line 212 still passes CH_MIGRATIONS_DB, while the migration script expects CH_MIGRATIONS_DATABASE. This can cause migrations to run against the wrong/default DB. The same mismatch exists again at Line 272.

Suggested fix
   await runClickHouseMigrations({
     CH_MIGRATIONS_HOST: host,
     CH_MIGRATIONS_PORT: String(port),
-    CH_MIGRATIONS_DB: uniqueDbName,
+    CH_MIGRATIONS_DATABASE: uniqueDbName,
     CH_MIGRATIONS_USER: cfg.username,
     CH_MIGRATIONS_PASSWORD: cfg.password,
   });

   await runClickHouseMigrations({
     CH_MIGRATIONS_HOST: host,
     CH_MIGRATIONS_PORT: String(port),
-    CH_MIGRATIONS_DB: cfg.database,
+    CH_MIGRATIONS_DATABASE: cfg.database,
     CH_MIGRATIONS_USER: cfg.username,
     CH_MIGRATIONS_PASSWORD: cfg.password,
   });

Also applies to: 269-275

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts` around lines 209 - 215,
The migration invocation is passing CH_MIGRATIONS_DB but the migration runner
expects CH_MIGRATIONS_DATABASE; update both calls to runClickHouseMigrations so
the env object uses CH_MIGRATIONS_DATABASE: replace the CH_MIGRATIONS_DB key
with CH_MIGRATIONS_DATABASE while keeping the same values (host, port,
uniqueDbName, cfg.username, cfg.password) so runClickHouseMigrations receives
the correct DB name; do the same replacement for the second invocation that
currently mirrors the first.
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)

76-80: ⚠️ Potential issue | 🟠 Major

Add explicit timeouts to webhook POSTs.

Both outbound webhook calls can hang indefinitely; that can stall batch progression under network issues.

🛠️ Proposed fix
     const response = await fetch(url, {
       method: 'POST',
       headers: { 'Content-Type': 'application/json' },
       body: JSON.stringify({ blocks }),
+      signal: AbortSignal.timeout(5_000),
     });
     const response = await fetch(url, {
       method: 'POST',
       headers: { 'Content-Type': 'application/json' },
       body: JSON.stringify({
         type: "message",
         attachments: [{
           contentType: "application/vnd.microsoft.card.adaptive",
           content: adaptiveCard
         }]
       }),
+      signal: AbortSignal.timeout(5_000),
     });

Also applies to: 110-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts` around
lines 76 - 80, The webhook POSTs using fetch in dispatcher.ts (the fetch call
that creates `response` with body JSON.stringify({ blocks }) and the similar
call around lines 110-120) need explicit timeouts: wrap each fetch with an
AbortController, pass controller.signal to fetch, start a setTimeout that calls
controller.abort() after a configurable timeout (e.g., WEBHOOK_TIMEOUT_MS), and
clear the timeout when fetch resolves or rejects; also handle the
AbortError/aborted case and surface a clear error so batch progression can fail
fast instead of hanging.
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)

48-49: ⚠️ Potential issue | 🟠 Major

Resolve catalog path relative to module, not process CWD.

Using process.cwd() here is brittle and can miss apps/monitor-app/clickhouse/catalog.yml depending on the startup directory; the fallback then hides the failure.

🛠️ Proposed fix
 import fs from "node:fs/promises";
 import path from "node:path";
+import { fileURLToPath } from "node:url";
@@
   async getSchemaCatalog(): Promise<string> {
     try {
-      const catalogPath = path.join(process.cwd(), "clickhouse", "catalog.yml");
+      const catalogPath = path.resolve(
+        path.dirname(fileURLToPath(import.meta.url)),
+        "../../../../../clickhouse/catalog.yml",
+      );
       return await fs.readFile(catalogPath, "utf8");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts` around lines 48
- 49, The code builds catalogPath using process.cwd(), which is brittle; change
the resolution to be relative to the module instead (use the module
directory—e.g., __dirname or path.dirname(fileURLToPath(import.meta.url))—to
compose the clickhouse/catalog.yml path) when constructing catalogPath, and then
call fs.readFile on that path (catalogPath and fs.readFile are the symbols to
edit); also ensure you do not swallow errors from fs.readFile so failures bubble
up (remove any fallback that hides file-not-found) so missing file errors are
visible.
🧹 Nitpick comments (2)
apps/monitor-app/src/test/global-setup.ts (1)

56-65: Consider making teardown resilient to stop failures.

If container.stop() throws during teardown (e.g., container already exited), the teardown fails and may leave stale state. This is less critical than the setup case but could still cause flaky CI runs.

♻️ Optional hardening
 export async function teardown() {  
   const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
   if (fs.existsSync(configPath)) {
     fs.unlinkSync(configPath);
   }

   if (container) {
-    await container.stop();
+    try {
+      await container.stop();
+    } catch {
+      // container may have already exited
+    }
     container = undefined;
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/global-setup.ts` around lines 56 - 65, The teardown
function should handle failures from container.stop() so a thrown error doesn't
abort teardown; wrap the await container.stop() call in a try/catch, log or
ignore the error, and ensure container is set to undefined regardless (i.e., in
the catch or finally) so stale state is not left behind — update the teardown
function around the container.stop() call (referencing teardown, container, and
container.stop()) to be resilient to stop failures.
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)

174-178: Narrow the returned type instead of force-casting to StartedTestContainer.

The mock object at lines 217-232 implements only a subset of StartedTestContainer methods (stop, getHost, getMappedPort, exec). While tests currently use only these methods, the force-cast masks the incomplete implementation. Create a local interface like ClickHouseContainerHandle with just these methods to improve type safety and prevent future misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts` around lines 174 - 178,
The exported function setupClickHouseContainer currently returns Promise<{
container: StartedTestContainer; host: string; port: number; }>, but the mock
only implements a subset of StartedTestContainer; define a local interface
ClickHouseContainerHandle describing the used methods (stop, getHost,
getMappedPort, exec) and change the container return type to
ClickHouseContainerHandle so the mock no longer needs a force-cast; update the
function signature and any related mock construction to use
ClickHouseContainerHandle instead of StartedTestContainer.
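
The narrowed handle the nitpick asks for could look like this; the method signatures are guesses at the subset the tests use, not testcontainers' full API:

```typescript
// Only the methods the ClickHouse test utilities actually call.
interface ClickHouseContainerHandle {
  stop(): Promise<void>;
  getHost(): string;
  getMappedPort(port: number): number;
  exec(command: string[]): Promise<{ output: string; exitCode: number }>;
}

// A mock now satisfies the interface without a force-cast.
const mockContainer: ClickHouseContainerHandle = {
  stop: async () => {},
  getHost: () => "127.0.0.1",
  getMappedPort: (port) => port,
  exec: async () => ({ output: "", exitCode: 0 }),
};
```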
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/monitor-app/scripts/seed-demo-data.mjs`:
- Around line 617-620: The computed intervalMs can become 0 when minutesPastHour
is 0; update the intervalMs calculation (the variable intervalMs that uses
minutesPastHour) to ensure a non‑zero minimum by wrapping the Math.floor(...)
result with Math.max(1, ... ) (or similar min value) so timestamps still spread;
keep the existing conditional for minutesPastHour > 30 returning 60_000 and only
apply the non‑zero guard to the Math.floor branch.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts`:
- Around line 68-78: The trend query in service.ts uses {projectId:String} for a
UUID column and omits device_type, so change the SQL parameters to use
{projectId:UUID} and add an AND device_type = {deviceType:String} predicate to
the WHERE clause, then include deviceType: anomaly.device_type in the parameter
object (keep projectId: anomaly.project_id but with UUID typing). Locate the
query building code that references anomaly (the block with
projectId/metricName/route) and update the param type and param list accordingly
so the trend is filtered by the anomaly's device_type and project_id is passed
as UUID.

In `@apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts`:
- Around line 96-101: The Adaptive Card element types are incorrectly cased;
update the elements created in the notification payload so "textBlock" becomes
"TextBlock" and "factSet" becomes "FactSet" where the card is assembled (the
object using payload.title, payload.text, and payload.fields?.map(...)); keep
property values like weight: "bolder" and size: "medium" as-is, and ensure the
mapped facts remain { title: f.title, value: f.value } to produce a valid
Adaptive Card.

In `@apps/monitor-app/src/app/server/domain/notifications/service.ts`:
- Around line 96-107: The code unconditionally marks anomalies as 'notified'
after calling dispatcher.send(notification), but dispatcher.send returns
per-channel delivery results that may indicate all channels failed; update the
flow in the function using dispatcher.send and insertProcessedAnomaly so you
inspect the returned per-channel results, determine if any channel succeeded,
and set the inserted record's status to 'notified' only when at least one
delivery succeeded, otherwise set status to 'failed' (or 'pending_retry' per
project policy); also include failure details from the dispatcher result in the
logger.info/logger.error and in the inserted metadata so failed deliveries can
be retried or diagnosed.
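
Sketched as a pure function (ChannelResult and the status union are assumed shapes, since dispatcher.send's actual return type isn't shown here):

```typescript
type ChannelResult = { channel: string; ok: boolean; error?: string };

// 'notified' only when at least one channel delivered; otherwise 'failed'
// so the record can be retried or diagnosed later.
function deriveStatus(results: ChannelResult[]): "notified" | "failed" {
  return results.some((r) => r.ok) ? "notified" : "failed";
}
```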

---

Duplicate comments:
In `@apps/anomaly-worker/.env.example`:
- Around line 7-11: The .env.example file contains empty bootstrap variables
which cause fresh setups to fail; replace the empty values for
AI_ANALYST_CLICKHOUSE_PASSWORD, BETTER_AUTH_SECRET, INITIAL_USER_EMAIL,
INITIAL_USER_PASSWORD, and INITIAL_USER_NAME with validation-safe placeholder
strings (e.g., non-empty, obviously temporary values such as "changeme" for
secrets and "user@example.com" for email, and a placeholder password like
"ChangeMe123!" and name like "Example User") so that copying .env.example
produces valid, non-empty defaults that pass basic validation checks without
exposing real credentials.

In `@apps/monitor-app/src/app/server/domain/ai-bridge/service.ts`:
- Around line 48-49: The code builds catalogPath using process.cwd(), which is
brittle; change the resolution to be relative to the module instead (use the
module directory—e.g., __dirname or path.dirname(fileURLToPath(import.meta.url))—to
compose the clickhouse/catalog.yml path) when constructing
catalogPath, and then call fs.readFile on that path (catalogPath and fs.readFile
are the symbols to edit); also ensure you do not swallow errors from fs.readFile
so failures bubble up (remove any fallback that hides file-not-found) so missing
file errors are visible.

In `@apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts`:
- Around line 76-80: The webhook POSTs using fetch in dispatcher.ts (the fetch
call that creates `response` with body JSON.stringify({ blocks }) and the
similar call around lines 110-120) need explicit timeouts: wrap each fetch with
an AbortController, pass controller.signal to fetch, start a setTimeout that
calls controller.abort() after a configurable timeout (e.g.,
WEBHOOK_TIMEOUT_MS), and clear the timeout when fetch resolves or rejects; also
handle the AbortError/aborted case and surface a clear error so batch
progression can fail fast instead of hanging.

In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 209-215: The migration invocation is passing CH_MIGRATIONS_DB but
the migration runner expects CH_MIGRATIONS_DATABASE; update both calls to
runClickHouseMigrations so the env object uses CH_MIGRATIONS_DATABASE: replace
the CH_MIGRATIONS_DB key with CH_MIGRATIONS_DATABASE while keeping the same
values (host, port, uniqueDbName, cfg.username, cfg.password) so
runClickHouseMigrations receives the correct DB name; do the same replacement
for the second invocation that currently mirrors the first.

---

Nitpick comments:
In `@apps/monitor-app/src/test/clickhouse-test-utils.ts`:
- Around line 174-178: The exported function setupClickHouseContainer currently
returns Promise<{ container: StartedTestContainer; host: string; port: number;
}>, but the mock only implements a subset of StartedTestContainer; define a
local interface ClickHouseContainerHandle describing the used methods (stop,
getHost, getMappedPort, exec) and change the container return type to
ClickHouseContainerHandle so the mock no longer needs a force-cast; update the
function signature and any related mock construction to use
ClickHouseContainerHandle instead of StartedTestContainer.

In `@apps/monitor-app/src/test/global-setup.ts`:
- Around line 56-65: The teardown function should handle failures from
container.stop() so a thrown error doesn't abort teardown; wrap the await
container.stop() call in a try/catch, log or ignore the error, and ensure
container is set to undefined regardless (i.e., in the catch or finally) so
stale state is not left behind — update the teardown function around the
container.stop() call (referencing teardown, container, and container.stop()) to
be resilient to stop failures.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23683d0 and 2fd942a.

📒 Files selected for processing (12)
  • apps/anomaly-worker/.env.example
  • apps/anomaly-worker/.gitignore
  • apps/anomaly-worker/src/index.ts
  • apps/monitor-app/clickhouse/catalog.yml
  • apps/monitor-app/scripts/seed-demo-data.mjs
  • apps/monitor-app/src/app/server/domain/ai-bridge/service.ts
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts
  • apps/monitor-app/src/app/server/domain/notifications/service.ts
  • apps/monitor-app/src/instrumentation.ts
  • apps/monitor-app/src/test/clickhouse-test-utils.ts
  • apps/monitor-app/src/test/global-setup.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/instrumentation.ts
  • apps/anomaly-worker/src/index.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: check_pull_request
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: olekszczepanowski
Repo: Blazity/next-cwv-monitor PR: 118
File: apps/monitor-app/src/app/server/lib/agent/tools/execute-clickhouse.ts:23-33
Timestamp: 2026-02-27T10:21:06.393Z
Learning: In the next-cwv-monitor project, the agent ClickHouse client has database-level permissions restricted to read-only operations, providing security at the database layer rather than requiring application-level SQL validation for mutation prevention.
📚 Learning: 2026-02-27T10:21:06.393Z
Learnt from: olekszczepanowski
Repo: Blazity/next-cwv-monitor PR: 118
File: apps/monitor-app/src/app/server/lib/agent/tools/execute-clickhouse.ts:23-33
Timestamp: 2026-02-27T10:21:06.393Z
Learning: In the next-cwv-monitor project, the agent ClickHouse client has database-level permissions restricted to read-only operations, providing security at the database layer rather than requiring application-level SQL validation for mutation prevention.

Applied to files:

  • apps/monitor-app/clickhouse/catalog.yml
🧬 Code graph analysis (2)
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • port (12-12)
apps/monitor-app/src/test/global-setup.ts (2)
apps/monitor-app/src/test/clickhouse-test-utils.ts (4)
  • CLICKHOUSE_IMAGE (9-9)
  • HTTP_PORT (10-10)
  • waitForClickHouse (32-54)
  • runClickHouseMigrations (67-89)
apps/monitor-app/scripts/run-clickhouse-migrate.mjs (1)
  • port (12-12)

🔇 Additional comments (8)
apps/anomaly-worker/.gitignore (1)

1-1: Good secret hygiene for local env files.

Ignoring .env here is the right default for preventing accidental secret commits.

apps/monitor-app/src/test/global-setup.ts (3)

1-6: LGTM!

Imports are clean and module-level state is appropriately scoped.


8-42: LGTM!

Container lifecycle is correctly structured. The try block properly wraps all operations that could fail after container start, enabling cleanup in the catch block.


43-53: Preserve the original setup error if cleanup fails.

If container.stop() throws (e.g., container already dead, network error), that exception replaces the original error, making root-cause triage harder.

🔧 Suggested hardening
   } catch (error) {
-    await container.stop();
-    container = undefined;
-    
     const configPath = path.resolve(process.cwd(), ".vitest-ch-config.json");
-    if (fs.existsSync(configPath)) {
-      fs.unlinkSync(configPath);
-    }
-
-    throw error;
+    try {
+      await container.stop();
+    } catch {
+      // best-effort cleanup; preserve original setup error
+    }
+    container = undefined;
+    
+    if (fs.existsSync(configPath)) {
+      fs.unlinkSync(configPath);
+    }
+
+    throw error;
   }
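The pattern above generalizes beyond container teardown. A minimal sketch (hypothetical helper, not part of this repository) of best-effort cleanup that never masks the original setup error:

```typescript
// Generic sketch: run a setup step, and on failure perform cleanup
// without letting a cleanup exception replace the root cause.
async function withCleanup<T>(
  run: () => Promise<T>,
  cleanup: () => Promise<void>,
): Promise<T> {
  try {
    return await run();
  } catch (error) {
    try {
      await cleanup();
    } catch {
      // best-effort: a failing cleanup must not shadow the setup error
    }
    throw error;
  }
}
```

With this shape, a `container.stop()` that throws during error handling is swallowed, and the caller still sees the original failure for triage.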
apps/monitor-app/src/test/clickhouse-test-utils.ts (1)

143-168: Good fail-fast behavior in optimizeAnomalies.

The explicit timeout/error before OPTIMIZE prevents silent green tests when cwv_stats_hourly never receives rows.

apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)

14-44: Per-channel delivery reporting is implemented well.

send() now returns a concrete result object and preserves per-channel outcomes cleanly.

apps/monitor-app/clickhouse/catalog.yml (1)

65-77: processed_anomalies catalog coverage looks aligned now.

The documented columns now reflect the anomaly processing model and should reduce AI/query drift.

apps/monitor-app/src/app/server/domain/notifications/service.ts (1)

31-45: Batch-limited fan-out is a good resilience improvement.

The fixed concurrency window plus inter-batch delay should reduce pressure on AI and webhook dependencies.

Comment on lines +617 to +620
const intervalMs = minutesPastHour > 30
? 60_000
: Math.floor((minutesPastHour * 60_000) / 35);


⚠️ Potential issue | 🟡 Minor

Guard against zero intervalMs near top of hour.

When minutesPastHour is 0, Line 619 yields 0, collapsing intended timestamp spread for current-hour anomaly points.

🛠️ Proposed fix
   const intervalMs = minutesPastHour > 30 
     ? 60_000
-    : Math.floor((minutesPastHour * 60_000) / 35);
+    : Math.max(1_000, Math.floor((minutesPastHour * 60_000) / 35));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/scripts/seed-demo-data.mjs` around lines 617 - 620, The
computed intervalMs can become 0 when minutesPastHour is 0; update the
intervalMs calculation (the variable intervalMs that uses minutesPastHour) to
ensure a non‑zero minimum by wrapping the Math.floor(...) result with
Math.max(1, ... ) (or similar min value) so timestamps still spread; keep the
existing conditional for minutesPastHour > 30 returning 60_000 and only apply
the non‑zero guard to the Math.floor branch.
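To make the guard concrete, here is a small standalone sketch (the function name `computeIntervalMs` is hypothetical; the real logic lives inline in `seed-demo-data.mjs`) with the suggested non-zero floor applied:

```typescript
// Standalone version of the seed script's interval computation,
// with a Math.max floor so the spread never collapses to 0 ms
// at the top of the hour.
function computeIntervalMs(minutesPastHour: number): number {
  return minutesPastHour > 30
    ? 60_000
    : Math.max(1_000, Math.floor((minutesPastHour * 60_000) / 35));
}
```

With the floor in place, `computeIntervalMs(0)` yields 1 000 ms instead of 0, so current-hour anomaly points still receive distinct timestamps.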

Comment on lines +96 to +107
await dispatcher.send(notification);
logger.info({ anomaly_id }, "Notification sent.");

await insertProcessedAnomaly({
anomaly_id,
project_id,
metric_name,
route,
device_type,
last_z_score: z_score,
status: 'notified',
});

⚠️ Potential issue | 🔴 Critical

Do not mark anomaly as notified when delivery failed on all channels.

dispatcher.send() returns per-channel results, but the current flow unconditionally writes status: 'notified', which can permanently drop failed notifications from retries.

🛠️ Proposed fix
-      await dispatcher.send(notification);
-      logger.info({ anomaly_id }, "Notification sent.");
+      const delivery = await dispatcher.send(notification);
+      const delivered = Object.values(delivery).some(channel => channel?.success);
+      if (!delivered) {
+        logger.warn(
+          { anomaly_id, delivery },
+          "Notification failed on all channels. Leaving anomaly unprocessed for retry.",
+        );
+        return;
+      }
+      logger.info({ anomaly_id, delivery }, "Notification sent.");
 
       await insertProcessedAnomaly({
         anomaly_id,
         project_id,
         metric_name,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      await dispatcher.send(notification);
-      logger.info({ anomaly_id }, "Notification sent.");
+      const delivery = await dispatcher.send(notification);
+      const delivered = Object.values(delivery).some(channel => channel?.success);
+      if (!delivered) {
+        logger.warn(
+          { anomaly_id, delivery },
+          "Notification failed on all channels. Leaving anomaly unprocessed for retry.",
+        );
+        return;
+      }
+      logger.info({ anomaly_id, delivery }, "Notification sent.");

       await insertProcessedAnomaly({
         anomaly_id,
         project_id,
         metric_name,
         route,
         device_type,
         last_z_score: z_score,
         status: 'notified',
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/notifications/service.ts` around lines
96 - 107, The code unconditionally marks anomalies as 'notified' after calling
dispatcher.send(notification), but dispatcher.send returns per-channel delivery
results that may indicate all channels failed; update the flow in the function
using dispatcher.send and insertProcessedAnomaly so you inspect the returned
per-channel results, determine if any channel succeeded, and set the inserted
record's status to 'notified' only when at least one delivery succeeded,
otherwise set status to 'failed' (or 'pending_retry' per project policy); also
include failure details from the dispatcher result in the
logger.info/logger.error and in the inserted metadata so failed deliveries can
be retried or diagnosed.
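A minimal sketch of the gating predicate described above (the names and the `ChannelResult` shape are assumptions for illustration, not the repository's actual dispatcher types):

```typescript
// Assumed per-channel delivery result shape.
type ChannelResult = { success: boolean; error?: string };

// True when at least one channel delivered; an anomaly should only be
// marked 'notified' when this holds, otherwise it stays retryable.
function anyDelivered(
  delivery: Record<string, ChannelResult | undefined>,
): boolean {
  return Object.values(delivery).some((channel) => channel?.success === true);
}
```

A caller would branch on `anyDelivered(await dispatcher.send(notification))` before writing the `'notified'` status.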

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/monitor-app/src/app/server/domain/notifications/service.ts`:
- Around line 20-40: Fetch step currently returns candidates via
getUnprocessedAnomalies but marking happens later in processAndNotify which
allows TOCTOU duplicates across workers; modify the flow to perform an atomic
"claim" before sending: implement a claim function (e.g., claimAnomalies or
extend getUnprocessedAnomalies to atomically insert/update a claim record keyed
by anomaly_id + project_id using a DB-level upsert/transaction and return only
successfully claimed rows) and then change the loop to map/process only those
claimed anomalies with processAndNotify; ensure the claim is idempotent (insert
on conflict do nothing or update where claimed_at is null) and that failures to
claim are skipped, and workflows still handle Promise.allSettled results as
before.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fd942a and 660433e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (6)
  • apps/monitor-app/package.json
  • apps/monitor-app/src/app/server/domain/ai-bridge/__tests__/ai-bridge-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/ai-bridge/service.ts
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts
  • apps/monitor-app/src/app/server/domain/notifications/service.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • apps/monitor-app/src/app/server/domain/ai-bridge/__tests__/ai-bridge-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/package.json
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: check_pull_request
🧰 Additional context used
🧬 Code graph analysis (2)
apps/monitor-app/src/app/server/domain/notifications/service.ts (8)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getUnprocessedAnomalies (31-61)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • AnomalyRow (85-97)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (1)
  • aiBridge (94-94)
apps/monitor-app/src/app/server/domain/notifications/types.ts (1)
  • NotificationPayload (8-19)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (138-141)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)
  • insertProcessedAnomaly (5-33)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (6)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (2)
  • AnomalyRow (85-97)
  • ProjectRow (4-11)
apps/monitor-app/src/env.ts (1)
  • env (41-93)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (2)
  • aiSql (120-120)
  • getDirectAiClient (123-123)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getAnomalyById (63-86)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
🔇 Additional comments (4)
apps/monitor-app/src/app/server/domain/ai-bridge/service.ts (3)

23-46: executeSql path split and error handling look solid.

Good separation between tagged-template execution and direct client execution, with consistent failure logging that avoids leaking raw SQL payloads.


49-52: Catalog lookup is now robust against varying working directories.

Using a module-relative path here is a strong improvement over cwd-dependent resolution.


65-83: Trend query scoping is correctly tightened.

project_id is typed as UUID and device_type is now included, which prevents cross-device trend mixing for anomaly context.

apps/monitor-app/src/app/server/domain/notifications/service.ts (1)

96-120: Delivery-result gating is implemented correctly.

Great fix: anomalies are marked as processed only when at least one channel succeeds, and all-channel failure now stays retryable.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/monitor-app/src/app/server/domain/notifications/service.ts`:
- Around line 55-64: The inserted claim from tryClaimAnomaly can become orphaned
when all notification channels fail, so add logic to remove that claim on total
delivery failure: implement a repository function like
deleteProcessedAnomaly(anomaly_id, project_id) and call it in the branch after
channel delivery fails (the failure handling around where tryClaimAnomaly is
used and subsequent channel-sending code), ensuring the record in
processed_anomalies is deleted so getUnprocessedAnomalies (which uses NOT EXISTS
on processed_anomalies) can pick the anomaly up again; alternatively, if you
prefer stateful retries, implement updateProcessedAnomaly to set status =
'failed' and adjust getUnprocessedAnomalies to include stale 'failed' entries
older than a threshold.

In
`@apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts`:
- Around line 35-52: The tryClaimAnomaly function uses a non-atomic
SELECT-count-then-insert (SELECT ... FROM processed_anomalies FINAL then
insertProcessedAnomaly) which allows race conditions and duplicate active rows;
replace this with a single atomic INSERT ... SELECT ... WHERE NOT EXISTS pattern
executed via aiSql against the processed_anomalies table so the INSERT only
occurs when no row with the same anomaly_id and project_id exists (preserve
inserting status: 'new' and any required fields from
InsertableProcessedAnomalyRow), or explicitly document and accept the transient
duplicates given ReplacingMergeTree(updated_at) dedup behavior; update the
tryClaimAnomaly implementation to call the new single-statement insert logic
instead of insertProcessedAnomaly to eliminate the race.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 660433e and 57ec36b.

📒 Files selected for processing (4)
  • apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/src/app/server/domain/notifications/service.ts
  • apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/monitor-app/src/app/server/domain/notifications/__tests__/notifications-service.integration.test.ts
  • apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: check_pull_request
🧰 Additional context used
🧬 Code graph analysis (2)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (3)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • InsertableProcessedAnomalyRow (99-109)
apps/monitor-app/src/app/server/lib/clickhouse/client.ts (1)
  • aiSql (120-120)
apps/monitor-app/src/app/server/lib/logger.ts (1)
  • logger (6-9)
apps/monitor-app/src/app/server/domain/notifications/service.ts (6)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/anomalies-repository.ts (1)
  • getUnprocessedAnomalies (31-61)
apps/monitor-app/src/app/server/lib/clickhouse/schema.ts (1)
  • AnomalyRow (85-97)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (2)
  • tryClaimAnomaly (35-52)
  • insertProcessedAnomaly (5-33)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/projects-repository.ts (1)
  • getProjectById (32-44)
apps/monitor-app/src/app/server/domain/notifications/types.ts (1)
  • NotificationPayload (8-19)
apps/monitor-app/src/app/server/domain/notifications/dispatcher.ts (1)
  • dispatcher (138-141)
🔇 Additional comments (6)
apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts (1)

5-33: LGTM!

The insertProcessedAnomaly function uses parameterized SQL correctly, has proper error handling with logging, and re-throws errors for upstream handling.

apps/monitor-app/src/app/server/domain/notifications/service.ts (5)

1-13: LGTM!

Clean imports and well-defined constants for concurrency control. The sleep helper is appropriately simple.


15-49: Batch processing correctly bounds concurrency.

The implementation properly addresses the previous concern about unbounded fan-out. Using Promise.allSettled ensures individual failures don't abort the batch, and the delay between batches protects downstream dependencies.
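The bounded fan-out described here can be sketched as a generic helper (hypothetical; the real implementation in `service.ts` uses its own constants for batch size and delay):

```typescript
// Process items in fixed-size batches with a pause between batches.
async function processInBatches<T>(
  items: T[],
  batchSize: number,
  delayMs: number,
  worker: (item: T) => Promise<void>,
): Promise<PromiseSettledResult<void>[]> {
  const settled: PromiseSettledResult<void>[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled: one failing item never aborts the rest of the batch
    settled.push(...(await Promise.allSettled(batch.map(worker))));
    if (i + batchSize < items.length && delayMs > 0) {
      // breathing room for AI and webhook dependencies between batches
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return settled;
}
```

Each settled result can then be inspected individually, so one channel or anomaly failing does not suppress the others.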


75-105: LGTM!

The notification payload is well-structured with relevant fields, proper numeric formatting, and useful action buttons. The metadata correctly captures identifiers for downstream correlation.


107-125: Delivery success check addresses previous concern.

The implementation correctly inspects delivery results and only marks as 'notified' when at least one channel succeeds, addressing the previous critical issue about unconditionally writing status.

Note: This creates two inserts per successful notification (first 'new' from claim, then 'notified' here). This relies on ClickHouse's merge behavior to resolve the final state, which should work correctly with FINAL queries.


137-141: LGTM!

Clean exports providing both the class instance and a convenience wrapper function.

Comment on lines +55 to +64
const success = await tryClaimAnomaly({
anomaly_id,
project_id,
metric_name: anomaly.metric_name,
route: anomaly.route,
device_type: anomaly.device_type,
last_z_score: z_score
});

if (!success) return;

⚠️ Potential issue | 🟠 Major

Failed notifications become orphaned and won't be retried.

When tryClaimAnomaly succeeds (inserting with status 'new'), but all notification channels subsequently fail (lines 126-131), the anomaly remains in processed_anomalies with status 'new'. Since getUnprocessedAnomalies filters by NOT EXISTS on processed_anomalies, this anomaly will never be picked up again for retry.

Consider either:

  1. Deleting or updating the claim on total delivery failure to allow retry
  2. Adding a separate retry mechanism that queries for status = 'new' records older than a threshold
♻️ Option 1: Remove failed claim to allow retry
       } else {
         logger.error(
           { anomaly_id, deliveryResult }, 
           "All notification channels failed."
         );
+        // Remove claim to allow retry on next run
+        await deleteProcessedAnomaly(anomaly_id, project_id);
       }

This requires adding a deleteProcessedAnomaly function to the repository.

Also applies to: 126-131

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/monitor-app/src/app/server/domain/notifications/service.ts` around lines
55 - 64, The inserted claim from tryClaimAnomaly can become orphaned when all
notification channels fail, so add logic to remove that claim on total delivery
failure: implement a repository function like deleteProcessedAnomaly(anomaly_id,
project_id) and call it in the branch after channel delivery fails (the failure
handling around where tryClaimAnomaly is used and subsequent channel-sending
code), ensuring the record in processed_anomalies is deleted so
getUnprocessedAnomalies (which uses NOT EXISTS on processed_anomalies) can pick
the anomaly up again; alternatively, if you prefer stateful retries, implement
updateProcessedAnomaly to set status = 'failed' and adjust
getUnprocessedAnomalies to include stale 'failed' entries older than a
threshold.

Comment on lines +35 to +52
export async function tryClaimAnomaly(anomaly: InsertableProcessedAnomalyRow): Promise<boolean> {
try {
const existing = await aiSql<{ count: number }>`
SELECT count() as count
FROM processed_anomalies FINAL
WHERE anomaly_id = ${anomaly.anomaly_id}
AND project_id = ${anomaly.project_id}
`.then(res => res[0]?.count ?? 0);

if (existing > 0) return false;

await insertProcessedAnomaly({ ...anomaly, status: 'new' });
return true;
} catch (error) {
logger.error({ err: error, id: anomaly.anomaly_id }, "Claim error");
return false;
}
}
\ No newline at end of file

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find table definition for processed_anomalies
find . -type f \( -name "*.sql" -o -name "*.ts" -o -name "*.js" \) | xargs rg -l "processed_anomalies" | head -20

Repository: Blazity/next-cwv-monitor

Length of output: 495


🏁 Script executed:

# Look for CREATE TABLE statement for processed_anomalies
rg "CREATE TABLE.*processed_anomalies|create table.*processed_anomalies" -i -A 30

Repository: Blazity/next-cwv-monitor

Length of output: 3202


🏁 Script executed:

# Check the insertProcessedAnomaly implementation
rg -A 15 "function insertProcessedAnomaly|const insertProcessedAnomaly"

Repository: Blazity/next-cwv-monitor

Length of output: 2009


🏁 Script executed:

# Verify that there's no unique constraint on (anomaly_id, project_id)
cat apps/monitor-app/clickhouse/migrations/007_anomaly_detection.sql | rg -i "unique|constraint"

Repository: Blazity/next-cwv-monitor

Length of output: 50


🏁 Script executed:

# Check if INSERT...SELECT pattern is used elsewhere or if INSERT...WHERE NOT EXISTS is feasible
rg "INSERT.*SELECT|INSERT.*WHERE" apps/monitor-app/src/app/server/lib/clickhouse/repositories/ -A 3

Repository: Blazity/next-cwv-monitor

Length of output: 50


Race condition in tryClaimAnomaly allows duplicate inserts across concurrent workers.

The check-then-insert pattern (SELECT count, then INSERT if 0) is not atomic. Two workers can both observe count=0 and both proceed to insert the same (anomaly_id, project_id) pair. Although ReplacingMergeTree(updated_at) will eventually deduplicate based on the latest updated_at during background merges, duplicates exist temporarily in the active dataset, causing duplicate processing notifications.

The FINAL modifier in the SELECT forces pending merges but does not prevent concurrent inserts. Consider using INSERT ... SELECT ... WHERE NOT EXISTS in a single statement, or explicitly accept that brief duplicate processing is tolerable given the eventual deduplication behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/monitor-app/src/app/server/lib/clickhouse/repositories/processed-anomalies-repository.ts`
around lines 35 - 52, The tryClaimAnomaly function uses a non-atomic
SELECT-count-then-insert (SELECT ... FROM processed_anomalies FINAL then
insertProcessedAnomaly) which allows race conditions and duplicate active rows;
replace this with a single atomic INSERT ... SELECT ... WHERE NOT EXISTS pattern
executed via aiSql against the processed_anomalies table so the INSERT only
occurs when no row with the same anomaly_id and project_id exists (preserve
inserting status: 'new' and any required fields from
InsertableProcessedAnomalyRow), or explicitly document and accept the transient
duplicates given ReplacingMergeTree(updated_at) dedup behavior; update the
tryClaimAnomaly implementation to call the new single-statement insert logic
instead of insertProcessedAnomaly to eliminate the race.
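The interleaving that produces the duplicate claim can be demonstrated with a toy in-memory model (purely illustrative; the Set stands in for ClickHouse, and the atomic variant is analogous to the single-statement INSERT ... SELECT ... WHERE NOT EXISTS suggested above):

```typescript
// Toy in-memory stand-in for the processed_anomalies table.
class ClaimStore {
  private rows = new Set<string>();

  // Mirrors the check-then-insert shape of tryClaimAnomaly: the await
  // between the read and the write is where a second worker interleaves.
  async nonAtomicClaim(key: string): Promise<boolean> {
    const exists = this.rows.has(key); // SELECT count() ... FINAL
    await Promise.resolve();           // round-trip; another worker runs here
    if (exists) return false;
    this.rows.add(key);                // INSERT
    return true;
  }

  // Single-step check-and-insert, analogous to one server-side statement.
  atomicClaim(key: string): boolean {
    if (this.rows.has(key)) return false;
    this.rows.add(key);
    return true;
  }
}
```

Running two `nonAtomicClaim` calls concurrently lets both observe an empty table and both report a successful claim, which is exactly the duplicate-notification scenario; the atomic variant admits only one winner.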

