
fix: clean up ClickHouse analytics rows on user/team/website/link/pixel deletion #4246

Open
anvme wants to merge 6 commits into umami-software:dev from anvme:fix/clickhouse-cleanup

Conversation


@anvme anvme commented May 7, 2026

Third in the deletion-cleanup series after #4243 (link/pixel/board orphans on user/team delete) and #4245 (cascade websites + soft-delete leaks). Stacks on top of #4245 — until those merge, the GitHub diff includes their commits too. For incremental review now, focus on commit 871aa3b.

Problem

No delete path in the codebase touched ClickHouse before this. With CLICKHOUSE_URL set (the typical high-volume setup), every delete left analytics events orphaned forever:

  • Deleting a website left its events in website_event, event_data, session_data, session_replay, website_revenue, website_event_stats_hourly, event_data_pivot.
  • Deleting a link/pixel was even subtler: /api/send writes link.id and pixel.id into the website_id column for /q/<slug> and /p/<slug> events (src/app/api/send/route.ts:98), so link/pixel events orphan under their entity id, not a website id.

Solution

  • src/lib/clickhouse.ts: New deleteByWebsiteIds(ids) helper. Issues ALTER TABLE ... DELETE WHERE website_id IN (...) against all 7 CH tables in parallel via client.command(). ClickHouse Materialized Views are insert-time triggers and do not cascade mutations, so the 3 MV target tables (website_revenue, website_event_stats_hourly, event_data_pivot) need explicit DELETEs.
  • deleteWebsite / deleteUser / deleteTeam / deleteLink / deletePixel: Call the helper after the Prisma transaction commits, non-cloud only. Cloud mode preserves CH analytics for billing/audit, matching the existing soft-delete retention philosophy.

For deleteUser / deleteTeam, the cleanup ID set is websites + links + pixels (boards generate no analytics events).
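The helper described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the table list and query shape come from the PR, while the CommandClient interface and function body are assumptions.

```typescript
// The 7 ClickHouse tables named in the PR, including the 3 MV targets.
const CH_TABLES = [
  'website_event',
  'event_data',
  'session_data',
  'session_replay',
  'website_revenue',
  'website_event_stats_hourly',
  'event_data_pivot',
];

// Build the parameterised ALTER TABLE ... DELETE statement for one table.
// The {websiteIds:Array(UUID)} placeholder is ClickHouse's typed
// query-parameter syntax, as quoted in the review summary.
function buildDeleteQuery(table: string): string {
  return `ALTER TABLE ${table} DELETE WHERE website_id IN {websiteIds:Array(UUID)}`;
}

// Minimal stand-in for the ClickHouse client's command() surface.
interface CommandClient {
  command(opts: { query: string; query_params: { websiteIds: string[] } }): Promise<void>;
}

// Issue the DELETE mutation against all tables in parallel; no-op on
// an empty ID set (mirroring the empty-array guard noted in the review).
async function deleteByWebsiteIds(client: CommandClient, websiteIds: string[]): Promise<void> {
  if (websiteIds.length === 0) return;
  await Promise.all(
    CH_TABLES.map(table =>
      client.command({ query: buildDeleteQuery(table), query_params: { websiteIds } }),
    ),
  );
}
```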

Why ALTER TABLE DELETE instead of lightweight DELETE FROM?

Lightweight DELETE FROM is generally recommended by the ClickHouse docs, but it throws by default on tables with projections (website_event has 2 projections, session_data has 1). ALTER TABLE DELETE works uniformly across all 7 tables without per-query setting tweaks. Trade-off: a heavier mutation, async by default — acceptable for the rare delete-on-erasure use case.

Failure handling

await and propagate (matches the existing Redis invalidation pattern). If CH is briefly unreachable, the function rejects after PG has already committed — same dual-write race as Redis. Outbox/retry pattern is flagged as a separate PR for closing that race across both Redis and CH consistently.
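The ordering can be sketched with stand-in function parameters (these names are illustrative, not the PR's actual helpers):

```typescript
// Stand-in shape for the delete flow: PG transaction first, then awaited
// cleanup whose errors propagate to the caller.
async function deleteWithCleanup(
  runTransaction: () => Promise<void>,
  invalidateRedis: () => Promise<void>,
  deleteAnalytics: () => Promise<void>,
): Promise<void> {
  // If the PG transaction throws, no cleanup runs at all.
  await runTransaction();
  // Both cleanups are awaited, so a briefly unreachable Redis or
  // ClickHouse makes the whole call reject even though PG has already
  // committed (the dual-write race described above).
  await invalidateRedis();
  await deleteAnalytics();
}
```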

Test plan

  • pnpm build-app clean
  • E2E Test 1 (deleteWebsite): created website, fired 3 events via /api/send, deleted website. CH state: website_event 3 -> 0, website_event_stats_hourly 3 -> 0, all 7 mutations reached is_done=1 in system.mutations. Both projection tables (website_event, session_data) processed cleanly.
  • E2E Test 2 (deleteLink with link.id used as website_id): created link, hit /q/<slug> to fire events, deleted link. website_event for link.id: 2 -> 0.
  • E2E Test 3 (deleteUser non-cloud): created user owning a website + a link, fired events for both, deleted user. Both website-events and link-events: cleared.

Out of scope (separate PRs / discussion)

  • Outbox/retry: durable cleanup record so partial failures (PG committed, CH unreachable) eventually retry and converge. Would also benefit Redis invalidation.
  • Backfill: one-shot script to clean EXISTING orphan CH rows from past deletes (pre-this-PR).
  • Cloud-mode CH erasure: env-flag option for tenants who require strict GDPR erasure even in cloud.
  • resetWebsite: also wipes PG analytics without touching CH today; same fix pattern would apply.

anvme added 5 commits May 7, 2026 03:37
deleteUser and deleteTeam left link/pixel/board rows (and their share rows)
in the database after the owner was removed. /q/<slug> and /p/<slug>
also kept serving deleted entries because the routes did not filter
deletedAt and Redis cached lookups for 24h.

- deleteUser: clean up link/pixel/board + shares for the deleted user.
  Cloud mode: soft-delete link/pixel, hard-delete board, only userId-owned.
  Non-cloud: hard-delete everything matching userId or owned teamIds.
- deleteTeam: same cleanup, scoped to teamId.
- /q and /p route handlers: filter deletedAt: null at the call sites
  (not in findLink/findPixel helpers, which would null-deref the
  permission checks at src/permissions/link.ts and pixel.ts).
- Post-transaction Redis invalidation mirrors deleteWebsite.
…eted slugs

Address Greptile review feedback on umami-software#4243.

- Cloud-mode link.updateMany / pixel.updateMany now filter where: { ..., deletedAt: null } so a previously soft-deleted row keeps its original deletion timestamp instead of being restamped with the current time.
- Pre-transaction findMany now selects deletedAt; the Redis invalidation list filters to only live slugs, avoiding harmless but wasted DEL calls for already-soft-deleted entries.

Note: the share.deleteMany cleanup still uses the broad entityId list (not filtered by deletedAt) so that orphan share rows of already-soft-deleted links/pixels are still cleaned up. Filtering the prefetch itself, as Greptile's exact suggestion proposed, would skip those shares while link.deleteMany still hard-deletes the rows, leaving orphan share rows behind. Verified empirically with a 3-scenario reproduction.
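The split described in that note (broad entityId list for share cleanup, live-slug filter for Redis) can be sketched as a pure function. The row shape and names here are illustrative, not the PR's actual code:

```typescript
// Illustrative row shape; the real prefetch is a Prisma findMany
// selecting id, slug, and deletedAt.
interface EntityRow {
  id: string;
  slug: string;
  deletedAt: Date | null;
}

// Split one prefetch result into the two target sets described above:
// - entityIds stays broad so share rows of already-soft-deleted entities
//   are still removed by share.deleteMany;
// - redisSlugs filters to live rows only, avoiding wasted DEL calls for
//   entries that were already soft-deleted.
function splitCleanupTargets(rows: EntityRow[]): { entityIds: string[]; redisSlugs: string[] } {
  return {
    entityIds: rows.map(row => row.id),
    redisSlugs: rows.filter(row => row.deletedAt === null).map(row => row.slug),
  };
}
```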
…st queries

Stacks on top of umami-software#4243. Five adjacent bugs from the same family:

- deleteTeam left team-owned websites (and all dependent rows) orphaned. Added
  inline cleanup mirroring deleteWebsite.
- non-cloud deleteUser, when hard-deleting the user's owned teams, also left
  team-owned websites orphaned. Extended the existing ownedFilter pattern
  (cloud-gated OR) to cover websites.
- getTeamLinks/getUserPixels/getTeamPixels did not filter deletedAt: null,
  leaking soft-deleted entries into list views.
- cloud deleteUser restamped already-soft-deleted websites' deleted_at;
  added deletedAt: null guard (same shape as link/pixel restamping fix).
- Surfaced pre-existing gaps in deleteUser non-cloud: missing
  sessionReplaySaved/sessionReplay/revenue/segment cleanups, entityIds
  excluded website ids so website-shares were orphaned.
Address Greptile review feedback on umami-software#4245.

- deleteLink and deletePixel now redis.client.del('link:slug' / 'pixel:slug')
  using the slug returned by Prisma's delete(). Previously the row was hard-
  deleted but the Redis cache (24h TTL) kept serving the slug, so /q/<slug>
  and /p/<slug> kept firing for up to a day after deletion.
- updateLink and updatePixel now invalidate the cache for the current slug,
  and additionally for the previous slug if the slug was changed. Previously
  changing a link's destination URL or slug left the public cache stale.
- Cloud-mode link.updateMany and pixel.updateMany in deleteUser now spread
  ownedFilter (which is { userId } in cloud mode) instead of hardcoding
  { userId }, so the cleanup intent stays consistent if ownedFilter ever
  evolves.

Verified empirically against a Docker Postgres + Redis: deleted link's
/q/<slug> returns 404 immediately (was: still redirected to old URL for 24h);
slug rename invalidates both old and new cache keys.
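The old-plus-new slug invalidation described in this commit can be sketched as follows. The 'link:<slug>' key format comes from the commit message; the client interface and function name are stand-ins:

```typescript
// Minimal stand-in for the Redis client's del() surface.
interface DelClient {
  del(key: string): Promise<number>;
}

// Drop the public cache for the current slug and, on rename, the
// previous slug too, so /q/<oldSlug> stops resolving immediately.
// Returns the deleted keys for inspection.
async function invalidateLinkSlugs(
  redis: DelClient,
  currentSlug: string,
  previousSlug?: string,
): Promise<string[]> {
  const keys = [`link:${currentSlug}`];
  if (previousSlug && previousSlug !== currentSlug) {
    keys.push(`link:${previousSlug}`);
  }
  await Promise.all(keys.map(key => redis.del(key)));
  return keys;
}
```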
…el deletion

No delete path touched ClickHouse before this, leaving analytics events
orphaned in CH after entities were removed in PostgreSQL. With CH enabled
(typical high-volume setup), every delete left potentially millions of
rows behind.

- src/lib/clickhouse.ts: new deleteByWebsiteIds helper that issues
  ALTER TABLE DELETE WHERE website_id IN (...) against all 7 CH tables.
  Materialized View target tables (website_revenue, website_event_stats_hourly,
  event_data_pivot) need explicit DELETE — ClickHouse MVs don't cascade.
- All 5 delete paths call the helper after the PG transaction, non-cloud
  only. Cloud preserves CH for billing/audit, matching the existing
  soft-delete retention philosophy.
- For user/team, cleanup IDs are websites + links + pixels. Boards have
  no events. /api/send writes link.id and pixel.id into the website_id
  column for /q/<slug> and /p/<slug> events.
- Failure handling matches the existing Redis invalidation pattern.

vercel Bot commented May 7, 2026

@anvme is attempting to deploy a commit to the Umami Software Team on Vercel.

A member of the Team first needs to authorize it.


greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR adds ClickHouse analytics cleanup to every deletion path (website, link, pixel, user, team) and closes a long-standing gap where hard-deleting any entity left orphan rows in all 7 CH tables forever. It also fixes soft-delete filters on list and collect routes and wires Redis slug-cache invalidation for link/pixel updates and deletes.

  • src/lib/clickhouse.ts: New deleteByWebsiteIds(ids) helper issues ALTER TABLE … DELETE WHERE website_id IN {websiteIds:Array(UUID)} against all 7 CH tables in parallel; parameterised with UUID typing, guarded by the enabled flag and empty-array check.
  • user.ts / team.ts: Pre-transaction entity collection replaced by a parallel findMany batch; non-cloud paths now call invalidateRedis() then deleteByWebsiteIds(clickhouseIds) after commit; cloud paths gain soft-delete coverage for links/pixels/boards.
  • link.ts / pixel.ts: Redis slug-cache invalidation added for updates (old + new slug) and deletes; CH cleanup added for non-cloud deletes; deletedAt: null filter added to list and collect routes.

Confidence Score: 4/5

Safe to merge with one targeted fix in deleteWebsite.

The deleteWebsite non-cloud else branch calls CH cleanup but omits redis.client.del for the website key. Because load.ts warms that Redis key with a 24h TTL on every analytics load regardless of cloud mode, a non-cloud + Redis deployment will serve the deleted website from cache for up to a day after deletion. The analogous handlers in user.ts and team.ts correctly call invalidateRedis() before the CH cleanup — this is an isolated gap in website.ts only.

src/queries/prisma/website.ts — the non-cloud .then() branch needs redis.client.del alongside the ClickHouse call.

Important Files Changed

  • src/lib/clickhouse.ts: Adds deleteByWebsiteIds helper issuing ALTER TABLE DELETE against all 7 CH tables in parallel; the parameterised query is safe, the table list is hardcoded, and the enabled/empty-array guard is correct.
  • src/queries/prisma/website.ts: Adds CH cleanup to the non-cloud deleteWebsite path, but the new else branch omits redis.client.del for the website key, leaving a stale Redis entry for up to 24h, unlike the parallel user.ts/team.ts handlers.
  • src/queries/prisma/user.ts: Rewrites deleteUser to collect all owned entities before the transaction and correctly calls both invalidateRedis() and deleteByWebsiteIds() after commit in non-cloud mode.
  • src/queries/prisma/team.ts: Rewrites deleteTeam with the same entity-collection + invalidateRedis + CH-cleanup pattern; both cloud and non-cloud paths are correct.
  • src/queries/prisma/link.ts: Adds Redis slug-cache invalidation on update, CH cleanup on delete (non-cloud), and a deletedAt: null filter to list queries.
  • src/queries/prisma/pixel.ts: Mirrors the link.ts changes for pixels: Redis slug invalidation on update, CH cleanup on delete, deletedAt: null filter on list queries.
  • src/app/(collect)/p/[slug]/route.ts: Adds a deletedAt: null guard to pixel-lookup paths so soft-deleted pixels no longer respond to tracking requests.
  • src/app/(collect)/q/[slug]/route.ts: Adds a deletedAt: null guard to link-lookup paths so soft-deleted links no longer respond to tracking requests.

Sequence Diagram

sequenceDiagram
    participant API
    participant PG as Postgres (Prisma tx)
    participant Redis
    participant CH as ClickHouse

    API->>PG: collect entity IDs (links, pixels, websites)
    API->>PG: transaction — hard-delete / soft-delete rows
    PG-->>API: commit
    API->>Redis: del link/pixel/website cache keys
    alt cloudMode
        note over Redis: website key already cleared above
    else not cloudMode and CH enabled
        API->>CH: ALTER TABLE website_event DELETE WHERE website_id IN (...)
        API->>CH: ALTER TABLE event_data DELETE ...
        API->>CH: ALTER TABLE session_data DELETE ...
        API->>CH: ALTER TABLE session_replay DELETE ...
        API->>CH: ALTER TABLE website_revenue DELETE ...
        API->>CH: ALTER TABLE website_event_stats_hourly DELETE ...
        API->>CH: ALTER TABLE event_data_pivot DELETE ...
        CH-->>API: mutations queued (async in CH)
    end
    API-->>API: return result


Comment thread src/queries/prisma/website.ts
Pre-existing bug surfaced by this PR's restructure: deleteWebsite only
called redis.client.del('website:${id}') in cloud mode, but
fetchWebsite (src/lib/load.ts) caches the same key whenever Redis is
enabled regardless of cloud mode. So non-cloud + Redis deployments served
deleted websites from cache for up to 24h.

Split the .then() into two independent guards:
- Redis del runs whenever redis.enabled (matches user.ts/team.ts pattern)
- CH cleanup runs only when !cloudMode (preserves cloud retention policy)

Verified empirically: cache key EXISTS=1 before delete, EXISTS=0 after.
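The two independent guards described in this thread can be sketched as follows; the flag names and helper signatures are stand-ins, not the PR's actual code:

```typescript
// Post-commit cleanup for deleteWebsite, split into two guards as the
// thread describes.
async function postDeleteWebsiteCleanup(opts: {
  websiteId: string;
  redisEnabled: boolean;
  cloudMode: boolean;
  redisDel: (key: string) => Promise<void>;
  chDeleteByWebsiteIds: (ids: string[]) => Promise<void>;
}): Promise<void> {
  // Guard 1: fetchWebsite warms `website:<id>` whenever Redis is enabled,
  // regardless of cloud mode, so the key must be dropped in both modes.
  if (opts.redisEnabled) {
    await opts.redisDel(`website:${opts.websiteId}`);
  }
  // Guard 2: ClickHouse rows are erased only outside cloud mode,
  // preserving the cloud retention policy.
  if (!opts.cloudMode) {
    await opts.chDeleteByWebsiteIds([opts.websiteId]);
  }
}
```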

anvme commented May 15, 2026

Hey @mikecao, I hope you merge this.
