fix: clean up ClickHouse analytics rows on user/team/website/link/pixel deletion#4246
fix: clean up ClickHouse analytics rows on user/team/website/link/pixel deletion#4246anvme wants to merge 5 commits into
Conversation
|
@anvme is attempting to deploy a commit to the Umami Software Team on Vercel. A member of the Team first needs to authorize it. |
Greptile SummaryThis PR adds ClickHouse analytics cleanup to every deletion path (website, link, pixel, user, team) and closes a long-standing gap where hard-deleting any entity left orphan rows in all 7 CH tables forever. It also fixes soft-delete filters on list and collect routes and wires Redis slug-cache invalidation for link/pixel updates and deletes.
Confidence Score: 4/5Safe to merge with one targeted fix in deleteWebsite. The deleteWebsite non-cloud else branch calls CH cleanup but omits redis.client.del for the website key. Because load.ts warms that Redis key with a 24h TTL on every analytics load regardless of cloud mode, a non-cloud + Redis deployment will serve the deleted website from cache for up to a day after deletion. The analogous handlers in user.ts and team.ts correctly call invalidateRedis() before the CH cleanup — this is an isolated gap in website.ts only. src/queries/prisma/website.ts — the non-cloud .then() branch needs redis.client.del alongside the ClickHouse call. Important Files Changed
Sequence DiagramsequenceDiagram
participant API
participant PG as Postgres (Prisma tx)
participant Redis
participant CH as ClickHouse
API->>PG: collect entity IDs (links, pixels, websites)
API->>PG: transaction — hard-delete / soft-delete rows
PG-->>API: commit
API->>Redis: del link/pixel/website cache keys
alt cloudMode
note over Redis: website key already cleared above
else not cloudMode and CH enabled
API->>CH: ALTER TABLE website_event DELETE WHERE website_id IN (...)
API->>CH: ALTER TABLE event_data DELETE ...
API->>CH: ALTER TABLE session_data DELETE ...
API->>CH: ALTER TABLE session_replay DELETE ...
API->>CH: ALTER TABLE website_revenue DELETE ...
API->>CH: ALTER TABLE website_event_stats_hourly DELETE ...
API->>CH: ALTER TABLE event_data_pivot DELETE ...
CH-->>API: mutations queued (async in CH)
end
API-->>API: return result
Reviews (1): Last reviewed commit: "fix: clean up ClickHouse analytics rows ..." | Re-trigger Greptile |
|
Hey @mikecao I hope you merge this |
…st queries Stacks on top of umami-software#4243. Five adjacent bugs from the same family: - deleteTeam left team-owned websites (and all dependent rows) orphaned. Added inline cleanup mirroring deleteWebsite. - non-cloud deleteUser, when hard-deleting the user's owned teams, also left team-owned websites orphaned. Extended the existing ownedFilter pattern (cloud-gated OR) to cover websites. - getTeamLinks/getUserPixels/getTeamPixels did not filter deletedAt: null, leaking soft-deleted entries into list views. - cloud deleteUser restamped already-soft-deleted websites' deleted_at; added deletedAt: null guard (same shape as link/pixel restamping fix). - Surfaced pre-existing gaps in deleteUser non-cloud: missing sessionReplaySaved/sessionReplay/revenue/segment cleanups, entityIds excluded website ids so website-shares were orphaned.
Address Greptile review feedback on umami-software#4245. - deleteLink and deletePixel now redis.client.del('link:slug' / 'pixel:slug') using the slug returned by Prisma's delete(). Previously the row was hard- deleted but the Redis cache (24h TTL) kept serving the slug, so /q/<slug> and /p/<slug> kept firing for up to a day after deletion. - updateLink and updatePixel now invalidate the cache for the current slug, and additionally for the previous slug if the slug was changed. Previously changing a link's destination URL or slug left the public cache stale. - Cloud-mode link.updateMany and pixel.updateMany in deleteUser now spread ownedFilter (which is { userId } in cloud mode) instead of hardcoding { userId }, so the cleanup intent stays consistent if ownedFilter ever evolves. Verified empirically against a Docker Postgres + Redis: deleted link's /q/<slug> returns 404 immediately (was: still redirected to old URL for 24h); slug rename invalidates both old and new cache keys.
…el deletion No delete path touched ClickHouse before this, leaving analytics events orphaned in CH after entities were removed in PostgreSQL. With CH enabled (typical high-volume setup), every delete left potentially millions of rows behind. - src/lib/clickhouse.ts: new deleteByWebsiteIds helper that issues ALTER TABLE DELETE WHERE website_id IN (...) against all 7 CH tables. Materialized View target tables (website_revenue, website_event_stats_hourly, event_data_pivot) need explicit DELETE — ClickHouse MVs don't cascade. - All 5 delete paths call the helper after the PG transaction, non-cloud only. Cloud preserves CH for billing/audit, matching the existing soft-delete retention philosophy. - For user/team, cleanup IDs are websites + links + pixels. Boards have no events. /api/send writes link.id and pixel.id into the website_id column for /q/<slug> and /p/<slug> events. - Failure handling matches the existing Redis invalidation pattern.
Pre-existing bug surfaced by this PR's restructure: deleteWebsite only
called redis.client.del('website:${id}') in cloud mode, but
fetchWebsite (src/lib/load.ts) caches the same key whenever Redis is
enabled regardless of cloud mode. So non-cloud + Redis deployments served
deleted websites from cache for up to 24h.
Split the .then() into two independent guards:
- Redis del runs whenever redis.enabled (matches user.ts/team.ts pattern)
- CH cleanup runs only when !cloudMode (preserves cloud retention policy)
Verified empirically: cache key EXISTS=1 before delete, EXISTS=0 after.
5eb645b to
72e98e1
Compare
Resolve conflict in src/queries/prisma/user.ts: identical to the cosmetic ownedFilter overlap — upstream collapsed the ternary to one line while this branch added an explanatory comment. Kept the comment and the single-line form (fits Biome lineWidth: 100). Verified the auto-merge of website.ts kept all four changes: this branch's clickhouse import + deleteWebsite cleanup (redis.enabled / !cloudMode), and upstream's new getTeamWebsiteCount + reformatted getAllUserWebsitesIncludingTeamAccess signature. Upstream made no changes to clickhouse.ts, link.ts, pixel.ts, or team.ts.
Third in the deletion-cleanup series after #4243 (link/pixel/board orphans on user/team delete) and #4245 (cascade websites + soft-delete leaks). Stacks on top of #4245 — until those merge, the GitHub diff includes their commits too. For incremental review now, focus on commit
871aa3b.Problem
No delete path in the codebase touched ClickHouse before this. With
CLICKHOUSE_URLset (the typical high-volume setup), every delete left analytics events orphaned forever:website_event,event_data,session_data,session_replay,website_revenue,website_event_stats_hourly,event_data_pivot./api/sendwriteslink.idandpixel.idinto thewebsite_idcolumn for/q/<slug>and/p/<slug>events (src/app/api/send/route.ts:98), so link/pixel events orphan under their entity id, not a website id.Solution
src/lib/clickhouse.tsdeleteByWebsiteIds(ids)helper. IssuesALTER TABLE ... DELETE WHERE website_id IN (...)against all 7 CH tables in parallel viaclient.command(). ClickHouse Materialized Views are insert-time triggers and do not cascade mutations, so the 3 MV target tables (website_revenue,website_event_stats_hourly,event_data_pivot) need explicit DELETEs.deleteWebsite/deleteUser/deleteTeam/deleteLink/deletePixelFor
deleteUser/deleteTeam, the cleanup ID set iswebsites + links + pixels(boards generate no analytics events).Why
ALTER TABLE DELETEinstead of lightweightDELETE FROM?Lightweight
DELETE FROMis generally recommended by CH docs, BUT it throws by default on tables with projections (website_eventhas 2 projections,session_datahas 1).ALTER TABLE DELETEworks uniformly across all 7 tables without per-query setting tweaks. Trade-off: heavier mutation, async by default — acceptable for the rare delete-on-erasure use case.Failure handling
awaitand propagate (matches the existing Redis invalidation pattern). If CH is briefly unreachable, the function rejects after PG has already committed — same dual-write race as Redis. Outbox/retry pattern is flagged as a separate PR for closing that race across both Redis and CH consistently.Test plan
pnpm build-appclean/api/send, deleted website. CH state:website_event3 -> 0,website_event_stats_hourly3 -> 0, all 7 mutationsis_done=1insystem.mutations. Both projection tables (website_event,session_data) processed cleanly./q/<slug>to fire events, deleted link.website_eventfor link.id: 2 -> 0.Out of scope (separate PRs / discussion)
resetWebsite: also wipes PG analytics without touching CH today; same fix pattern would apply.