Data Purge Job — Are session_data and other tables needed for complete cleanup ? #4275
Unanswered
Geethika-sg
asked this question in
Q&A
Replies: 1 comment
-
|
Your approach looks reasonable. For a complete cleanup you'd also want to hit
So if you're deleting old Something like: DELETE FROM session_data WHERE session_id IN (
SELECT s.session_id FROM session s
LEFT JOIN website_event we ON we.session_id = s.session_id
WHERE we.session_id IS NULL
AND s.created_at < NOW() - INTERVAL 'X days'
);
DELETE FROM session WHERE session_id NOT IN (
SELECT DISTINCT session_id FROM website_event
) AND created_at < NOW() - INTERVAL 'X days';Run the event deletion first, then the session cleanup. Order matters because of the foreign key references. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
I have implemented a direct database cleanup job (K8S CronJob) to purge analytics data older than a configurable number of days. We took the database-direct approach because the DELETE /api/websites/:id/data endpoint does not appear to exist in v2.19.0 — API calls to it returned a 404 HTML response. The only available API options we found were deleting an entire website or resetting all data with no date range support, neither of which suited our use case.
Our cleanup approach
We delete rows filtered by created_at date range in the following order (to respect foreign key constraints) :
event_data -> deleted first, filtered via subquery on website_event.event_id
website_event -> deleted by created_at range
session -> deleted by created_at range
The job runs as a transaction (BEGIN / COMMIT) and performs a post-deletion count verification before exiting.
Tables we can see in our database
Tables :
_prisma_migrations
event_data
link
pixel
report
revenue
segment
session
session_data
team
team_user
user
website
website_event
Question :
Given the table list above, is session_data also expected to grow over time and should it be included in date-range purges?
And are there any other tables from this list (like : revenue, link, pixel etc.) that might accumulate and should be purged alongside website_event and session for a complete cleanup?
Could anyone guide on the correct and complete set of tables to target for data retention purges ?
Beta Was this translation helpful? Give feedback.
All reactions