Skip to content

Commit fbf9084

Browse files
authored
refactor(audit): direct-DB audit-cleanup CronJob (#3270)
1 parent 882d042 commit fbf9084

19 files changed

Lines changed: 587 additions & 577 deletions

File tree

.claude/rules/audit-logging.md

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -144,9 +144,17 @@ Notes:
144144

145145
### 5. New cron-triggered / system action
146146

147-
Use `logAction` directly with category `system` inside the route handler or
148-
helper that the cron calls. See `/api/audit/cleanup/route.ts` for the canonical
149-
pattern (success + failure self-audit with metadata).
147+
Two patterns are in use, depending on whether the cron runs **inside** the app
148+
(a tRPC procedure or Next.js route called by a CronJob container) or **out of
149+
band** (a standalone `.mjs` script invoked by the CronJob with direct DB
150+
access — see `packages/app/scripts/audit-cleanup.mjs` for the canonical
151+
example, issue #3268).
152+
153+
- **In-app trigger**: use `logAction` directly with category `system` inside
154+
the route handler / helper that the cron calls.
155+
- **Out-of-band script**: self-audit via a raw `INSERT INTO audit.action_log`
156+
statement at the end of the script (success) and in a try/catch arm
157+
(failure, outside the rolled-back transaction so the row survives).
150158

151159
---
152160

@@ -196,10 +204,10 @@ the caller is responsible for sanitisation:
196204
| `export` | 365 days | Data exports and third-party API consumers. |
197205
| `system` | 365 days | Cron-triggered / admin actions. |
198206

199-
The cleanup cron (`/api/audit/cleanup``cleanupAuditLogs`) drops
200-
`read_sensitive` rows after 180 days and everything else after 365 days. This
201-
is enforced at the DB level; the category you choose **defines** the retention
202-
window.
207+
The cleanup cron (`packages/app/scripts/audit-cleanup.mjs`, wired up in
208+
`.kontinuous/templates/audit-cleanup-cron.yaml`) drops `read_sensitive` rows
209+
after 180 days and everything else after 365 days. This is enforced at the DB
210+
level; the category you choose **defines** the retention window.
203211

204212
---
205213

.claude/rules/automation.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -114,7 +114,7 @@ Apply these rules **as you write code**, before any agent runs:
114114
- New auth event / cron → direct `logAction` call; `logger.error` must stay synchronous (`void (async () => {...})()`)
115115
- Every new action requires **3 wire-up points**: `AUDIT_ACTIONS.*` constant, `AUDIT_ACTION_CATEGORIES` mapping, and the surface-specific wire (tRPC map / wrapper / direct call)
116116
- `metadata` jsonb must not contain secrets (auto-stripped keys: `password`, `token`, `refresh_token`, `secret`, `client_secret`, `authorization`, `apikey`, `api_key`, `accesskey`, `access_key`, `private_key`)
117-
- DB-layer changes in `~/server/audit/cleanup.ts` → add an integration test (`*.integration.test.ts`, runs via `pnpm test:integration`) — unit tests mock drizzle and miss driver bugs
117+
- DB-layer changes in `packages/app/scripts/audit-cleanup.mjs` (or any file that touches `audit.action_log` via non-trivial SQL) → add an integration test (`*.integration.test.ts`, runs via `pnpm test:integration`) — unit tests mock the DB driver and miss driver bugs
118118

119119
### E2E tests (when relevant)
120120

.github/workflows/e2e.yaml

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -32,10 +32,6 @@ jobs:
3232
CLAMAV_HOST: localhost
3333
CLAMAV_PORT: 3310
3434
ADMIN_EMAILS: test@fia1.fr
35-
# Required by env.js but never exercised by E2E tests (the audit
36-
# cleanup route is only called by the K8s CronJob). A dummy value
37-
# that satisfies the `.min(32)` Zod check is enough.
38-
EGAPRO_AUDIT_CLEANUP_TOKEN: e2e-dummy-audit-cleanup-token-not-used-1234567890
3935
steps:
4036
- name: Checkout repository
4137
uses: actions/checkout@v4

.kontinuous/env/dev/templates/audit-cleanup.sealed-secret.yaml

Lines changed: 0 additions & 15 deletions
This file was deleted.

.kontinuous/env/preprod/templates/audit-cleanup.sealed-secret.yaml

Lines changed: 0 additions & 15 deletions
This file was deleted.

.kontinuous/env/prod/templates/audit-cleanup.sealed-secret.yaml

Lines changed: 0 additions & 16 deletions
This file was deleted.

.kontinuous/templates/audit-cleanup-cron.yaml

Lines changed: 53 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -10,51 +10,75 @@ spec:
1010
failedJobsHistoryLimit: 3
1111
jobTemplate:
1212
spec:
13-
# Fail fast: a failure here is either a transient 5xx (next daily run
14-
# will retry naturally) or a permanent auth error (retry is pointless
15-
# and noisy). Either way a single attempt is enough.
13+
# Fail fast: a failure here is either a transient DB error (next daily
14+
# run will retry naturally) or a permanent schema / driver error (retry
15+
# is pointless and noisy). Either way a single attempt is enough.
1616
backoffLimit: 0
1717
template:
1818
metadata:
1919
labels:
2020
app: audit-cleanup-daily
2121
spec:
22-
restartPolicy: OnFailure
22+
# `Never` is consistent with `backoffLimit: 0`: the kubelet does not
23+
# restart the failed container and the Job does not re-queue a new
24+
# pod. A failed cleanup waits for the next daily cron run.
25+
restartPolicy: Never
2326
securityContext:
24-
runAsUser: 100
27+
runAsUser: 1000
2528
runAsNonRoot: true
2629
containers:
27-
- name: trigger
28-
image: curlimages/curl:8.12.1
30+
- name: audit-cleanup
31+
# `app` matches `.Values.app.imagePackage` in `values.yaml` — but
32+
# that lookup lands in the `app` subchart's scope, not the
33+
# project-level templates, so the literal is inlined here.
34+
image: "{{ .Values.global.registry }}/{{ .Values.global.imageProject }}/{{ .Values.global.imageRepository }}/app:{{ .Values.global.imageTag }}"
2935
securityContext:
3036
allowPrivilegeEscalation: false
3137
env:
32-
- name: EGAPRO_AUDIT_CLEANUP_TOKEN
38+
- name: POSTGRES_HOST
3339
valueFrom:
3440
secretKeyRef:
35-
name: audit-cleanup
36-
key: EGAPRO_AUDIT_CLEANUP_TOKEN
41+
name: "{{ .Values.global.pgSecretName }}"
42+
key: PGHOST
43+
- name: POSTGRES_DB
44+
valueFrom:
45+
secretKeyRef:
46+
name: "{{ .Values.global.pgSecretName }}"
47+
key: PGDATABASE
48+
- name: POSTGRES_PORT
49+
valueFrom:
50+
secretKeyRef:
51+
name: "{{ .Values.global.pgSecretName }}"
52+
key: PGPORT
53+
- name: POSTGRES_USER
54+
valueFrom:
55+
secretKeyRef:
56+
name: "{{ .Values.global.pgSecretName }}"
57+
key: PGUSER
58+
- name: POSTGRES_PASSWORD
59+
valueFrom:
60+
secretKeyRef:
61+
name: "{{ .Values.global.pgSecretName }}"
62+
key: PGPASSWORD
63+
- name: POSTGRES_SSLMODE
64+
valueFrom:
65+
secretKeyRef:
66+
name: "{{ .Values.global.pgSecretName }}"
67+
key: PGSSLMODE
68+
# Retention windows (days) — optional. The script defaults to
69+
# the CNIL thresholds (180 / 365) when these are absent, so an
70+
# unset `audit-retention` configmap is a no-op.
71+
envFrom:
72+
- configMapRef:
73+
name: "audit-retention"
74+
optional: true
3775
command:
38-
- /bin/sh
39-
- -c
40-
- |
41-
# Capture the HTTP status explicitly so the failure reason
42-
# (auth vs. server vs. network) lands in stderr for
43-
# observability instead of the opaque "curl: exit 22".
44-
HTTP_STATUS=$(curl -s -o /dev/stderr \
45-
-w "%{http_code}" \
46-
-X POST "http://app:3000/api/audit/cleanup" \
47-
-H "Authorization: Bearer ${EGAPRO_AUDIT_CLEANUP_TOKEN}" \
48-
--connect-timeout 10 \
49-
--max-time 300)
50-
if [ "$HTTP_STATUS" -lt 200 ] || [ "$HTTP_STATUS" -ge 300 ]; then
51-
echo "Cleanup request failed with HTTP $HTTP_STATUS" >&2
52-
exit 1
53-
fi
76+
- node
77+
- packages/app/scripts/audit-cleanup.mjs
5478
resources:
5579
requests:
5680
cpu: 50m
57-
memory: 32Mi
58-
limits:
59-
cpu: 100m
6081
memory: 64Mi
82+
limits:
83+
cpu: 200m
84+
memory: 256Mi

.kontinuous/values.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -44,9 +44,6 @@ app:
4444
- secretRef:
4545
name: "admin"
4646
optional: true
47-
- secretRef:
48-
name: "audit-cleanup"
49-
optional: true
5047
- configMapRef:
5148
name: "valkey"
5249
optional: true

packages/app/.env.example

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,6 @@ S3_BUCKET_NAME="egapro-dev-app"
2323
CLAMAV_HOST="localhost"
2424
CLAMAV_PORT="3310"
2525
ADMIN_EMAILS="test@fia1.fr"
26-
EGAPRO_AUDIT_CLEANUP_TOKEN="change-me-to-a-32-chars-minimum-token"
2726
MAIL_ENABLED="true"
2827
SMTP_HOST="localhost"
2928
SMTP_PORT="1025"

packages/app/CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -315,7 +315,7 @@ Every new action needs **3 wire-up points**:
315315

316316
The `metadata` jsonb must **never** contain secrets. The recursive sanitizer auto-strips `password`, `token`, `refresh_token`, `secret`, `client_secret`, `authorization`, `apikey`, `api_key`, `accesskey`, `access_key`, `private_key` at any depth — but when calling `logAction` directly (outside tRPC), the caller is responsible for staying clean. Never put IP addresses in `metadata` — there is a dedicated `ip_address` column.
317317

318-
DB-layer changes in `~/server/audit/cleanup.ts` (or any file that touches `audit.action_log` via non-trivial SQL) **must** come with an integration test `*.integration.test.ts` — unit tests mock drizzle and will miss driver-level bugs. Run locally with `pnpm test:integration` (requires Docker).
318+
DB-layer changes in `packages/app/scripts/audit-cleanup.mjs` (or any file that touches `audit.action_log` via non-trivial SQL) **must** come with an integration test `*.integration.test.ts` — unit tests mock the DB driver and will miss driver-level bugs. Run locally with `pnpm test:integration` (requires Docker).
319319

320320
> Full playbook with code snippets and a PR checklist → [`.claude/rules/audit-logging.md`](../../.claude/rules/audit-logging.md)
321321

0 commit comments

Comments
 (0)