feat(audit): instrumentation des actions utilisateurs (#3191)#3193
feat(audit): instrumentation des actions utilisateurs (#3191)#3193
Conversation
Set up the foundation for user-action audit logging without instrumenting any existing code path. The audit table is created on deploy but stays empty until the instrumentation PR (#3191) lands. - New PostgreSQL schema `audit` with `audit.action_log` table (Drizzle schema + idempotent migration); decoupled from `app.users` so GDPR user deletion does not block audit retention. - New isomorphic module `~/modules/audit` (action keys, retention constants, types) — single source of truth for category mapping. - Server-side helpers in `~/server/audit`: * `logAction` — fail-safe writer (swallows DB errors so business logic is never blocked) * `requestContext` — IP / user-agent extraction * `trpcMiddleware` — generic middleware with recursive metadata sanitization (strips token/password/secret/apikey at any depth) * `withAuditedRoute` — Next.js Route Handler wrapper * `cleanup` — RGPD purge with two CNIL-compliant retention buckets (180d for read_sensitive, 365d for security logs) - New endpoint `POST /api/audit/cleanup` (fail-closed when token unset) - New K8s CronJob (`audit-cleanup-daily`, runs at 4am UTC) - Env vars: EGAPRO_AUDIT_CLEANUP_TOKEN, EGAPRO_AUDIT_RETENTION_SHORT_DAYS (180), EGAPRO_AUDIT_RETENTION_LONG_DAYS (365) Closes #3190
revu-bot
left a comment
There was a problem hiding this comment.
Audit Instrumentation PR Review
This PR wires the audit logging infrastructure (from #3192) to all user action points. The overall approach is sound — withAuditedRoute wrapper, tRPC middleware, and explicit auth event hooks are well-structured. A few issues need attention before merging.
| File | Lines | Severity | Issue |
|---|---|---|---|
auth/logout/route.ts |
30–43 | CRITICAL | Logout audit fires even when token is null (unauthenticated request) |
auth/logout/route.ts |
30–43 | IMPORTANT | void logAction(...) is fire-and-forget before session termination — logout can complete before audit write |
declaration-pdf/route.ts |
11–27 | IMPORTANT | auth() called twice per request (once in resolveContext, once in handler) |
auth/config.ts |
270–293 | IMPORTANT | logger.error is async but NextAuth's logger interface may not await it |
no-sanction-pdf/route.ts |
14–22 | MINOR | resolveContext ignores its _request parameter — inconsistent with other routes |
Critical Issues
- Unauthenticated logout audit: The logout route logs
auth.logouteven whentokenisnull(no active session). This creates spurious audit entries for unauthenticated GET requests to/api/auth/logout.
Important Issues
- Double
auth()calls:declaration-pdf,transmitted-pdf, anduploadroutes callauth()in bothresolveContextand the handler body. This doubles the session resolution overhead per request. - Async logger concern: NextAuth's
logger.errorsignature is synchronous in v4 — making itasyncmay cause the promise to be silently dropped, meaningsafeRequestContext()(which awaitsnextHeaders()) might not resolve correctly.
Overall Quality
The security-conscious design (whitelist-based error sanitization, parseSiren validation before DB writes, capturing identity before session invalidation) is solid. Test coverage for the no-sanction-pdf route is updated correctly. The main gaps are the null-token guard on logout and the redundant auth() calls.
Make the cleanup bearer token mandatory instead of optional, so a missing secret crashes the app at boot rather than silently exposing the destructive `/api/audit/cleanup` endpoint. Validation requires at least 32 characters (same bar as EGAPRO_SUIT_API_KEY). The runtime `if (!token)` fail-closed check in the route handler becomes dead code once env.js enforces presence — drop it to keep the handler focused on the bearer comparison. Every environment (dev / preprod / prod) must now have the secret sealed via kubeseal before deploy.
Seal the audit cleanup bearer token per environment (dev / preprod / prod) so the app can start after the fix that made the var mandatory. Without these, the Zod validation in env.js would crash the pod at boot on every environment.
Five fixes from the revu-bot review on #3192: - **cleanup.ts** — wrap the two DELETEs in a single transaction so a failure in the second statement rolls back the first, avoiding a silent partial purge (CRITICAL). - **cleanup.ts** — explicitly coerce `count` via `Number()` because postgres-js v3 returns it as a string; the previous code produced string concatenation when summing short + long counts (IMPORTANT). - **trpcMiddleware.ts** — extend `SENSITIVE_KEYS` with `refresh_token`, `client_secret`, `accesskey`, `access_key`, `private_key` so the recursive sanitizer catches every common credential shape before it hits the audit jsonb column (IMPORTANT). - **audit-cleanup-cron.yaml** — replace `curl --fail` (which loses the HTTP status) with explicit `%{http_code}` capture + clear stderr logging, and drop `backoffLimit` from 3 to 0 so a single failure no longer retries and generates alert noise (the daily schedule is enough retry granularity for this job) (IMPORTANT). - **requestContext.ts** — trim `x-real-ip` and `user-agent` for consistency with the `x-forwarded-for` branch (MINOR).
Four fixes from the revu-bot review on #3193: - **auth/logout/route.ts** — gate the `auth.logout` audit behind a token presence check so unauthenticated GETs no longer produce spurious rows. Also switch the `void logAction(...)` to `await` so the write is guaranteed to be persisted before the redirect is issued (logAction is fail-safe internally, a DB outage still won't block logout) (CRITICAL + IMPORTANT). - **auth/config.ts** — convert the `logger.error` callback from `async` to synchronous + detached IIFE. NextAuth v4's logger interface is sync; returning a promise risks the side-effect being silently dropped in serverless or cold-start contexts (IMPORTANT). - **audit/cachedAuth.ts (new)** — per-request memoisation of `auth()` via `WeakMap<Request, Promise<Session>>`, so routes that need the session in both `resolveContext` and the handler body only parse the JWT once. Applied to declaration-pdf, transmitted-pdf, no-sanction-pdf, and upload routes (IMPORTANT). Adds: - Two new logout-route tests (no audit on null token, audit with identity on valid session). - Full cachedAuth unit coverage (single-call dedup, multi-request isolation, concurrent in-flight promise sharing).
The two DELETEs used a raw \`sql\` template tag to interpolate the
Date thresholds:
sql\`... AND \${actionLogs.createdAt} < \${shortThreshold}\`
postgres-js does not convert a Date value to a bind parameter string
inside a raw template tag (it only does the coercion for columns
referenced through the drizzle schema), so the cleanup endpoint
crashed at runtime with:
TypeError: The "string" argument must be of type string or an
instance of Buffer or ArrayBuffer. Received an instance of Date
The bug was masked by the unit tests because they mocked
\`db.transaction\` / \`db.delete\` — the raw SQL was never actually
executed. It was caught by a manual Playwright+curl end-to-end run
against a review namespace.
Switch to drizzle's typed predicates (\`and\` / \`eq\` / \`lt\` / \`ne\`)
which correctly bind the Date params. Same semantics, no more
TypeError at runtime.
Adds a Vitest integration layer that runs against a real Postgres container started via testcontainers. Only `cleanupAuditLogs` is covered for now — it is the first function that benefited from this type of test after a `Date` → `sql\`\`` bug slipped past the unit tests and crashed on the review app. Infrastructure: - `vitest.integration.config.ts` — separate config, `environment: node`, `include: "*.integration.test.ts"`, larger timeouts, serial file execution, no `~/env` mock. - `src/test/integration-setup.ts` — `globalSetup` that boots a Postgres 16 container, runs the real Drizzle migrations against it, and exposes the connection URI via `process.env.DATABASE_URL` before any test file imports `~/server/db`. - `src/test/integration-per-file-setup.ts` — minimal per-file setup that only mocks `server-only` (the Next.js lint package). Everything else (db, env, audit helpers) loads for real. - `vitest.config.ts` — excludes `*.integration.test.ts` from the standard unit run so `pnpm test` stays fast and Docker-free. - `package.json` — new `test:integration` script. - `.claude/hooks/block-bad-patterns.sh` — allow `process.env` in `integration-setup.ts` (same exception as `env.js`, `drizzle.config`, `global-setup.ts`, etc.) since we have to mutate it before `~/env` loads. Test coverage (`src/server/audit/__tests__/cleanup.integration.test.ts`): - empty table → noop - short retention purge (read_sensitive only) - long retention purge (non-read_sensitive only) - mixed retention buckets in a single run - regression guard: the exact `Date` bind param path that crashed the previous revision with `TypeError: The "string" argument must be of type string … Received an instance of Date`. Temporarily reverting `cleanupAuditLogs` to the raw `sql\`\`` template makes all 5 tests fail with that exact TypeError — regression locked in. Cost: container boot + migrations ≈ 5 s, tests ≈ 0.1 s, total ≈ 6 s. Runs only via `pnpm test:integration`, requires Docker.
The audit cleanup token is now required at startup (\`z.string().min(32)\` in env.js) after the #3190 fix. The e2e workflow runs a real \`pnpm build\` without \`SKIP_ENV_VALIDATION\`, so the build crashed on both PRs with: ❌ Invalid environment variables: [ { path: [ 'EGAPRO_AUDIT_CLEANUP_TOKEN' ], message: 'Invalid input: expected string, received undefined' } ] The e2e tests do not exercise \`/api/audit/cleanup\` (it is only called by the K8s CronJob in real deployments), so a hard-coded dummy value that satisfies the 32-char minimum is sufficient. Using a dummy string avoids the friction of adding a new GitHub secret just to unblock CI. \`ci.yaml\` is unaffected — it builds with \`SKIP_ENV_VALIDATION: "1"\` which bypasses all env validation.
revu-bot
left a comment
There was a problem hiding this comment.
⚠️ PR Review Skipped
1 validation issue found. Review thresholds can be adjusted in
.revu.yml.
See why it was skipped and detailed metrics
Issues Found
1. This PR changes 52 files, which exceeds the limit of 25 files.
Suggestion: Consider breaking this PR into smaller, more focused changes. Large PRs are harder to review effectively and may contain unrelated changes.
PR Metrics
- Total files changed: 52
- Reviewable files: 52
- Diff size: 4328 lines
- Documentation files: 0
- Largest file change: 1196 lines
- Addition/Deletion ratio: 30.16
This validation helps ensure the bot focuses on PRs where automated review provides the most value.
Every new tRPC mutation, sensitive query, Route Handler, auth event or cron-triggered action must now ship with its matching audit log entry. Skipping the log is a compliance bug, not a missing enhancement — document the three wire-up points so future PRs (human or agent) don't forget them. - **New rule** `.claude/rules/audit-logging.md`: full playbook scoped via frontmatter paths (`routers/**`, `app/api/**/route.ts`, auth and audit modules) so it auto-loads when touching an audited surface. Covers the 5 surface types (mutation / sensitive query / Route Handler / auth event / system), the 3-step wire-up (`AUDIT_ACTIONS` / `AUDIT_ACTION_CATEGORIES` / path map), metadata sanitisation rules, category→retention mapping, test coverage expectations, and a PR checklist. - `.claude/rules/automation.md`: new **Audit logging** bullet list in the "While writing — inline rules" section with a pointer to the full rule file. - `packages/app/CLAUDE.md`: new **Audit logging (issue #3174)** section between "Forms" and "File size" that summarises the covered surfaces, the 3 wire-up points, the sanitisation contract, and the integration-test requirement for DB-layer changes in `cleanup.ts`. Docs only, no code change.
b1ac75a to
4d1c731
Compare
Wire the audit logging infrastructure (#3190) into every existing action point. After this lands, user actions start being written to `audit.action_log`. - tRPC: plug `auditMiddleware` on `publicProcedure` and `protectedProcedure` — every mutation is logged automatically; a static path-based map adds explicit sensitive-query coverage (`profile.get`, `declaration.getOrCreate`). - NextAuth `events.signIn` → `auth.login` (success). - NextAuth `logger.error` → `auth.login_failed` with an OAuth-safe whitelist (avoids leaking tokens / state / code_verifier into the audit table). - Custom `/api/auth/logout` route → captures user identity *before* invalidating the session cookie. - 9 Next.js Route Handlers wrapped with `withAuditedRoute`: * read_sensitive (180d retention): declaration-pdf, transmitted-pdf, no-sanction-pdf * mutation: upload * export: export/download, export/generate, v1/export/declarations, v1/files * system: gip-mds/import - `/api/v1/files`: validate the `siren` query param via `parseSiren` before storing it (defends the audit `siren` column from arbitrary varchar injection). Closes #3191
Four fixes from the revu-bot review on #3193: - **auth/logout/route.ts** — gate the `auth.logout` audit behind a token presence check so unauthenticated GETs no longer produce spurious rows. Also switch the `void logAction(...)` to `await` so the write is guaranteed to be persisted before the redirect is issued (logAction is fail-safe internally, a DB outage still won't block logout) (CRITICAL + IMPORTANT). - **auth/config.ts** — convert the `logger.error` callback from `async` to synchronous + detached IIFE. NextAuth v4's logger interface is sync; returning a promise risks the side-effect being silently dropped in serverless or cold-start contexts (IMPORTANT). - **audit/cachedAuth.ts (new)** — per-request memoisation of `auth()` via `WeakMap<Request, Promise<Session>>`, so routes that need the session in both `resolveContext` and the handler body only parse the JWT once. Applied to declaration-pdf, transmitted-pdf, no-sanction-pdf, and upload routes (IMPORTANT). Adds: - Two new logout-route tests (no audit on null token, audit with identity on valid session). - Full cachedAuth unit coverage (single-call dedup, multi-request isolation, concurrent in-flight promise sharing).
…o split/feat/3174-user-actions-log/02-instrumentation
|
🎉 Deployment for commit a112ea1 : IngressesDocker images
|
Summary
Branche l'infrastructure d'audit logging livrée par #3192 (#3190) à tous les points d'action existants. Une fois cette PR mergée, les actions utilisateurs commencent à s'écrire en base.
split/feat/3174-user-actions-log/01-infrastructure. Une fois #3192 mergé suralpha, GitHub re-targettera automatiquement cette PR suralpha.tRPC pipeline
auditMiddlewaresurpublicProcedureetprotectedProcedure→ log automatique de toutes les mutationsprofile.get,declaration.getOrCreate(renvoie GIP MDS)Auth (NextAuth)
signIn→auth.login(success)logger.error→auth.login_failedavec whitelist OAuth-safe (évite la fuite de tokens / state / code_verifier)/api/auth/logout→ capture identité avant invalidation du cookie9 routes wrappées avec
withAuditedRoutedeclaration-pdf,transmitted-pdf,no-sanction-pdfuploadexport/download,export/generate,v1/export/declarations,v1/filesgip-mds/importSécurité
/api/v1/files: validation dusirenquery param viaparseSirenavant écriture en base (évite l'injection de varchar arbitraire dans la colonnesiren)Closes #3191
Lié à #3174 (parent) et #3192 (infrastructure, à merger en premier).
Quality gates
Test plan (sur env de review, après merge de #3192)
test@fia1.fr) et vérifier qu'une ligneauth.loginest écritedeclaration.submitest écrite avec les bonnes métadonnéespdf.declaration_downloadest écrite (catégorieread_sensitive)auth.logoutest écriteauth.login_failedSELECT category, action, count(*) FROM audit.action_log GROUP BY category, actionpour valider les catégories🤖 Generated with Claude Code