Dwing CSRF-bescherming centraal af voor alle web-POSTs#71
Conversation
|
Twee commits toegevoegd ( 1. Blokker: invite-registratie ging 403'en (
|
|
Update — full rewrite + e2e Playwright-tests gepushed Na reflectie op de eerste augmentatie hebben we de aanpak fundamenteel anders gemaakt. Twee nieuwe commits: Wat veranderd is t.o.v. de oorspronkelijke PR-aanpak
Backward-compatibele Waarom dit ipv csrf.js
Het werd nogal robuuster dan een shared JS-include. Tests (4 lagen)
Pre-existing failures in bredere e2e (niet door deze PR)Geverifieerd op zowel deze PR's baseline (zonder onze rewrite) als met rewrite: 17 e2e tests falen identiek in beide gevallen ( Procesonderdeel: de pre-push hook draait Open follow-ups
Audit: 4 commits direct op deze PR-branch gepushed. Tests (unit + e2e Playwright) groen, geen regressies tegenover PR-baseline. |
CSRF-validatie was opt-in: alleen de POST domain-settings riep token- en origin-controle aan. Alle andere state-changing web-routes (project/ deployment/component verwijderen, refresh, tune, tools encrypt/decrypt, wizard- en modal-wizard-stappen) leunden uitsluitend op de SSO-sessie- cookie. Omdat het platform multi-tenant is op zustersubdomeinen biedt SameSite=Lax geen bescherming tegen een same-site tenant-app. CSRFMiddleware controleert nu elke unsafe methode (POST/PUT/PATCH/DELETE) op niet-/api/-routes: double-submit token (cookie + X-CSRF-Token header of csrf_token formulierveld) plus Origin/Referer tegen de host. De token- cookie is JS-leesbaar gemaakt zodat htmx/fetch de token automatisch meesturen via static/js/csrf.js (htmx, fetch en klassieke formulieren). De per-handler _validate_csrf-aanroep blijft idempotent. De sessie-cookie krijgt https_only in productie. SameSite blijft Lax (niet Strict): de OIDC-callback komt via een cross-site top-level redirect van Keycloak terug op /auth/callback en authlib heeft daar zijn state-cookie nodig; Strict zou login breken. De CSRF-middleware is de echte verdediging.
De CSRFMiddleware-module faalde bij import: dispatch() is geannoteerd met Callable/Awaitable/Response terwijl die namen alleen onder TYPE_CHECKING stonden en het bestand geen "from __future__ import annotations" had. Daardoor werden de annotaties direct geevalueerd en crashte de hele applicatie bij opstarten (server.py importeert CSRFMiddleware in create_app) met NameError. "from __future__ import annotations" toegevoegd zodat annotaties lazy strings zijn; de TYPE_CHECKING-imports blijven correct. De pagina /tools rendert <c-page> direct zonder base.html.j2, waardoor csrf.js niet werd geladen. De fetch-POSTs naar /tools/encrypt en /tools/decrypt kregen daardoor geen CSRF-token en gaven voor elke legitieme gebruiker 403. csrf.js nu via additionalJs toegevoegd. Test uitgebreid met een regressiecheck op de module-import, de Referer-fallback (foreign en matching) en de echte GET-seed-dan-POST flow.
invite-register.html.j2 renders c-page directly instead of extending
base.html.j2, so csrf.js was not loaded and the form to POST
/invite/{key}/register had no CSRF token. Under the central
CSRFMiddleware added in this PR, every legitimate invite registration
would get a 403.
Same shape as the bb58cb7 fix for the /tools page: add csrf.js via
additionalJs on the c-page tag. invite-landing, invite-success and
invite-error are GET-only display pages and do not need this.
…es for subtrees The exempt-prefix tuple lumped slash-suffixed prefixes (/api/, /static/, /auth/) with bare names (/health, /metrics, /ready) and called path.startswith on all of them. A user-facing page sharing the bare name would be silently exempted: /metrics-explorer (web/metrics_explorer_router.py is a real HTML page), /healthcheck-admin, /readyness-report. Today none of those exist as POST endpoints, but a future POST under any of those prefixes would silently bypass CSRF. Split into two collections: - CSRF_EXEMPT_PREFIXES: only slash-suffixed prefixes (/api/, /static/, /auth/) - matches directory-style subtrees, never bare-prefix matches. - CSRF_EXEMPT_EXACT: explicit frozenset of probe paths (/health, /healthz, /readyz, /metrics) registered in server.py and prometheus_router.py. Any new probe must be added to CSRF_EXEMPT_EXACT explicitly. Tests added: parametrized exempt-vs-non-exempt for the four probe paths vs three lookalike paths, plus the invite-register POST (NOT exempt - which is the policy this PR's other commit makes the template comply with). Plus two tests for the wizard's htmx + json-enc.js flow: application/json POST with X-CSRF-Token header passes, without token rejected.
…dden field
PR71's first iteration used a shared csrf.js that read the CSRF cookie
from document.cookie and attached it as an X-CSRF-Token header on every
htmx/fetch request, plus a hijacked window.fetch wrapper. To make this
work the cookie had to be httponly=False (JS-readable), and every page
that didn't extend base.html.j2 needed its own csrf.js include.
This commit drops that shared-JS approach. CSRFMiddleware exposes the
per-request token on request.state.csrf_token so templates render it
server-side into the place that needs it: hx-headers attribute on
htmx-triggered elements, a hidden form field on classic forms, or
inline into a fetch() call's headers. JavaScript never reads the token
from a cookie; the cookie is now httponly=True (defense in depth
against XSS).
Concrete changes:
opi/utils/csrf.py
- CSRFMiddleware sets request.state.csrf_token at the start of every
request (existing cookie value or freshly minted)
- Response cookie set with httponly=True
- ensure_csrf_token() kept as backward-compat wrapper that returns
request.state.csrf_token (existing handlers do not need rewriting)
static/js/csrf.js: deleted (102 lines)
templates/base.html.j2 + templates/tools.html.j2: csrf.js include
removed from additionalJs
templates/tools.html.j2: fetch() to /tools/encrypt and /tools/decrypt
now sets X-CSRF-Token header inline from {{ request.state.csrf_token }}
templates/invite-register.html.j2: form-encoded POST gets a hidden
csrf_token field instead of the deleted csrf.js include
templates/wizard/{wizard_step,wizard_steps_indicator,modal_wizard_step}.html.j2:
hx-headers='{"X-CSRF-Token": "{{ request.state.csrf_token }}"}'
on every htmx-triggered element that does an unsafe method (3 forms +
1 step-indicator link). Children inside a form inherit hx-headers from
their parent, so per-button buttons don't need their own. Per-element
rendering survives htmx fragment swaps -- the new fragment arrives with
its own server-rendered token from the same request that set the cookie.
tests/test_csrf_enforcement.py:
- New TestServerRenderedToken class: state token is set, same token
across requests in a session, cookie has httponly=True + SameSite=strict
- Existing 22 tests still pass without changes; the middleware accepts
the same header / form-field paths as before
25 tests pass.
… flow Four end-to-end assertions that prove the rewrite works in an actual browser, not just at the middleware unit level: 1. test_csrf_cookie_is_httponly: cookie set by the server is HttpOnly -- BrowserContext sees it, document.cookie does not. 2. test_wizard_form_renders_hx_headers_with_csrf_token: the wizard <form hx-ext='json-enc'> carries an hx-headers attribute whose X-CSRF-Token value equals the cookie value the browser holds. 3. test_wizard_step_submission_sends_csrf_header: when the form is submitted via htmx the actual XHR request carries an X-CSRF-Token header matching the cookie -- end-to-end proof that hx-headers reaches the network layer. 4. test_post_without_csrf_token_is_rejected: a same-origin fetch from inside the page context that omits both header and form field gets 403 from the middleware -- proves enforcement, not just rendering. Plus test_invite_register_has_hidden_csrf_field for the classic forms path; skipped in the test app because the invite key is not configured in the test fixture. 4 passed + 1 skipped in 6.6s.
b1934c3 to
cb3cccc
Compare
…ed GETs - wizard_review.html.j2 + modal_wizard_review.html.j2: 'Project aanmaken' / confirm buttons lacked hx-headers, causing the final submit to 403. Use " entities since c-button re-renders strip single-quoted attrs. - utils/csrf.py + router_wizard.py + router_detail_edit.py: refuse wizard step GETs that carry form-data-style query params ([..] or / in key). Triggers when htmx fails to attach and the browser falls back to native GET form submit; without the guard the GET silently advances wizard state and bypasses CSRF (GET is in SAFE_METHODS). Issue #109 tracks the separate concern that load_step mutates state.current_step on GET (REST idempotency + same-site nav CSRF).
This doc belongs to a different scope (PR #76 era). Will be re-introduced in its proper PR if needed.
- _render_modal_step / _render_modal_review now take request and put it in the template context; wizard_steps_indicator and modal_wizard_review use request.state.csrf_token so the modal flow crashed with UndefinedError. - dashboard.html.j2: 'Alle 1 project is gezond' was unnatural Dutch for singular; split into 'Het project is gezond' / 'Alle N projecten zijn gezond' branches.
handle_create_project's file_exists check (PR #70) fired on every dispatch of the create_project task, but four callers use it: - router_wizard.py:1944 (real create -- check must fire) - router_detail_edit.py:493 (modal edit -- legitimate overwrite) - router_subdomain_admin.py:360 (subdomain admin) - router.py:498 (component delete) The three update flows hit the check and got 'project bestaat al' on every edit. Fix: payload['is_new_project'] flag, default False, only the wizard create sets True. Existence check + commit message ('Create' vs 'Update') key off it. database_manager: 'Creating database resources' -> 'Database klaarmaken' as a Dutch + neutral label that fits both create and update. Full create/update label split is tracked in issue #112.
… flag PR #71 made handle_create_project's existence check conditional on payload['is_new_project'] == True (so edit/update flows that reuse the create_project task aren't blocked). The existing test from PR #70 still ran without the flag and expected the block to fire -- now silently passes through. Add the flag, and add a complementary test that an edit-flow payload (no flag) DOES overwrite an existing file.
Na de centrale CSRF-bescherming (#71) miste de Herverwerken-knop in het danger-confirm-modal de X-CSRF-Token header. Klikken faalde daardoor met een 403 op /projects/<naam>/refresh.
…loyment/component) executeDangerAction POSTed via fetch() without an X-CSRF-Token header, so CSRFMiddleware rejected it with 403 and the UI showed "Onbekende fout" — project, deployment AND component delete were all broken. Same class as the #71 CSRF rollout gaps; the refresh button right above it already sends the token via hx-headers. Add the header to the fetch. Also add a sweep regression guard: every template fetch() POST/PUT/PATCH/DELETE must carry an X-CSRF-Token header (fails on the pre-fix template, catches the whole recurring class).
…128) * fix(clone): heldere clone-logging — geen valse 'schema not created' / 'failed clone' De pg_dump-clone herstelt een schema altijd onder de BRON-naam (pg_dump kan niet hernoemen), dus bij een generational clone komt het schema binnen als source_schema en hernoemen we het naar target_schema. Dat is het normale pad, geen fout. Toch logde de code ERROR 'Target schema not created' + 'Schemas that DO exist' op elke geslaagde clone, en de generatie-failover noemde een bestaande (vaak geslaagde) generatie 'leftover from failed clone'. Samen lieten ze het lijken alsof clones faalden terwijl ze slaagden. - postgres.py: hernoemen is nu het verwachte pad (INFO); ERROR alleen nog bij een dump die echt geen schema opleverde (lege bron). - database_manager.py: 'already exists (prior generation)' op INFO i.p.v. WARNING 'leftover from failed clone'. Gedrag ongewijzigd: de rename en de failover gebeuren nog steeds; alleen de log-niveaus en bewoording kloppen nu. 60 clone-tests groen. * fix(clone): persist clone-status + generation in process_project (stop re-cloning) process_project mutates project_data in memory -- set_clone_status sets clone-from.status.completed=True and the clone records the database generation -- but committed the project file WITHOUT saving that state first. Every other write flow calls save_project_data() before commit_and_push(); process_project did not. So the updates lived only in memory and were lost: each reconcile re-read completed=false / generation=None, re-ran the clone, and the generational failover created a fresh full database copy every pass (regel-k4c PRs accumulated v1..vN within a single startup reconcile). Not a concurrency/rebase problem -- a missing persist. read_project_file caches the parsed dict, so save_project_data() writes exactly the mutated object. - project_manager.py: save_project_data() before the project-file commit in process_project. - test: AST regression guard that save_project_data() precedes commit_and_push() in process_project (fails on the pre-fix code). * fix(kubectl): log the actual command (and namespace) instead of "Running kubectl command" The connector logged bare "Running kubectl command" / "kubectl command succeeded" DEBUG lines with no command, resource, namespace or result -- useless for debugging the reconcile firehose. Now logs the real argv with secret-bearing flag values masked (--token/--password/--from-literal/--from-file/--docker-password). The "-n <namespace>" in the command identifies the project (rig-prd-<project>), so the lines now answer "what project, what command, what result". Failure and timeout lines include the command too. * fix(kubectl): log only the operation + project, never argv values Follow-up on 3a8e0d8. Replace the redact-the-args approach with a values-free summary: _summarize_kubectl_command logs the subcommand, the resource kind and the target namespace (rig-prd-<project>) and nothing else — no flag values, resource names or stdin. A secret can never reach the log regardless of the command, and the lines read as "operation X on project Y" (e.g. "kubectl get pods (project rig-prd-regel-k4c)"). Resolves the 6 CodeQL py/clear-text-logging-sensitive-data alerts: no argv value reaches the log sink, so there is no taint to flag (the previous allowlist redaction masked values but CodeQL could not recognise it as a sanitizer, and in practice the OPI never put secrets on the kubectl argv anyway — secrets go via SOPS manifests applied with -f/stdin). Test asserts the summary shows verb/kind/project and never the secret value or resource name. * fix(ui): send CSRF token on the danger-zone delete fetch (project/deployment/component) executeDangerAction POSTed via fetch() without an X-CSRF-Token header, so CSRFMiddleware rejected it with 403 and the UI showed "Onbekende fout" — project, deployment AND component delete were all broken. Same class as the #71 CSRF rollout gaps; the refresh button right above it already sends the token via hx-headers. Add the header to the fetch. Also add a sweep regression guard: every template fetch() POST/PUT/PATCH/DELETE must carry an X-CSRF-Token header (fails on the pre-fix template, catches the whole recurring class).
Probleem (pre-existing, onafhankelijk)
CSRF-bescherming was opt-in in plaats van afgedwongen.
CSRFMiddlewarezettealleen de cookie en valideerde nooit. De token- en origin-controle
(
validate_csrf_token,_validate_csrf) werd op precies een plek aangeroepen:de POST
domain-settings. Alle andere state-changing web-routes leundenuitsluitend op de SSO-sessie-cookie:
delete_project_web,delete_deployment_web,delete_component_web,refresh_project_web,tune,tools/encrypt,tools/decryptstep/skip/confirmstep/preset/submitHet platform is multi-tenant op zustersubdomeinen.
SameSite=Laxop desessie-cookie beschermt daardoor niet tegen een kwaadaardige tenant-app op een
broer-subdomein: dat is "same-site". Een tenant-app kon zo namens een
ingelogde beheerder projecten verwijderen of herconfigureren.
Dit is een bestaand lek, los van andere lopende wijzigingen.
Oplossing
CSRFMiddleware.dispatch: elke unsafe methode(POST/PUT/PATCH/DELETE) op een niet-
/api/-route moet zowel een geldigedouble-submit token (cookie +
X-CSRF-Tokenheader ofcsrf_tokenformulierveld) als een Origin/Referer die matcht met de host hebben.
/api/(API-key auth),/static/,/auth/en de probes zijn vrijgesteld.static/js/csrf.jshangt de token automatisch aan htmx-requests (
htmx:configRequest),fetch-aanroepen en klassieke formulier-submits. Zo blijven de bestaandeflows (delete/danger-zone via
fetch, wizard viahx-post, admin-forms)werken zonder per-template wijzigingen.
_validate_csrfblijft bestaan maar is nu idempotent: hetherhaalt dezelfde controle die de middleware al deed. ~90 regels
gedupliceerde origin-logica zijn verplaatst naar
opi/utils/csrf.py.https_onlyin productie.SameSiteblijft bewust Lax,niet Strict: de OIDC-callback komt via een cross-site top-level redirect van
Keycloak terug op
/auth/callbacken authlib heeft daar zijn state-cookienodig. Strict zou de login breken. De CSRF-middleware is de echte
verdediging tegen de same-site tenant-aanvaller.
Testplan
tests/test_csrf_enforcement.pytest de middleware direct (rood-groen):/api/-routes zijn vrijgesteldDe testlogica is geverifieerd via een standalone Starlette-runner (alle
assertions slagen). De volledige pytest-suite kan in deze omgeving niet
collecten door een bestaand, niet-gerelateerd probleem:
pydantic==2.12.5is incompatibel met de Python 3.14.0b4 in de venv (
_eval_type() got an unexpected keyword argument 'prefer_fwd_module'), wat al bij het importerenvan
fastapiinconftest.pyfaalt en de hele suite blokkeert. Dit raaktelke testfile en staat los van deze wijziging.
Handmatige verificatie aanbevolen
SameSite=Laxde callback niet breekt)