Design Decisions

The decisions that shaped Aushang, with reasoning taken straight from the code and docs. It is candid about trade-offs and about where the demo ends and production begins.

Redact the text, not the image


Decision	Run PII redaction on the OCR'd text before the LLM call; store the original (unblurred) photo as the member image.
Why	A notice board is public — the same photo hangs on the wall. Its text isn't sensitive to the org's own members. The real privacy risks are (1) shipping raw PII to an external LLM and (2) uncontrolled access to the raw original. Both are handled directly.
Earlier plan	Phase 1/2 docs described blurring redacted regions in the image (word boxes are still captured in `ocr.py` for this). Real-world testing showed that blurred Rückblick (reflection) photos were too poor to be useful, so the blur was dropped and access was locked down instead.
Trade-off	A member with consent could otherwise see real children's faces on a Rückblick, so reflection originals are deleted at publish and the clear-photo path is force-blocked for them.

Fail-closed redaction, tuned not to mangle


Decision	Mask anything ≥ threshold; over-mask rather than under-mask. But raise per-entity floors and exclude `LOCATION`.
Why	"Over-masking costs one tap in review; under-masking leaks PII." But a board is full of dates, town names, and festival headings; naive ML masking mangled real notices. So deterministic high-signal PII (phone/email/IBAN/birthdate) is caught by a regex pack at confidence 1.0, while fuzzy spaCy guesses (`PERSON` ≥0.6, ML `PHONE_NUMBER` ≥0.85) are held to higher bars and `LOCATION` is dropped entirely.
Backstop	The admin reviews every draft — the human is the final redaction check.
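The policy above can be sketched roughly as follows. This is a minimal illustration, not the project's actual regex pack or worker code: the two patterns, the `Finding` type, the `[REDACTED]` mask, and the `DEFAULT_THRESHOLD` of 0.5 are assumptions made for the example, and the ML findings are passed in rather than produced by spaCy. Only the per-entity floors (`PERSON` 0.6, `PHONE_NUMBER` 0.85), the confidence of 1.0 for regex hits, and the `LOCATION` exclusion come from the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity: str   # e.g. "EMAIL_ADDRESS", "PERSON"
    start: int    # character offsets into the OCR text
    end: int
    score: float


# Illustrative deterministic patterns (assumed, deliberately simplified).
REGEX_PACK = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # German IBANs only: DE + 2 check digits + 18 digits, optionally grouped.
    "IBAN_CODE": re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b"),
}

DEFAULT_THRESHOLD = 0.5                                  # assumed value
ENTITY_FLOORS = {"PERSON": 0.6, "PHONE_NUMBER": 0.85}    # from the text
EXCLUDED = {"LOCATION"}                                  # from the text
MASK = "[REDACTED]"                                      # assumed placeholder


def regex_findings(text: str) -> list[Finding]:
    """Deterministic hits always carry confidence 1.0."""
    return [
        Finding(entity, m.start(), m.end(), 1.0)
        for entity, pattern in REGEX_PACK.items()
        for m in pattern.finditer(text)
    ]


def keep(f: Finding) -> bool:
    """Drop excluded entities; hold fuzzy entities to their higher floor."""
    if f.entity in EXCLUDED:
        return False
    return f.score >= ENTITY_FLOORS.get(f.entity, DEFAULT_THRESHOLD)


def redact(text: str, ml_findings: list[Finding]) -> str:
    """Mask the union of all kept spans (overlaps widen the mask)."""
    spans = sorted(
        (f.start, f.end)
        for f in [*regex_findings(text), *ml_findings]
        if keep(f)
    )
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Replace from the end so earlier offsets stay valid.
    for start, end in reversed(merged):
        text = text[:start] + MASK + text[end:]
    return text
```

Merging overlapping spans before masking keeps the behaviour fail-closed: when two detectors disagree about where an entity ends, the wider span wins.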

LLM advises, deterministic code decides


Decision	Two columns: `content_type_suggested` (LLM) and `content_type` (admin-confirmed). Routing reads only the confirmed value, which is nullable with no default.
Why	A wrong LLM guess must never auto-route or auto-create a calendar event. `NULL` ("unconfirmed") is deliberately distinct from `info` (a real fallback). Nothing reaches a member without an explicit admin tap.
Bonus	Raw LLM output stays immutable in `posts.extraction`; admin edits live in `post_details`, so edits survive without re-running the LLM.
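A small sketch of the routing rule, under stated assumptions: the `Post` class, the `route()` function, the routing targets, and the `event` content type are hypothetical names chosen for illustration. Only the two column names, the nullable confirmed value, and `info` as a real fallback type come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    content_type_suggested: Optional[str]   # LLM guess, advisory only
    content_type: Optional[str] = None      # admin-confirmed; None mirrors SQL NULL


def route(post: Post) -> str:
    """Return a (hypothetical) routing target.

    The function never reads content_type_suggested, so a wrong LLM guess
    cannot move a post anywhere on its own.
    """
    if post.content_type is None:
        return "held_for_review"     # unconfirmed: nothing reaches a member
    if post.content_type == "event":  # "event" is an assumed type name
        return "calendar"
    return "feed"                     # includes the explicit "info" fallback
```

Because the confirmed column has no default, a freshly extracted post such as `Post("event")` stays in review until an admin sets `content_type` explicitly.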

Multi-provider LLM, Claude by default


Decision	The worker can call Anthropic / Mistral / OpenAI / Gemini, selected by `LLM_PROVIDER`; all return the same validated `ExtractionEnvelope`. Maintainer's deployment uses Claude (`claude-haiku-4-5`).
Why	Provider choice doesn't change the privacy model (only redacted text is ever sent) but it does change data residency: Anthropic/OpenAI are US; Mistral (La Plateforme) is EU. Self-hosters pick a provider + bring their own key. Set `LLM_PROVIDER=mistral` for strict EU residency — nothing else in the pipeline changes.
Honest note	The default (Claude, US) is disclosed on `/datenschutz`; moving extraction into the EU is a tracked follow-up.
Engineering	Each provider's structured-output mechanism differs: Anthropic `output_config`/`json_schema`, OpenAI `json_schema` strict, Gemini `responseSchema` (strips `additionalProperties` + array types), Mistral `json_object` with the schema embedded in the prompt. Strict mode forbids `oneOf`, so all five typed sub-payloads ride as nullable siblings under `details`; `extract()` collapses the matching branch. (Anthropic caps schemas at 16 nullable params, so only genuinely date-semantic fields stay nullable.)
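The "nullable siblings" workaround can be pictured with the sketch below. It is not the project's `extract()`: the helper name `collapse_details`, the idea that each sibling key under `details` is named after its content type, and the sample field names are all assumptions for illustration.

```python
from typing import Any, Optional


def collapse_details(
    content_type: str,
    details: dict[str, Optional[dict[str, Any]]],
) -> dict[str, Any]:
    """Pick the one sub-payload that matches content_type.

    Without `oneOf`, the schema lists every typed sub-payload as a nullable
    sibling; the model fills one and leaves the rest null. Collapsing turns
    that wide shape back into a single typed payload.
    """
    branch = details.get(content_type)
    return branch if branch is not None else {}
```

For example, with `details = {"event": {"title": "Sommerfest"}, "info": None}`, calling `collapse_details("event", details)` yields the event payload, while `collapse_details("info", details)` yields an empty dict.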

Operator-provisioned three-role model


Decision	`superadmin` / `admin` / `member`; no public signup, no self-service join. (Supersedes the brief's original two-role, self-service design.)
Why	Cleaner trust model: the operator creates orgs and the first admin; admins manage their own members. Removed an entire class of onboarding-link attacks (the Phase 1 review found editable magic-link intent was a privilege-escalation vector).
Cost	The old self-service tables (`invites`/`join_requests`/`pending_onboarding`) were deleted in `0005`; `delete_user_account` (which could orphan an org) was dropped in `0007`.

Security at four independent layers

Middleware + route guards + security-definer RPCs + RLS/column-grants. A failure in any single layer does not by itself breach the system. Documented as "the security model is the architecture": touching one layer requires calling it out in the PR.

Column-REVOKE for PII (not just a view)


Decision	`REVOKE SELECT` on `posts` from `authenticated`, re-`GRANT` only non-PII columns.
Why	RLS gates rows, not columns; a member could read PII columns from the base table, bypassing the `posts_public` view (the Phase 1 critical finding). REVOKE makes admin PII access server-only by construction — even an admin's browser client can't read PII.

Double-gated, default-off photo consent

A member opt-in AND an admin per-post release are both required, evaluated server-side, and the photo is delivered only via a short-TTL signed URL. Both flags default to false, so existing rows are safe without any backfill. Per-viewer "see the real photo" was deliberately rejected for reflections (the "multiple children" problem; see docs/COVER_IMAGES_SPEC.md).
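The gate itself reduces to a small predicate. The sketch below assumes hypothetical parameter names and leaves out the signed-URL step; the real check runs server-side against stored flags.

```python
def may_see_clear_photo(
    member_opted_in: bool = False,   # default-off member consent
    admin_released: bool = False,    # default-off per-post release
    is_reflection: bool = False,     # reflections are force-blocked
) -> bool:
    """Both gates must be open, and reflections never qualify."""
    if is_reflection:
        return False
    return member_opted_in and admin_released
```

With both defaults false, calling `may_see_clear_photo()` with no arguments returns `False`, which is the property that makes older rows safe without a backfill.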

Native Android via Capacitor remote-URL shell


Decision	A remote-URL native shell (loads the server-rendered app) + native camera (`@capacitor/camera`) feeding the same redaction pipeline.
Why	The server-rendered app keeps its full security model unchanged; the native layer adds only the camera, launcher icons, and a cloud AAB build. iOS can follow later from the same project (the build needs a Mac).

AI cover illustrations — built but dormant, fail-open

Decorative covers are generated from the redacted extraction: no PII, no people, and object-only scenes keyed to `content_type` rather than to the notice's specifics, so a cheerful image can't land next to an illness notice. The privacy boundary is the same as for the text call. The feature stays inert until an EU FLUX.1 [schnell] endpoint is configured, and it is fail-open: a missing cover never fails a post.
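"Dormant" and "fail-open" can be read as two early returns around the generation call. This is a sketch only: `cover_for`, the `generate` callable, and its `(endpoint, content_type)` signature are hypothetical stand-ins, not the project's cover code.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("covers")


def cover_for(
    content_type: str,
    endpoint: Optional[str],
    generate: Callable[[str, str], str],
) -> Optional[str]:
    """Return a cover reference, or None when there is no cover.

    - No endpoint configured: the feature is dormant, nothing is called.
    - Generation raises: log and continue; the post is published without a cover.
    """
    if not endpoint:
        return None
    try:
        # Only the content type is passed on, never the notice text.
        return generate(endpoint, content_type)
    except Exception:
        log.warning("cover generation failed; publishing without cover", exc_info=True)
        return None
```

The caller treats `None` as "no cover" and proceeds, so a misconfigured or unavailable image endpoint cannot block publishing.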

Honest demo-vs-production lines

Area	Status
Worker transport	Reachable over VPS HTTP today; TLS front (Caddy/Traefik + `worker.` subdomain) is a follow-up.
DB types	`database.types.ts` is a hand-authored stub, not generated.
CSP	Report-only, not enforcing.
Rate limiting	Login/recovery relies on Supabase built-ins; app-level token bucket is a hardening item (QR apply has its own per-code DB limit).
`npm audit` postcss	A transitive advisory inside Next.js's tree; the "fix" downgrades Next to 9.x and is intentionally not applied.
LLM residency	Default extraction is US (Claude); EU move is tracked, disclosed on `/datenschutz`.
Covers	Built, dormant.

Aushang — Privacy-by-construction notice-board digitization · Repository · Built by Eugen Müller
