Review: DX, Compliance, Scalability & Open Source Readiness #482

@ehotting

DX, Compliance, Scalability & Open Source Readiness Review

Comprehensive review from 4 expert perspectives: Developer Experience, Dutch Government Compliance, Scalability & Load Readiness, and Open Source Community Readiness.


🔴 Critical / High Severity

Compliance: DPIA and verwerkingsregister missing

  • System processes BSN and personal income/asset data as execution parameters
  • Execution receipts store BSN in parameters field — durable artifact with special-category identifier
  • No DPIA, no privacy statement, no verwerkingsregister entry
  • Action: Commission DPIA before production deployment; create verwerkingsregister entry; determine processing ground (likely Art. 6(1)(e))

Compliance: EU AI Act classification not documented

  • Annex III point 5(a) lists systems used by or on behalf of public authorities to evaluate eligibility for essential public assistance benefits and services as high-risk
  • regelrecht directly computes entitlements (zorgtoeslag, bijstand) for authoritative decisions
  • Arts. 9-15 mandatory provisions (risk management, data governance, technical docs, logging, transparency, human oversight, accuracy) — only Art. 12 logging addressed (RFC-013)
  • Action: Produce EU AI Act conformity assessment; assign owners for each obligation; clarify human oversight requirement before BESCHIKKING is legally binding

Compliance: No Algoritmeregister entry

  • System automates BESCHIKKING affecting citizens' entitlements
  • No publiccode.yml, no algorithm register metadata anywhere
  • Action: Create Algoritmeregister entry; add publiccode.yml

Compliance: Admin panel runs without auth by default

  • BIO control 9.4.1 violation — OIDC_CLIENT_ID unset = fully open admin panel with only a warn! log
  • /metrics and /api/info bypass auth middleware on public-facing URL
  • No RBAC — all authenticated users can delete all jobs
  • Action: Make auth mandatory in production; add role-based authorization (read-only vs admin)
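
One way to make the "auth mandatory in production" action concrete is to fail closed at startup instead of logging a warning. The following is a minimal sketch, not the project's actual code: `DeployMode`, `AuthConfig`, `MissingAuthError`, `resolve_auth`, and the `APP_ENV` variable are hypothetical names introduced for illustration; only `OIDC_CLIENT_ID` comes from the finding above.

```rust
use std::fmt;

/// Hypothetical deployment mode; the real project may model this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeployMode {
    Development,
    Production,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthConfig {
    /// OIDC is configured with this client id.
    Oidc { client_id: String },
    /// Auth is off; only reachable in development mode.
    Disabled,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MissingAuthError;

impl fmt::Display for MissingAuthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "OIDC_CLIENT_ID must be set when running in production")
    }
}

impl std::error::Error for MissingAuthError {}

/// Decide the auth configuration from the optional OIDC client id.
/// An empty or whitespace-only value counts as unset.
pub fn resolve_auth(
    mode: DeployMode,
    client_id: Option<&str>,
) -> Result<AuthConfig, MissingAuthError> {
    match client_id.map(str::trim).filter(|id| !id.is_empty()) {
        Some(id) => Ok(AuthConfig::Oidc { client_id: id.to_string() }),
        None if mode == DeployMode::Development => Ok(AuthConfig::Disabled),
        None => Err(MissingAuthError),
    }
}

/// Read the environment and resolve the auth configuration.
/// `APP_ENV` is an assumed variable name; anything other than
/// "development" is treated as production so the default fails closed.
pub fn auth_from_env() -> Result<AuthConfig, MissingAuthError> {
    let mode = match std::env::var("APP_ENV").as_deref() {
        Ok("development") => DeployMode::Development,
        _ => DeployMode::Production,
    };
    resolve_auth(mode, std::env::var("OIDC_CLIENT_ID").ok().as_deref())
}
```

The admin binary would call `auth_from_env()` before binding its listener and exit with a non-zero status on `Err`. Role-based authorization (read-only vs admin) would be a separate layer on top of this check.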

Scalability: Postgres is single point of failure for everything

  • Admin, both workers, session store, job queue, law status, metrics — all depend on one Postgres instance
  • No read replica, no circuit-breaking, no graceful degradation
  • Action: Add read replica for metrics; add connection pool idle_timeout/max_lifetime; add job table archival

Scalability: DB connection pool hardcoded at 5 in admin

  • admin/main.rs:69 — no env-var override; 5 connections under concurrent admin use + Prometheus scraping + session cleanup saturates quickly
  • Pipeline pool at least reads DATABASE_MAX_CONNECTIONS env var
  • Action: Add env-var support; document sane defaults (10-20 for admin)
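
A minimal sketch of the env-var override, assuming the admin would reuse the `DATABASE_MAX_CONNECTIONS` name the pipeline already reads. The function names, the default of 10, and the upper bound of 100 are illustrative assumptions, not values taken from the codebase.

```rust
/// Default used when the variable is unset or invalid.
/// The review suggests 10-20 for admin; 10 is an assumed choice.
pub const DEFAULT_ADMIN_POOL_SIZE: u32 = 10;

/// Assumed upper bound so a typo cannot exhaust Postgres connections.
pub const MAX_ADMIN_POOL_SIZE: u32 = 100;

/// Parse a raw pool-size setting, falling back to the default for
/// missing, non-numeric, or zero values and clamping large values.
pub fn parse_pool_size(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .filter(|n| *n > 0)
        .map(|n| n.min(MAX_ADMIN_POOL_SIZE))
        .unwrap_or(DEFAULT_ADMIN_POOL_SIZE)
}

/// Read the pool size from the environment.
pub fn pool_size_from_env() -> u32 {
    parse_pool_size(std::env::var("DATABASE_MAX_CONNECTIONS").ok().as_deref())
}
```

The returned value would replace the hardcoded 5 wherever the admin builds its connection pool. Keeping the parsing in a pure function makes the fallback behaviour easy to unit-test.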

Scalability: Git repo is hard startup dependency

  • ensure_repo() at startup — if GitHub unreachable, worker exits immediately with no retry
  • Action: Wrap in retry loop; allow working from local checkout if remote unavailable
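
The retry-plus-fallback action could look roughly like the sketch below. It does not reproduce the real `ensure_repo()` signature, which is not shown in this review; `retry_with_backoff`, `RepoSource`, and `resolve_repo` are hypothetical helpers, and the closure stands in for whatever clone or fetch call the worker performs.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, doubling the delay after each
/// failure. The attempt number (starting at 1) is passed to `op`.
/// Returns the first success or the error from the final attempt.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}

/// Where the worker ended up reading the corpus from.
#[derive(Debug, PartialEq, Eq)]
pub enum RepoSource {
    Remote,
    LocalCheckout,
}

/// If the remote fetch ultimately failed but a local checkout exists,
/// continue with the (possibly stale) local copy instead of exiting.
pub fn resolve_repo<E>(
    fetch: Result<(), E>,
    local_checkout_exists: bool,
) -> Result<RepoSource, E> {
    match fetch {
        Ok(()) => Ok(RepoSource::Remote),
        Err(_) if local_checkout_exists => Ok(RepoSource::LocalCheckout),
        Err(e) => Err(e),
    }
}
```

A worker would wrap its fetch in `retry_with_backoff` and pass the result to `resolve_repo`. When the fallback path is taken, logging a staleness warning would help operators notice that the corpus may be out of date.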

DX: engine.md CLI section shows completely wrong syntax

  • Docs show --param key value positional args; actual binary reads JSON from stdin
  • Action: Replace with correct stdin-based usage example

DX: Pipeline README documents 4 non-existent just recipes

  • just db-up, just db-migrate, just db-down, just pipeline-check — none exist
  • Action: Either add recipes or rewrite README

DX: just dev silently skips frontends without GITHUB_TOKEN

  • Editor (the primary UX) simply doesn't start; yellow warning easily missed
  • just check also requires token for admin-frontend build
  • Action: Make warning prominent (red, boxed); document token in getting-started

Open Source: Private npm package blocks all external contributors

  • @minbzk/storybook requires GitHub PAT with read:packages — external fork builds fail
  • Action: Publish package publicly or document workaround prominently

Open Source: No CONTRIBUTING.md, CODE_OF_CONDUCT, issue/PR templates

  • GitHub community health checklist entirely empty
  • No documented contribution process, no behavioral standards, no enforcement contact
  • Action: Add all standard community files; add issue templates including law-encoding template

🟡 Medium Severity

Compliance gaps

  • BSN in receipts without retention policy — no pseudonymization, no archival procedure (AVG Art. 5)
  • No SECURITY.md — no vulnerability disclosure process, no incident response (BIO 16.1)
  • No accessibility statement (Besluit digitale toegankelijkheid) — required for government websites
  • Inter-engine BSN transmission (RFC-009) without documented legal basis per exchange
  • LLM enrichment sends law text to external AI — no verwerkersovereenkomst documented for Anthropic/VLAM
  • Schema URL integrity — corpus uses mutable refs/heads/main URLs (BIO 12.2, RFC-013 Phase 2 not executed)
  • No Archiefwet retention policy for execution receipts

Scalability gaps

  • Queue throughput: Sequential single-job workers; MAX_HARVEST_DEPTH=1000 allows job explosion; slow burst recovery
  • LLM timeout: 10-minute default, no circuit breaker — throughput collapses if provider down
  • WASM bundle: No wasm-opt, no lazy loading, size unknown
  • No HTTP body size limit on admin API endpoints
  • Filesystem coupling: Workers need per-replica PVC — undocumented
  • No benchmark baseline committed — performance regressions invisible
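
For the LLM timeout finding, a circuit breaker lets workers fail fast while the provider is down rather than waiting out the full timeout on every job. The sketch below is a generic consecutive-failure breaker and makes no claim about how the enrichment worker is structured; the type, its fields, and the thresholds are assumptions. Time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Opens after `failure_threshold` consecutive failures and allows a
/// trial call again once `cooldown` has elapsed.
#[derive(Debug)]
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// True if a call may be attempted at `now`. While open, calls are
    /// rejected until the cooldown has passed; after that a trial call
    /// is allowed (the "half-open" state).
    pub fn allow(&self, now: Instant) -> bool {
        match self.opened_at {
            None => true,
            Some(opened) => now.saturating_duration_since(opened) >= self.cooldown,
        }
    }

    /// A successful call closes the breaker and resets the counter.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failed call increments the counter; at or above the threshold
    /// the breaker (re)opens and the cooldown restarts from `now`.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A worker would check `allow(Instant::now())` before each provider call and requeue the job when it returns `false`. Pairing this with a timeout much shorter than the 10-minute default would bound how long a single failed call can hold a worker.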

DX gaps

  • Getting Started omits pre-commit install and WASM tooling
  • No way to run single BDD scenario — must run all
  • No IDE config (.vscode/settings.json, rust-analyzer) — red squiggles on first open
  • No debugging documentation — RUST_LOG, TRACE, trace example not documented together
  • trace example undiscoverable — not in Justfile or docs
  • YAML validation errors don't show the offending key — only "additional properties not allowed"
  • .env.example uses port 5432 but just dev postgres binds to 5433
  • BDD README references non-existent just rust-bdd — correct command is just bdd
  • Editor frontend not in just check quality gate

Open Source gaps

  • No MAINTAINERS file or governance doc — unclear who reviews PRs
  • No "good first issue" labels or curated starter issues
  • No communication channel (GitHub Discussions not enabled)
  • Deploy CI fails on external forks — no if: github.repository == 'MinBZK/regelrecht' guard
  • Language barrier undocumented — corpus requires Dutch, not stated in contribution guide
  • RFC process closed to externals — no guidance for outside RFC proposals
  • README has no screenshots — visitors can't see the product at a glance

🟢 Low Severity

Compliance

  • cargo-deny runs in CI but no documented vulnerability response process
  • Grafana default admin/admin with ALLOW_SIGN_UP=true in production Dockerfile
  • publiccode.yml absent

Scalability

  • DefaultHasher cache collision — no verification before returning cached result
  • RwLock held across sort in list handlers
  • No corpus reload endpoint — requires pod restart
  • N+1 query: one DB call per referenced law in harvest completion
  • list_jobs internal function has no LIMIT
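
The `DefaultHasher` finding can be illustrated with a cache that stores the full key next to the value and compares it on lookup, so a 64-bit collision yields a miss rather than a wrong result. This is a sketch under assumptions: `VerifiedCache` and `default_digest` are hypothetical names and the real cache's key and value types are not shown in this review. The digest function is injectable only so that a collision can be forced in a test.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a key to 64 bits with the standard library's DefaultHasher.
pub fn default_digest<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Cache indexed by a 64-bit digest that keeps the original key so a
/// lookup can verify it before returning the cached value.
pub struct VerifiedCache<K, V> {
    digest: fn(&K) -> u64,
    entries: HashMap<u64, (K, V)>,
}

impl<K: Hash + Eq, V: Clone> VerifiedCache<K, V> {
    pub fn new() -> Self {
        Self::with_digest(default_digest::<K>)
    }

    /// Build a cache with a custom digest function (useful in tests).
    pub fn with_digest(digest: fn(&K) -> u64) -> Self {
        Self { digest, entries: HashMap::new() }
    }

    /// Store a value; an existing entry with the same digest is replaced.
    pub fn insert(&mut self, key: K, value: V) {
        let slot = (self.digest)(&key);
        self.entries.insert(slot, (key, value));
    }

    /// Return the cached value only if the stored key equals `key`.
    pub fn get(&self, key: &K) -> Option<V> {
        match self.entries.get(&(self.digest)(key)) {
            Some((stored, value)) if stored == key => Some(value.clone()),
            _ => None,
        }
    }
}
```

If memory allows, keying a `HashMap` by the full key directly is simpler and avoids the problem entirely, since `HashMap` already resolves collisions by key equality.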

DX

  • just admin comment references DATABASE_SERVER_FULL instead of DATABASE_URL
  • ExternalError hides article/law ID even in local dev contexts

Open Source

  • No SPDX license headers in source files
  • Release process undocumented; no CHANGELOG
  • CLAUDE.md visible to public (harmless but odd)
  • just dev token failure message lacks URL/scope instructions

✅ Positive Findings

Compliance

  • RFC-009 correctly identifies and defers AVG data layer problem rather than ignoring it
  • RFC-013 maps to Awb 3:46, AERIUS, and EU AI Act Art. 12
  • Execution Receipt design is solid foundation for Archiefwet and EU AI Act logging
  • OIDC with PKCE, session cycling, comprehensive security headers — mature security practice
  • EUPL-1.2 license correct per MinBZK policy

Scalability

  • Engine is pure-Rust, single-threaded, allocation-light — inherently fast
  • FOR UPDATE SKIP LOCKED job claiming is correct and efficient
  • Workers are stateless — horizontal scaling is architecturally possible
  • MAX_LOADED_LAWS = 100 provides a sensible memory cap
  • Metrics cache prevents Prometheus from hammering DB

DX

  • justfile is comprehensive with 20+ recipes covering the full workflow
  • just check runs format + lint + build-check + validate + tests in one command
  • Pre-commit hooks enforce quality at commit time
  • Trace example (packages/engine/examples/trace.rs) is excellent — just needs visibility
  • .env.example exists and is mostly complete

Open Source

  • Well-structured VitePress docs site with concepts, components, operations, RFCs
  • Semantic commits enforced via PR title linter
  • CI works for pure-Rust contributions without any secrets
  • Strong RFC process with 14 architectural decisions documented
  • Live deployment URLs visible in README for instant product impression

Key Cross-Cutting Theme

The project is framed as "een verkenning" (exploration) but is deployed in production with live public URLs executing real Dutch law with BSN-linked welfare calculations. The gap between the exploratory framing and the production reality means compliance obligations that would normally be addressed before go-live have not been formally triggered. A production readiness review covering DPIA, EU AI Act conformity, Algoritmeregister, and Archiefwet retention is the single highest-priority action item across all four review perspectives.
