Review: Data & Domain (legal model, schema, database, test coverage)

## Data & Domain Review

Comprehensive review from 4 expert perspectives: Legal Domain Model, YAML Schema Design, Database & Query Patterns, and Test Coverage & BDD Gaps.

---

## 🔴 Critical / High Severity

### Legal model: Temporal gaps
- **No `valid_until` / expiry**: Laws never expire — a repealed law from 2026 is still returned for 2027 queries. Add `valid_until` field and filter in `select_version_for_date`.
- **No retroactive effect (`terugwerkende kracht`)**: Can't model laws that apply to facts before their `valid_from`. Add optional `retroactive_from` field.
- **No transitional provisions (`overgangsbepalingen`)**: Can't express "apply old rates to claims submitted before date X" — requires per-law version pins or article-level transition conditions.
- **Impact**: Engine is not safe for historical recalculation scenarios, which are routine in Dutch administrative practice.

### Legal model: Missing operations
- **No ROUND/CEIL/FLOOR**: Rounding rules have material fiscal impact across millions of recipients. Known gap (RFC-012).
- **Partner combined income not implementable**: Zorgtoeslag art. 2 lid 2 requires conditional input resolution — engine can't express "resolve this input only if condition X holds" (tracked as #377).

### Schema: 4 engine operations missing from JSON schema
- `NOT_EQUALS`, `IS_NULL`, `NOT_NULL`, `NOT_IN` — accepted by engine but fail schema validation (`schema/v0.5.1/schema.json`)
- YAML files using these operations pass the engine but fail `just validate`

### Schema: Version drift — corpus pinned to v0.5.0, engine ships v0.5.1
- All corpus files except `test_untranslatables` reference v0.5.0; `schema/latest` symlink is stale
- `untranslatables` field (v0.5.1-only) can't be validated against declared schema

### Database: `law_entries.status` can desync from actual job states
- Pre-job status updates (`worker.rs:155-161`) are fire-and-forget with `if let Err(e)` — errors only logged
- Post-success updates (`reset_fail_count`, `upsert_law`) are outside the completion transaction
- **Fix**: Combine status update into `claim_job` CTE; put all post-success updates in one transaction

### Test coverage: Major gaps in critical paths
- **`IsNull`/`NotNull` operations**: Zero unit tests — null coercion bugs are silent eligibility errors
- **`NotEquals`/`NotIn` operations**: Zero unit tests
- **Hook/override resolution**: BDD-only, no unit-level tests for RFC-007 features
- **`ExecutionReceipt`**: Zero tests — audit artifact regressions invisible
- **Worker process** (`worker.rs`, 974 lines): Zero tests for job processing logic
- **Error golden tests**: Only 2 of 15+ error variants covered
- **Cross-law golden tests**: Only 2 cases (simple + parameter flow-through), no multi-hop chains
- **Enrich pipeline**: LLM interaction logic entirely untested at integration level

---

## 🟡 Medium Severity

### Legal model gaps
- **Scope limited to `gemeente_code`**: Provinces and water boards lack first-class support despite `RegulatoryLayer` variants existing. `matches_scope` only checks `gemeente_code` (`resolver.rs:292`).
- **Hook ordering non-deterministic**: Same-priority hooks fire in HashMap insertion order. Sort by `(layer_rank, valid_from)` for auditability.
- **Override validation**: No load-time check that override target `(law_id, article, output)` exists. Cross-law load order not enforced.
- **`#datum_inwerkingtreding` unresolved**: `valid_from` with `#` reference causes priority comparison to fail (`priority.rs:9-15` TODO).
- **No model for discretionary powers**: `untranslatable_type` should distinguish "engine limitation" from "inherently discretionary".
- **Co-delegation**: Multiple implementing regulations can't combine additively — only winner-takes-all.

### Schema design issues
- **Top-level missing `additionalProperties: false`**: Typos in field names pass silently. `$id`, `$schema`, `uuid` undeclared.
- **`definitions` map allows any structure**: No constraint on `value` key pattern.
- **`source: {}` semantically ambiguous**: Three modes (registry, same-law, cross-law) indistinguishable by schema. Use discriminated `oneOf`.
- **`action` allows empty body**: Only `output` required, no `value`/`operation` — produces nothing at runtime.
- **`machineReadableSection.requires` shape mismatch**: Schema defines objects, Rust expects `Vec<String>`.
- **Schema `$id` uses mutable `main` branch URL**: Inconsistent with tag-based corpus convention (RFC-013).
- **`unit` enum missing `percentage`**: Engine accepts it, schema rejects it.

### Database issues
- **No index on `law_entries.status`**: Admin API filters/counts by status heavily — full sequential scan.
- **No index on `jobs.started_at WHERE status='processing'`**: Orphan reaper scans full table every poll cycle.
- **No index on `jobs.completed_at WHERE status='failed'`**: Metrics queries degrade as jobs table grows.
- **Single `enrich_job_id` column**: Can't track multiple providers. Replace with junction table.
- **Harvest job TOCTOU race**: `create_harvest_job_if_not_exists` has documented race condition — no unique constraint (unlike enrich jobs which use `ON CONFLICT DO NOTHING`).
- **Reaper runs every poll cycle without throttling**: Hundreds of times/second during job bursts. Add `last_reap: Instant` throttle.
- **List endpoint fetches full JSONB**: `payload`, `result`, `progress` returned for all rows — read amplification.

### Test gaps (medium)
- **BDD: Participatiewet Art 11** (nationality/means test) — primary eligibility gates untested
- **Priority: lex specialis** — not unit-tested (only BDD)
- **`filter_parameters_for_article`** and **`resolve_inputs_with_service`** — complex hot-path helpers with no direct tests
- **Harvester YAML writer** — zero tests for serialization to corpus format
- **Corpus validation** — only 5 test cases, missing: unknown regulatory layer, malformed `$id`, invalid dates
- **No property-based tests**: Evaluation determinism, arithmetic commutativity untested

---

## 🟢 Low Severity

### Legal model
- Single `produces` per article — can't model conditional legal character (RFC-001 open question)
- Subdelegation chain validation: AMvB can't grant wider delegation than wet gave it — not enforced
- Self-reference via `implements` index not detected at load time

### Schema
- `legalBasis` has no required fields — empty annotations schema-valid
- `open_terms[].type` enum narrower than `baseField.type` — `object`/`array` excluded
- `valid_from: #reference` syntax undocumented with no resolution semantics
- `procedure.stages` have no transition model or name constraints
- Three regulatory layers (`KONINKLIJK_BESLUIT`, `BELEIDSREGEL`, `UITVOERINGSBELEID`) have no ID constraints

### Database
- Count + data as two non-transactional queries (inconsistent pagination snapshots)
- `claim_job` correlated subquery vs CTE pattern (minor planner concern)
- Unbounded `list_jobs`/`list_laws` internal helpers (no LIMIT)
- `hashtext()` advisory lock uses 32-bit hash — collision risk (theoretical)
- `update_progress` redundant `updated_at` SET + no `status = 'processing'` guard
- Migration 0007 uses `ALTER TYPE` requiring ACCESS EXCLUSIVE lock
- Migrations 0004/0005 are data-mutating DML with hardcoded paths

---

## ✅ Positive Findings

### Legal model
- IoC delegation model (RFC-003), hook mechanism (RFC-007), lex specialis override, and priority resolution are well-reasoned and align with Dutch legal structure
- Untranslatables framework (RFC-012) correctly identifies encoding gaps
- Cross-law cycle detection via `ResolutionContext.visited` is correct for stateless execution

### Testing
- Engine operations have strong unit coverage (~83 tests in `operations.rs`)
- BDD suite covers 53 scenarios across 8 feature files including multi-law chains
- Golden fixture tests provide regression protection for core execution paths
- Pipeline integration tests properly use testcontainers for DB-backed testing

### Database
- Job queue uses `FOR UPDATE SKIP LOCKED` correctly for concurrent claiming
- Advisory locks protect against concurrent harvests of the same law
- Enrich jobs use proper partial unique index with `ON CONFLICT DO NOTHING`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review: Data & Domain (legal model, schema, database, test coverage) #477

Data & Domain Review

🔴 Critical / High Severity

Legal model: Temporal gaps

Legal model: Missing operations

Schema: 4 engine operations missing from JSON schema

Schema: Version drift — corpus pinned to v0.5.0, engine ships v0.5.1

Database: `law_entries.status` can desync from actual job states

Test coverage: Major gaps in critical paths

🟡 Medium Severity

Legal model gaps

Schema design issues

Database issues

Test gaps (medium)

🟢 Low Severity

Legal model

Schema

Database

✅ Positive Findings

Legal model

Testing

Database

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Review: Data & Domain (legal model, schema, database, test coverage) #477

Description

Data & Domain Review

🔴 Critical / High Severity

Legal model: Temporal gaps

Legal model: Missing operations

Schema: 4 engine operations missing from JSON schema

Schema: Version drift — corpus pinned to v0.5.0, engine ships v0.5.1

Database: law_entries.status can desync from actual job states

Test coverage: Major gaps in critical paths

🟡 Medium Severity

Legal model gaps

Schema design issues

Database issues

Test gaps (medium)

🟢 Low Severity

Legal model

Schema

Database

✅ Positive Findings

Legal model

Testing

Database

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Database: `law_entries.status` can desync from actual job states