Data & Domain Review
Comprehensive review from 4 expert perspectives: Legal Domain Model, YAML Schema Design, Database & Query Patterns, and Test Coverage & BDD Gaps.
🔴 Critical / High Severity
Legal model: Temporal gaps
- No
valid_until / expiry: Laws never expire — a repealed law from 2026 is still returned for 2027 queries. Add valid_until field and filter in select_version_for_date.
- No retroactive effect (
terugwerkende kracht): Can't model laws that apply to facts before their valid_from. Add optional retroactive_from field.
- No transitional provisions (
overgangsbepalingen): Can't express "apply old rates to claims submitted before date X" — requires per-law version pins or article-level transition conditions.
- Impact: Engine is not safe for historical recalculation scenarios, which are routine in Dutch administrative practice.
Legal model: Missing operations
Schema: 4 engine operations missing from JSON schema
NOT_EQUALS, IS_NULL, NOT_NULL, NOT_IN — accepted by engine but fail schema validation (schema/v0.5.1/schema.json)
- YAML files using these operations pass the engine but fail
just validate
Schema: Version drift — corpus pinned to v0.5.0, engine ships v0.5.1
- All corpus files except
test_untranslatables reference v0.5.0; schema/latest symlink is stale
untranslatables field (v0.5.1-only) can't be validated against declared schema
Database: law_entries.status can desync from actual job states
- Pre-job status updates (
worker.rs:155-161) are fire-and-forget with if let Err(e) — errors only logged
- Post-success updates (
reset_fail_count, upsert_law) are outside the completion transaction
- Fix: Combine status update into
claim_job CTE; put all post-success updates in one transaction
Test coverage: Major gaps in critical paths
IsNull/NotNull operations: Zero unit tests — null coercion bugs are silent eligibility errors
NotEquals/NotIn operations: Zero unit tests
- Hook/override resolution: BDD-only, no unit-level tests for RFC-007 features
ExecutionReceipt: Zero tests — audit artifact regressions invisible
- Worker process (
worker.rs, 974 lines): Zero tests for job processing logic
- Error golden tests: Only 2 of 15+ error variants covered
- Cross-law golden tests: Only 2 cases (simple + parameter flow-through), no multi-hop chains
- Enrich pipeline: LLM interaction logic entirely untested at integration level
🟡 Medium Severity
Legal model gaps
- Scope limited to
gemeente_code: Provinces and water boards lack first-class support despite RegulatoryLayer variants existing. matches_scope only checks gemeente_code (resolver.rs:292).
- Hook ordering non-deterministic: Same-priority hooks fire in HashMap insertion order. Sort by
(layer_rank, valid_from) for auditability.
- Override validation: No load-time check that override target
(law_id, article, output) exists. Cross-law load order not enforced.
#datum_inwerkingtreding unresolved: valid_from with # reference causes priority comparison to fail (priority.rs:9-15 TODO).
- No model for discretionary powers:
untranslatable_type should distinguish "engine limitation" from "inherently discretionary".
- Co-delegation: Multiple implementing regulations can't combine additively — only winner-takes-all.
Schema design issues
- Top-level missing
additionalProperties: false: Typos in field names pass silently. $id, $schema, uuid undeclared.
definitions map allows any structure: No constraint on value key pattern.
source: {} semantically ambiguous: Three modes (registry, same-law, cross-law) indistinguishable by schema. Use discriminated oneOf.
action allows empty body: Only output required, no value/operation — produces nothing at runtime.
machineReadableSection.requires shape mismatch: Schema defines objects, Rust expects Vec<String>.
- Schema
$id uses mutable main branch URL: Inconsistent with tag-based corpus convention (RFC-013).
unit enum missing percentage: Engine accepts it, schema rejects it.
Database issues
- No index on
law_entries.status: Admin API filters/counts by status heavily — full sequential scan.
- No index on
jobs.started_at WHERE status='processing': Orphan reaper scans full table every poll cycle.
- No index on
jobs.completed_at WHERE status='failed': Metrics queries degrade as jobs table grows.
- Single
enrich_job_id column: Can't track multiple providers. Replace with junction table.
- Harvest job TOCTOU race:
create_harvest_job_if_not_exists has documented race condition — no unique constraint (unlike enrich jobs which use ON CONFLICT DO NOTHING).
- Reaper runs every poll cycle without throttling: Hundreds of times/second during job bursts. Add
last_reap: Instant throttle.
- List endpoint fetches full JSONB:
payload, result, progress returned for all rows — read amplification.
Test gaps (medium)
- BDD: Participatiewet Art 11 (nationality/means test) — primary eligibility gates untested
- Priority: lex specialis — not unit-tested (only BDD)
filter_parameters_for_article and resolve_inputs_with_service — complex hot-path helpers with no direct tests
- Harvester YAML writer — zero tests for serialization to corpus format
- Corpus validation — only 5 test cases, missing: unknown regulatory layer, malformed
$id, invalid dates
- No property-based tests: Evaluation determinism, arithmetic commutativity untested
🟢 Low Severity
Legal model
- Single
produces per article — can't model conditional legal character (RFC-001 open question)
- Subdelegation chain validation: AMvB can't grant wider delegation than wet gave it — not enforced
- Self-reference via
implements index not detected at load time
Schema
legalBasis has no required fields — empty annotations schema-valid
open_terms[].type enum narrower than baseField.type — object/array excluded
valid_from: #reference syntax undocumented with no resolution semantics
procedure.stages have no transition model or name constraints
- Three regulatory layers (
KONINKLIJK_BESLUIT, BELEIDSREGEL, UITVOERINGSBELEID) have no ID constraints
Database
- Count + data as two non-transactional queries (inconsistent pagination snapshots)
claim_job correlated subquery vs CTE pattern (minor planner concern)
- Unbounded
list_jobs/list_laws internal helpers (no LIMIT)
hashtext() advisory lock uses 32-bit hash — collision risk (theoretical)
update_progress redundant updated_at SET + no status = 'processing' guard
- Migration 0007 uses
ALTER TYPE requiring ACCESS EXCLUSIVE lock
- Migrations 0004/0005 are data-mutating DML with hardcoded paths
✅ Positive Findings
Legal model
- IoC delegation model (RFC-003), hook mechanism (RFC-007), lex specialis override, and priority resolution are well-reasoned and align with Dutch legal structure
- Untranslatables framework (RFC-012) correctly identifies encoding gaps
- Cross-law cycle detection via
ResolutionContext.visited is correct for stateless execution
Testing
- Engine operations have strong unit coverage (~83 tests in
operations.rs)
- BDD suite covers 53 scenarios across 8 feature files including multi-law chains
- Golden fixture tests provide regression protection for core execution paths
- Pipeline integration tests properly use testcontainers for DB-backed testing
Database
- Job queue uses
FOR UPDATE SKIP LOCKED correctly for concurrent claiming
- Advisory locks protect against concurrent harvests of the same law
- Enrich jobs use proper partial unique index with
ON CONFLICT DO NOTHING
Data & Domain Review
Comprehensive review from 4 expert perspectives: Legal Domain Model, YAML Schema Design, Database & Query Patterns, and Test Coverage & BDD Gaps.
🔴 Critical / High Severity
Legal model: Temporal gaps
valid_until/ expiry: Laws never expire — a repealed law from 2026 is still returned for 2027 queries. Addvalid_untilfield and filter inselect_version_for_date.terugwerkende kracht): Can't model laws that apply to facts before theirvalid_from. Add optionalretroactive_fromfield.overgangsbepalingen): Can't express "apply old rates to claims submitted before date X" — requires per-law version pins or article-level transition conditions.Legal model: Missing operations
Schema: 4 engine operations missing from JSON schema
NOT_EQUALS,IS_NULL,NOT_NULL,NOT_IN— accepted by engine but fail schema validation (schema/v0.5.1/schema.json)just validateSchema: Version drift — corpus pinned to v0.5.0, engine ships v0.5.1
test_untranslatablesreference v0.5.0;schema/latestsymlink is staleuntranslatablesfield (v0.5.1-only) can't be validated against declared schemaDatabase:
law_entries.statuscan desync from actual job statesworker.rs:155-161) are fire-and-forget withif let Err(e)— errors only loggedreset_fail_count,upsert_law) are outside the completion transactionclaim_jobCTE; put all post-success updates in one transactionTest coverage: Major gaps in critical paths
IsNull/NotNulloperations: Zero unit tests — null coercion bugs are silent eligibility errorsNotEquals/NotInoperations: Zero unit testsExecutionReceipt: Zero tests — audit artifact regressions invisibleworker.rs, 974 lines): Zero tests for job processing logic🟡 Medium Severity
Legal model gaps
gemeente_code: Provinces and water boards lack first-class support despiteRegulatoryLayervariants existing.matches_scopeonly checksgemeente_code(resolver.rs:292).(layer_rank, valid_from)for auditability.(law_id, article, output)exists. Cross-law load order not enforced.#datum_inwerkingtredingunresolved:valid_fromwith#reference causes priority comparison to fail (priority.rs:9-15TODO).untranslatable_typeshould distinguish "engine limitation" from "inherently discretionary".Schema design issues
additionalProperties: false: Typos in field names pass silently.$id,$schema,uuidundeclared.definitionsmap allows any structure: No constraint onvaluekey pattern.source: {}semantically ambiguous: Three modes (registry, same-law, cross-law) indistinguishable by schema. Use discriminatedoneOf.actionallows empty body: Onlyoutputrequired, novalue/operation— produces nothing at runtime.machineReadableSection.requiresshape mismatch: Schema defines objects, Rust expectsVec<String>.$iduses mutablemainbranch URL: Inconsistent with tag-based corpus convention (RFC-013).unitenum missingpercentage: Engine accepts it, schema rejects it.Database issues
law_entries.status: Admin API filters/counts by status heavily — full sequential scan.jobs.started_at WHERE status='processing': Orphan reaper scans full table every poll cycle.jobs.completed_at WHERE status='failed': Metrics queries degrade as jobs table grows.enrich_job_idcolumn: Can't track multiple providers. Replace with junction table.create_harvest_job_if_not_existshas documented race condition — no unique constraint (unlike enrich jobs which useON CONFLICT DO NOTHING).last_reap: Instantthrottle.payload,result,progressreturned for all rows — read amplification.Test gaps (medium)
filter_parameters_for_articleandresolve_inputs_with_service— complex hot-path helpers with no direct tests$id, invalid dates🟢 Low Severity
Legal model
producesper article — can't model conditional legal character (RFC-001 open question)implementsindex not detected at load timeSchema
legalBasishas no required fields — empty annotations schema-validopen_terms[].typeenum narrower thanbaseField.type—object/arrayexcludedvalid_from: #referencesyntax undocumented with no resolution semanticsprocedure.stageshave no transition model or name constraintsKONINKLIJK_BESLUIT,BELEIDSREGEL,UITVOERINGSBELEID) have no ID constraintsDatabase
claim_jobcorrelated subquery vs CTE pattern (minor planner concern)list_jobs/list_lawsinternal helpers (no LIMIT)hashtext()advisory lock uses 32-bit hash — collision risk (theoretical)update_progressredundantupdated_atSET + nostatus = 'processing'guardALTER TYPErequiring ACCESS EXCLUSIVE lock✅ Positive Findings
Legal model
ResolutionContext.visitedis correct for stateless executionTesting
operations.rs)Database
FOR UPDATE SKIP LOCKEDcorrectly for concurrent claimingON CONFLICT DO NOTHING