docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata#510
docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata#510anneschuth wants to merge 5 commits intomainfrom
Conversation
Preview Deployment — harvester-adminYour changes have been deployed to a preview environment: URL: https://harvester-admin-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
Preview Deployment — harvester-workerYour changes have been deployed to a preview environment: URL: https://harvester-worker-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
Preview Deployment — editor — editorYour changes have been deployed to a preview environment: URL: https://editor-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
- Rename subject→collection, value→body, where→filter (avoid overloading existing property names) - Add property naming rationale table - Document variable binding with 'as' as a new concept in the schema - Describe nested FOREACH scoping rules and shadowing - Explain polymorphic ADD covers string concatenation (no separate CONCAT) - Remove references to specific implementation projects - Remove history of FOREACH in previous schema versions - Expand alternatives considered with stronger arguments - Add foreach_string_combine conformance test - Add lowercase pattern constraint on 'as' variable name
Preview Deployment — docs — docsYour changes have been deployed to a preview environment: URL: https://docs-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
There was a problem hiding this comment.
RFC-016: Collection Operations — Review
Semantics
🟠 Significant — Nested FOREACH scoping: collection expression contradicts scope description
The nested example (lines 120–131) shows collection: $household.members in the inner FOREACH, which requires $household to be visible when that expression is evaluated. But the comment immediately below says:
body: $member.income # inner scope only sees $member, not $householdThe comment implies the inner scope is opaque to outer variables, but $household.members depends on $household being in scope. The semantics section does not clarify when the collection expression is evaluated relative to scope creation.
The RFC needs to explicitly state: collection and filter are evaluated in the calling scope (before the child context is created); only body is evaluated in the child scope. Without this, implementers will disagree — one plausible reading of the semantics section (step 2a: "create child context", then 2b: "bind element", then 2c: "evaluate filter") suggests filter is evaluated in the child scope, but step 2a creates the child before any evaluation happens, which means collection must be pre-evaluated in the outer scope. None of this is stated.
The sentence "To access the outer variable in the inner body, extract it to a separate output first" also conflicts with the example that successfully does access $household from the inner level. This needs to be resolved.
🟠 Significant — ADD empty-collection identity is wrong for non-numeric types
The combine semantics table (line ~175) states:
| ADD | Sum numeric results (polymorphic...) | 0 |
But the RFC explicitly states ADD is polymorphic: strings concatenate, arrays flatten. For an empty string-body collection, the identity for string-ADD is "" (empty string), not 0. Returning 0 would produce a type mismatch whenever the result is used in a string context downstream. Same problem for array-body collections (identity should be [], not 0).
The empty-collection identity should be type-dependent:
- numeric:
0 - string:
"" - array:
[]
Or the RFC should explicitly restrict combine: ADD to numeric body expressions only and introduce CONCAT for strings.
Minor
🟡 Scoping note "unless passed explicitly" is misleading
Line ~115: "Inner iterations cannot see outer iteration variables unless they are passed explicitly."
No mechanism for "passing explicitly" exists in the schema — there is no way to forward a variable into a child scope. The qualifier should be removed; the sentence should read "Inner iterations cannot see outer iteration variables." The example showing $household.members working in the inner collection should be the documented escape hatch (access the outer variable in the collection expression, not in the body).
🟡 Untranslatable propagation does not address filtered elements
Implementation notes state: "if any element produces Value::Untranslatable, the combined result is Value::Untranslatable." This does not address the interaction with filter: if an element would produce Untranslatable but filter skips it, should the untranslatable still propagate? The filter expression itself might also be untranslatable (distinct from the body). Both cases should be specified here to avoid divergent implementations.
- Fix nested scoping: collection evaluated in outer scope, body/filter in inner scope. Made explicit with clear explanation. - Add error handling section: errors propagate immediately, no partial results (legal computations must be complete). - Explain why combine enum excludes SUBTRACT/MULTIPLY/DIVIDE (not associative over collections in legal sense). - Explain empty collection semantics per combine type (identity values vs null). - Motivate 'item' as default for 'as' (neutral, no domain collision). - Clarify filter runs in child scope where element is already bound. - Strengthen MAP/FILTER/REDUCE rejection with linguistic argument (legal text combines selection+transformation+aggregation in single clauses, not separate functional steps). - Add AWB event counting and huurtoeslag household examples. - Add foreach_error conformance test. - Add legal references for huurtoeslag and AWB.
There was a problem hiding this comment.
RFC-016: Collection Operations (FOREACH) — Review
This RFC is well-motivated and the design rationale is sound. The single-operation approach (vs separate MAP/FILTER/REDUCE) is justified, the legal use cases are concrete, and the property naming rationale is clear. The following issues need to be resolved before implementation begins.
🔴 Critical — Nested FOREACH scoping is wrong
File: docs/rfcs/rfc-016.md, lines 141–153 (nested FOREACH example)
The RFC states:
collection: $household.members # evaluated in OUTER scope → can see $household
This is incorrect given the current engine implementation. create_child() in context.rs explicitly starts with an empty local scope:
local: BTreeMap::new(), // Child starts with empty local scope$household is bound as a local variable in the outer FOREACH child context. When the inner FOREACH evaluates its own collection expression, it runs inside the inner child context (which inherits parameters, outputs, resolved_inputs, and definitions — but not locals from the parent). $household is a local, so it is invisible in the inner context. The expression $household.members would resolve to "variable not found" and the FOREACH would error.
The semantics section (step 1) says "Evaluate collection in the current scope" — but "current scope" for the inner FOREACH is the inner child scope, not the outer. The RFC needs to either:
- Define that
collectionis evaluated before creating the child context (in the caller's scope, not a new child), making it consistent with the currentcreate_child()behaviour where locals don't carry over; or - Specify that FOREACH child contexts inherit parent local variables (which contradicts the existing engine behaviour and the explicit design rationale in
context.rs).
The implementation notes must also be updated accordingly — this affects execute_foreach() design.
🟠 Significant — MAX_RECURSION_DEPTH does not exist
File: docs/rfcs/rfc-016.md, lines 249–250
The RFC refers to MAX_RECURSION_DEPTH as an "existing engine configuration value." This constant does not exist. The relevant constants in config.rs are:
MAX_OPERATION_DEPTH(100) — nesting depth for operations during evaluationMAX_RESOLUTION_DEPTH(50) — internal reference resolutionMAX_CROSS_LAW_DEPTH(20) — cross-law call depth
The implementation notes (lines 286–289) say "FOREACH increments depth for recursive evaluation, bounded by MAX_RECURSION_DEPTH." Clarify which constant is meant (almost certainly MAX_OPERATION_DEPTH) and update accordingly. This needs to be correct before the Rust implementation starts, to avoid the implementer introducing a new unnecessary constant or misusing an existing one.
🟠 Significant — RFC-016 is missing from the RFC index
File: docs/rfcs/index.md
RFC-016 is not listed in the index table. All other accepted/proposed/draft RFCs appear there. This PR should add the row:
| [RFC-016](./rfc-016) | Collection Operations | Proposed |
🟡 Minor — Inconsistency between PR description and RFC on subject vs collection
The PR description (summary section) uses subject: $kinderen_leeftijden (consistent with other operations that use subject), while the RFC itself uses collection: $kinderen_leeftijden. The PR description is stale/out of sync with the RFC text. The RFC's property naming rationale (lines 119–128) explicitly argues for collection over subject. The PR description should match the RFC.
🟡 Minor — AND empty-collection identity is true — needs a usage note
File: docs/rfcs/rfc-016.md, lines 243–244
AND over an empty collection returning true (standard logical identity) is correct mathematically, but in legal contexts it can produce surprising outcomes: a FOREACH that filters for a matching registration and combines with AND would return true when no registrations exist — meaning "all conditions are met." This is the vacuous truth trap. The RFC should note that callers using combine: AND must separately verify the collection is non-empty if the all-empty case represents "no applicable items" rather than "all applicable items satisfy the condition."
🟡 Minor — Schema version reference is inconsistent
File: docs/rfcs/rfc-016.md, line 155 (schema/v0.5.x/schema.json)
The implementation notes reference schema/v0.5.x/schema.json with a wildcard. The existing schema lives at a specific tagged version (schema-v0.5.1). Use the specific version or note explicitly that a new schema version will be required, and that this RFC targets schema v0.5.2 (or v0.6.0 if the change is considered breaking).
Companion to RFC-016. Schema v0.5.1 reduced source: blocks to just regulation/output/parameters, dropping the inline metadata that described how to query a tabular data source. Hosts had to maintain external mapping files to bridge the gap. RFC-017 restores inline native data source binding via additive optional properties: source_type, service, table, field, fields, select_on. Existing v0.5.1 source forms remain valid; the new properties compose with cross-law references and FOREACH (RFC-016). Implementation lands in PR #511 (feat/foreach-implementation): - ExternalSource struct extended in article.rs - RecordSetDataSource added in data_source.rs - Native YAML resolution in service.rs - Schema v0.5.1 additions - PyO3 register_record_set_source binding Eliminates ~600 lines of pre-resolution glue in poc-machine-law's Python wrapper.
There was a problem hiding this comment.
Review of RFC-016 and RFC-017
RFC-016 — FOREACH
🟠 Nested scoping model is self-contradictory
The RFC states two things that cannot both be true:
"the
bodyandfilterexpressions are evaluated in the inner scope, where only the innermostasbinding is visible"
"If an inner FOREACH uses the same
asname as an outer one, the inner binding shadows the outer within its body. To access both, use differentasnames."
The second sentence implies that outer loop variables are accessible (just shadowed if the name collides, and reachable by using a different name). The first sentence says outer loop variables are not accessible at all. These are mutually exclusive.
Additionally, the examples inside FOREACH bodies reference module-level inputs like $extra_16_17_jaar, $kind_vrijstelling, $bewoner.inkomen. If "only the innermost as binding is visible" were literally true, those references would all fail. The examples imply the intended design is standard lexical scoping (child context inherits parent + adds the loop variable), but the prose spec says the opposite.
This inconsistency will cause the implementor to choose one model or the other, and law authors will get surprised. The scoping section needs one coherent statement, e.g.:
"The child context inherits all variables from the parent scope (including module-level inputs and outer loop variables). The
asbinding is added to the child context; if a variable with the same name exists in an outer scope it is shadowed within this FOREACH'sbodyandfilter."
🟡 Untranslatable propagation in array mode unspecified
The error/untranslatable section only covers the combine case:
"if any element produces
Value::Untranslatable, the combined result isValue::Untranslatable"
It does not specify what happens when combine is omitted and results are collected as an array. Should the array contain the untranslatable element, or should the entire FOREACH produce Untranslatable? Implementations will diverge. Suggest adding a line: "When combine is omitted, the same rule applies: a single Untranslatable element causes the collected array to resolve as Untranslatable." (or the alternative if partial arrays are acceptable here).
RFC-017 — Native Data Source Metadata
🟡 AND semantics for multi-criteria select_on is implicit
The "Resolution semantics" say "filter records where each name column equals the resolved value", and the multi-criteria example uses three select_on entries. The conjunction rule (all criteria must match, i.e. AND) is shown by example but never stated. Since OR semantics would give a very different result, this should be made explicit: "All select_on criteria are combined with AND — a record is included only if every criterion matches."
| operation: FOREACH | ||
| collection: $household.members # evaluated in OUTER scope → can see $household | ||
| as: member | ||
| body: $member.income # evaluated in INNER scope → sees $member, not $household |
There was a problem hiding this comment.
The nested scoping section contradicts itself within two paragraphs. Line: "inner scope, where only the innermost as binding is visible" says outer loop variables are inaccessible. But the very next paragraph says "To access both, use different as names", which implies they are accessible. Additionally, the examples in this same RFC ($extra_16_17_jaar, $kind_vrijstelling, $bewoner.inkomen) reference module-level inputs inside FOREACH bodies — these are also "outer" bindings that would be invisible under the first interpretation. Pick one model and state it consistently. The intended model appears to be lexical scoping with shadowing.
|
|
||
| 1. Evaluate `collection` in the current scope to get an array. If the result is not an array, wrap it in a single-element array. If null, treat as empty array. | ||
| 2. For each element in the array: | ||
| a. Create a child execution context (isolated local scope). |
There was a problem hiding this comment.
Untranslatable propagation is only specified for the combine case. For combine omitted (array output), behavior is unspecified: does a single Untranslatable element taint the whole array, or is it included as-is? Needs a sentence.
| # new properties (native data source binding) | ||
| source_type: <kind tag> # optional, e.g. "RvIG", "claim", "cases" | ||
| service: <service code> # optional, e.g. "BELASTINGDIENST" | ||
| table: <data source name> # the data source to query |
There was a problem hiding this comment.
Multiple select_on criteria are shown in examples but the conjunction rule is never stated. Make it explicit: "All select_on criteria are combined with AND — a record is included only if every criterion matches."
Summary
Two RFCs that, taken together, restore expressive power to v0.5.1 laws so the engine can resolve everything natively without host-side pre-resolution glue.
RFC-016: Collection Operations (FOREACH)
Adds a
FOREACHoperation to the schema and engine for iterating over dynamic-length collections (children, registrations, employment periods, events, …). FOREACH was present in v0.2.0–v0.4.0 but removed in v0.5.0; this RFC restores it with cleaner semantics.where(filter) andcombine(aggregate) instead of separate MAP/FILTER/REDUCEMAX_ARRAY_SIZEandMAX_RECURSION_DEPTHRFC-017: Native Data Source Metadata
Restores inline
source.{table, field, fields, select_on, source_type, service}metadata on inputs in v0.5.1 YAML. Schema v0.5.1 had reducedsourceto justregulation/output/parameters, forcing host applications to maintain external mapping files (e.g.source_ref_mapping.json) and pre-resolve every data source input in Python before calling the engine.With this RFC the engine reads the metadata directly from the YAML and resolves data source inputs natively via its
DataSourceRegistry. Hosts only register raw records; the engine handles select_on filtering, field aliasing, and array projection.The two RFCs compose: a FOREACH body can iterate over an array input that was sourced via the new native data source path.
Implementation
Implementation lands in PR #511 (
feat/foreach-implementationbranch). It:FOREACHoperation tooperations.rswith where/combine support and full execution traceSourcestruct inarticle.rswith the new optional fieldsRecordSetDataSource(multi-criteria filter, field aliases, array projection) todata_source.rsservice.rsto query the registry from YAML metadata, falling back to legacy lookups for laws that don't migrateschema/v0.5.1/schema.jsonwith the new optional source properties (backwards compatible)Test plan
source: {}andsource: {regulation, ...}still parse)🤖 Generated with Claude Code