docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata #510
Open: anneschuth wants to merge 5 commits into main from feat/foreach-operation
5 commits:
- 515a2b6 docs: add RFC-016 Collection Operations (FOREACH) (anneschuth)
- 5c7aa54 docs: improve RFC-016 based on review feedback (anneschuth)
- e901b04 docs: address all review feedback on RFC-016 (anneschuth)
- 7309618 Merge branch 'main' into feat/foreach-operation (anneschuth)
- d1a415a docs: add RFC-017 native data source metadata (anneschuth)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,319 @@ | ||
| # RFC-016: Collection Operations | ||
|
|
||
| **Status:** Proposed | ||
| **Date:** 2026-04-07 | ||
| **Authors:** Anne Schuth | ||
|
|
||
| ## Context | ||
|
|
||
| Dutch law frequently reasons about collections of variable length: children in a household, employment periods, registrations, household members. The current v0.5.1 operation set has no way to iterate over a collection and apply per-element logic. | ||
|
|
||
| ### Real examples from Dutch law | ||
|
|
||
| **Kindgebonden budget (artikel 2 WKB).** The *leeftijdstoeslag* depends on each child's age: €703 for children aged 12-15, €936 for children aged 16-17. The law text reads: *"Voor een kind dat 12 jaar of ouder is, maar jonger is dan 16 jaar bedraagt de verhoging van het kindgebonden budget € 703."* The number of children varies per household. | ||
|
|
||
| **Wet BRP / Huurtoeslag.** Counting household members for income thresholds. Each member's income contribution depends on their age (above or below 21). The law says *"medebewoners"* without specifying a maximum. | ||
|
|
||
| **Burgerlijk Wetboek.** Filtering active registrations (curatele, bewind, mentorschap, executeurschap, volmacht) from a registry. Whether a registration is *"actief"* is a legal determination that depends on status fields, not a data-layer concern. | ||
|
|
||
| **AWB bezwaar/beroep.** Counting relevant procedural events (submissions, decisions) from a case history to determine whether deadlines have passed or rights have been exercised. | ||
|
|
||
| ### The alternative without iteration | ||
|
|
||
| Without a collection operation, the law author must pre-aggregate data in the data source layer: | ||
|
|
||
| ```yaml | ||
| # Pre-aggregated: data source provides category counts | ||
| - output: leeftijdstoeslagen | ||
|   operation: ADD | ||
|   values: | ||
|     - operation: MULTIPLY | ||
|       values: [$aantal_kinderen_12_15, $extra_12_15_jaar] | ||
|     - operation: MULTIPLY | ||
|       values: [$aantal_kinderen_16_17, $extra_16_17_jaar] | ||
| ``` | ||
|
|
||
| This works for simple sums-by-category. But it pushes the legal thresholds (12, 16, 17) into the data source layer. The data source must know what "12 jaar of ouder" means in the context of this specific law, which is exactly what regelrecht aims to avoid. | ||
|
|
||
| For filter-and-transform patterns (e.g., selecting active registrations), pre-aggregation requires the data source to understand legal concepts like *"actief bewind"*. That couples the data layer to legal semantics. | ||
|
|
||
| ## Decision | ||
|
|
||
| Add a `FOREACH` operation to the schema and engine. FOREACH iterates over a collection, evaluates an expression per element with the element bound to a local variable, and optionally aggregates results. | ||
|
|
||
| ### YAML syntax | ||
|
|
||
| ```yaml | ||
| operation: FOREACH | ||
| collection: $kinderen_leeftijden # array to iterate over | ||
| as: kind # local variable name (optional, defaults to "item") | ||
| body: # expression evaluated per element | ||
|   operation: IF | ||
|   cases: | ||
|     - when: | ||
|         operation: GREATER_THAN_OR_EQUAL | ||
|         subject: $kind | ||
|         value: 16 | ||
|       then: $extra_16_17_jaar | ||
|     - when: | ||
|         operation: GREATER_THAN_OR_EQUAL | ||
|         subject: $kind | ||
|         value: 12 | ||
|       then: $extra_12_15_jaar | ||
|   default: 0 | ||
| combine: ADD # aggregation (optional) | ||
| ``` | ||
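To make the per-element evaluation concrete, the following is a minimal Python sketch of what the FOREACH above computes. It assumes `$extra_12_15_jaar` is 703 and `$extra_16_17_jaar` is 936 (the amounts quoted from artikel 2 WKB in the Context section). The function names are hypothetical and this is not the engine's implementation.

```python
# Amounts in euros, taken from the WKB example quoted in the Context section.
EXTRA_12_15_JAAR = 703
EXTRA_16_17_JAAR = 936


def leeftijdstoeslag(kind: int) -> int:
    """Mirror the IF cases of the FOREACH body: the first matching case wins."""
    if kind >= 16:
        return EXTRA_16_17_JAAR
    if kind >= 12:
        return EXTRA_12_15_JAAR
    return 0  # default


def totaal_leeftijdstoeslag(kinderen_leeftijden: list) -> int:
    """Evaluate the body once per child and combine the results with ADD."""
    return sum(leeftijdstoeslag(kind) for kind in kinderen_leeftijden)


print(totaal_leeftijdstoeslag([10, 13, 16]))  # 0 + 703 + 936 = 1639
```

As in the YAML, case order matters: an age of 16 also satisfies the `>= 12` test, so the `>= 16` case has to come first.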
|
|
||
| With optional filter: | ||
|
|
||
| ```yaml | ||
| operation: FOREACH | ||
| collection: $curatele_registraties | ||
| as: registratie | ||
| filter: # skip elements where this evaluates to false | ||
|   operation: EQUALS | ||
|   subject: $registratie.status | ||
|   value: ACTIEF | ||
| body: $registratie | ||
| ``` | ||
|
|
||
| Counting events (AWB bezwaar): | ||
|
|
||
| ```yaml | ||
| # Count the number of objection submissions in the event history | ||
| operation: FOREACH | ||
| collection: $gebeurtenissen | ||
| as: event | ||
| filter: | ||
|   operation: EQUALS | ||
|   subject: $event.event_type | ||
|   value: BEZWAAR_INGEDIEND | ||
| body: 1 | ||
| combine: ADD | ||
| ``` | ||
|
|
||
| Household income aggregation (huurtoeslag): | ||
|
|
||
| ```yaml | ||
| # Sum income contributions, with different rules per age group | ||
| operation: FOREACH | ||
| collection: $huishoudleden | ||
| as: bewoner | ||
| body: | ||
|   operation: IF | ||
|   cases: | ||
|     - when: | ||
|         operation: GREATER_THAN_OR_EQUAL | ||
|         subject: $bewoner.leeftijd | ||
|         value: 21 | ||
|       then: $bewoner.inkomen | ||
|   default: | ||
|     operation: SUBTRACT | ||
|     values: | ||
|       - $bewoner.inkomen | ||
|       - $kind_vrijstelling | ||
| combine: ADD | ||
| ``` | ||
|
|
||
| ### Property naming rationale | ||
|
|
||
| FOREACH introduces properties that don't exist in other operations. The names are chosen to be distinct from existing property semantics: | ||
|
|
||
| | Property | Why this name | | ||
| |----------|---------------| | ||
| | `collection` | Distinct from `subject` (used for comparisons) and `values` (used for arithmetic). Describes what it is: the collection to iterate. | | ||
| | `body` | Distinct from `value` (used for comparison target and action assignment). Describes what it is: the expression body to evaluate per element. | | ||
| | `as` | Standard iteration variable binding, familiar from SQL and template languages. | | ||
| | `filter` | Distinct from `conditions` (used for AND/OR). Describes intent: filtering elements. | | ||
| | `combine` | Describes intent: combining per-element results into a single value. | | ||
|
|
||
| ### Variable binding with `as` | ||
|
|
||
| FOREACH is the only operation that introduces a new variable name into scope. This is a new concept in the schema: all other operations reference existing variables, none define them. | ||
|
|
||
| The `as` parameter names a local variable that exists only within the `body` and `filter` expressions of that FOREACH. It shadows any outer variable with the same name. When `as` is omitted, the default name is `item`. The default `item` is chosen as a neutral, language-independent term that does not collide with common domain variable names (unlike `element` which could conflict with XML-related fields, or `current` which suggests temporal context). | ||
|
|
||
| The `filter` expression runs in the child scope where the `as` variable is already bound. This means the filter can access element properties: `$registratie.status` works because `$registratie` is the current element. | ||
|
|
||
| **Nested FOREACH scoping:** Each FOREACH creates an independent child scope. The `collection` expression is evaluated in the **outer** scope (before the child context is created), so it can reference outer iteration variables. The `body` and `filter` expressions are evaluated in the **inner** scope, where only the innermost `as` binding is visible. | ||
|
|
||
| ```yaml | ||
| # Nested: outer $household, inner $member | ||
| operation: FOREACH | ||
| collection: $households | ||
| as: household | ||
| body: | ||
|   operation: FOREACH | ||
|   collection: $household.members # evaluated in OUTER scope → can see $household | ||
|   as: member | ||
|   body: $member.income # evaluated in INNER scope → sees $member, not $household | ||
|   combine: ADD | ||
| combine: ADD | ||
| ``` | ||
|
|
||
| If an inner FOREACH uses the same `as` name as an outer one, the inner binding shadows the outer within its body. To access both, use different `as` names. | ||
|
|
||
| ### Schema definition | ||
|
|
||
| ```json | ||
| "foreachOperation": { | ||
|   "type": "object", | ||
|   "required": ["operation", "collection", "body"], | ||
|   "additionalProperties": false, | ||
|   "properties": { | ||
|     "operation": { "const": "FOREACH" }, | ||
|     "collection": { | ||
|       "$ref": "#/definitions/operationValue", | ||
|       "description": "Expression that evaluates to an array." | ||
|     }, | ||
|     "as": { | ||
|       "type": "string", | ||
|       "pattern": "^[a-z_][a-z0-9_]*$", | ||
|       "description": "Local variable name bound to the current element. Defaults to 'item'." | ||
|     }, | ||
|     "body": { | ||
|       "$ref": "#/definitions/operationValue", | ||
|       "description": "Expression evaluated for each element." | ||
|     }, | ||
|     "filter": { | ||
|       "$ref": "#/definitions/operationValue", | ||
|       "description": "Boolean expression evaluated in the child scope. Elements where this evaluates to false are skipped." | ||
|     }, | ||
|     "combine": { | ||
|       "type": "string", | ||
|       "enum": ["ADD", "OR", "AND", "MIN", "MAX"], | ||
|       "description": "Aggregation applied to collected results. When omitted, results are returned as an array." | ||
|     }, | ||
|     "legal_basis": { "$ref": "#/definitions/legalBasis" } | ||
|   } | ||
| } | ||
| ``` | ||
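The constraints in this schema fragment can be checked by hand. The sketch below is a standard-library-only approximation of them in Python, not a JSON Schema validator: it covers the required properties, `additionalProperties: false`, the `as` pattern, and the `combine` enum, and it does not validate the nested `operationValue` or `legalBasis` references. The function name `validate_foreach` is hypothetical.

```python
import re

# Same character rules as the schema's "^[a-z_][a-z0-9_]*$", applied with fullmatch.
AS_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")
REQUIRED = {"operation", "collection", "body"}
ALLOWED = REQUIRED | {"as", "filter", "combine", "legal_basis"}
COMBINERS = {"ADD", "OR", "AND", "MIN", "MAX"}


def validate_foreach(op: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key in sorted(REQUIRED - set(op)):
        problems.append(f"missing required property: {key}")
    for key in sorted(set(op) - ALLOWED):
        problems.append(f"unexpected property: {key}")
    if "operation" in op and op["operation"] != "FOREACH":
        problems.append("operation must be FOREACH")
    name = op.get("as")
    if name is not None and not (isinstance(name, str) and AS_PATTERN.fullmatch(name)):
        problems.append(f"invalid 'as' name: {name!r}")
    combine = op.get("combine")
    if combine is not None and combine not in COMBINERS:
        problems.append(f"unsupported combine: {combine!r}")
    return problems


ok = {"operation": "FOREACH", "collection": "$kinderen_leeftijden", "as": "kind", "body": 1}
print(validate_foreach(ok))                   # no problems reported
print(validate_foreach({**ok, "as": "Kind"}))  # uppercase letters violate the pattern
```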
|
|
||
| ### Semantics | ||
|
|
||
| 1. Evaluate `collection` in the current scope to get an array. If the result is not an array, wrap it in a single-element array. If null, treat as empty array. | ||
| 2. For each element in the array: | ||
|    a. Create a child execution context (isolated local scope). | ||
|
Review comment: Untranslatable propagation is only specified for the
||
|    b. Bind the element to the local variable named by `as` (default: `item`). | ||
|    c. If `filter` is present, evaluate it in the child context. If the result is falsy, skip this element. | ||
|    d. Evaluate `body` in the child context. Collect the result. | ||
| 3. If `combine` is specified, apply the aggregation to collected results and return a single value. | ||
| 4. If `combine` is omitted, return the collected results as an array. | ||
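The four steps above can be sketched in a few lines of Python. This is an illustration of the stated semantics under assumptions, not the Rust engine: scopes are plain dicts, `body` and `filter_` are callables that receive the child scope, `combine` is any function over the collected list, and the child scope is assumed to also see the enclosing scope's variables. All names here are hypothetical.

```python
from typing import Any, Callable, Optional


def foreach(
    collection: Any,
    body: Callable[[dict], Any],
    scope: Optional[dict] = None,
    as_name: str = "item",
    filter_: Optional[Callable[[dict], Any]] = None,
    combine: Optional[Callable[[list], Any]] = None,
) -> Any:
    # Step 1: normalise the collection (null -> empty, non-array -> one element).
    if collection is None:
        elements = []
    elif isinstance(collection, list):
        elements = collection
    else:
        elements = [collection]

    results = []
    for element in elements:
        # Steps 2a/2b: fresh child context with the element bound to `as`.
        # Assumption: the child also sees the enclosing scope's variables.
        child = {**(scope or {}), as_name: element}
        # Step 2c: skip the element when the filter is falsy.
        if filter_ is not None and not filter_(child):
            continue
        # Step 2d: evaluate the body and collect the result.
        results.append(body(child))

    # Steps 3 and 4: aggregate, or return the collected array.
    return combine(results) if combine is not None else results


# "BEEINDIGD" is a made-up status value used only for this example.
registraties = [{"status": "ACTIEF"}, {"status": "BEEINDIGD"}]
actief = foreach(
    registraties,
    body=lambda s: s["registratie"],
    as_name="registratie",
    filter_=lambda s: s["registratie"]["status"] == "ACTIEF",
)
print(actief)
```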
|
|
||
| ### Error handling | ||
|
|
||
| If `body` produces an error for an element, the FOREACH operation propagates the error immediately. Partial results are not returned. Rationale: legal computations must be complete - a partial sum over "some children" is not a valid legal determination. | ||
|
|
||
| If `filter` produces an error, the same rule applies: the error propagates and FOREACH fails. | ||
|
|
||
| If any element produces `Value::Untranslatable` (per RFC-012), the combined result is `Value::Untranslatable`. Untranslatable taints the entire collection result, because a partial determination that silently drops untranslatable elements would be misleading. | ||
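A small Python sketch of these three rules follows. The RFC text does not say which wins when one element is Untranslatable and a later element raises an error; this sketch assumes errors take precedence by evaluating every element before reporting the taint. `UNTRANSLATABLE` is a stand-in for `Value::Untranslatable`, and `evaluate_all` and `example_body` are hypothetical names.

```python
class _Untranslatable:
    """Stand-in for the engine's Value::Untranslatable marker (RFC-012)."""

    def __repr__(self) -> str:
        return "UNTRANSLATABLE"


UNTRANSLATABLE = _Untranslatable()


def evaluate_all(elements, body):
    """Evaluate body per element; fail fast on errors, taint on Untranslatable."""
    results = []
    tainted = False
    for element in elements:
        value = body(element)  # any exception propagates: no partial results
        if value is UNTRANSLATABLE:
            tainted = True  # remember the taint, keep checking later elements
        else:
            results.append(value)
    return UNTRANSLATABLE if tainted else results


def example_body(x):
    """Toy body: None is untranslatable, 0 triggers a division error."""
    if x is None:
        return UNTRANSLATABLE
    return 10 // x


print(evaluate_all([1, 2], example_body))
print(evaluate_all([1, None, 5], example_body))
```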
|
|
||
| ### Dot notation for element properties | ||
|
|
||
| When iterating over arrays of objects, dot notation accesses properties: | ||
|
|
||
| ```yaml | ||
| collection: $curatele_registraties # [{status: "ACTIEF", bsn_curator: "123"}, ...] | ||
| as: reg | ||
| body: $reg.bsn_curator # accesses the bsn_curator property | ||
| ``` | ||
|
|
||
| This uses existing dot notation support in variable resolution. | ||
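For readers unfamiliar with the notation, a minimal Python sketch of how a `$name.property` reference could be resolved against a scope of plain dicts is shown here. The engine's actual variable resolution is not described in this RFC, so the `resolve` function is purely illustrative.

```python
def resolve(scope: dict, reference: str):
    """Resolve a `$name.property` reference against a scope of plain dicts."""
    if not reference.startswith("$"):
        raise ValueError(f"not a variable reference: {reference!r}")
    name, *path = reference[1:].split(".")
    value = scope[name]
    for key in path:
        value = value[key]  # raises KeyError if the property is missing
    return value


scope = {"reg": {"status": "ACTIEF", "bsn_curator": "123"}}
print(resolve(scope, "$reg.bsn_curator"))  # prints 123
```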
|
|
||
| ### Combine operations | ||
|
|
||
| | Combine | Description | Empty collection | | ||
| |---------|-------------|-----------------| | ||
| | `ADD` | Sum numeric results (polymorphic: concatenates strings/arrays per RFC-007) | 0 | | ||
| | `OR` | Logical: any result truthy | false | | ||
| | `AND` | Logical: all results truthy | true | | ||
| | `MIN` | Minimum value | null | | ||
| | `MAX` | Maximum value | null | | ||
| | *(omitted)* | Collect results as array | [] | | ||
|
|
||
| **Why only these five combiners?** The combine operations map to meaningful legal aggregation patterns: | ||
|
|
||
| - `ADD`: *"het totaal van alle bedragen"* (the total of all amounts) | ||
| - `OR`: *"indien ten minste een van de voorwaarden is vervuld"* (if at least one condition is met) | ||
| - `AND`: *"indien aan alle voorwaarden is voldaan"* (if all conditions are met) | ||
| - `MIN`/`MAX`: *"het laagste/hoogste van de bedragen"* (the lowest/highest of the amounts) | ||
|
|
||
| `SUBTRACT`, `MULTIPLY`, and `DIVIDE` are excluded. Subtraction and division are not associative, so folding them over a list is ambiguous (subtract from what?). Multiplication is associative, but multiplying a list of values has no common legal pattern. If a specific law needs such an aggregation, it can be expressed by collecting results as an array (no `combine`) and then applying the arithmetic operation to the array. | ||
|
|
||
| **Empty collection semantics:** `ADD` returns `0` (additive identity), `OR` returns `false`, and `AND` returns `true` (standard logical identities). `MIN` and `MAX` return `null` because there is no meaningful minimum or maximum of nothing - the caller must handle this case. When `combine` is omitted, an empty collection produces an empty array `[]`. | ||
|
|
||
| Note: `ADD` is polymorphic per RFC-007. When all results are strings, `ADD` concatenates them. When all results are arrays, `ADD` flattens them. This covers string-building use cases (e.g., assembling a list of names) without a separate `CONCAT` combiner. | ||
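The combine table and the empty-collection rules can be captured in one small function. The Python sketch below is an illustration under assumptions rather than the engine's code: it treats "flattens" as flattening one level, and it returns `0` for an empty `ADD` regardless of the element type, as the table states.

```python
def combine(op, results):
    """Aggregate per-element results following the combine table."""
    if op is None:
        return list(results)  # no combine: return the collected array
    if op == "ADD":
        if not results:
            return 0  # additive identity for the empty collection
        if all(isinstance(r, str) for r in results):
            return "".join(results)  # string concatenation
        if all(isinstance(r, list) for r in results):
            return [x for r in results for x in r]  # flatten one level
        return sum(results)
    if op == "OR":
        return any(bool(r) for r in results)  # False when empty
    if op == "AND":
        return all(bool(r) for r in results)  # True when empty
    if op == "MIN":
        return min(results) if results else None
    if op == "MAX":
        return max(results) if results else None
    raise ValueError(f"unsupported combine: {op!r}")


print(combine("ADD", [703, 936]))  # 1639
print(combine("MIN", []))          # None: the caller must handle this case
```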
|
|
||
| ### Security constraints | ||
|
|
||
| - Maximum iteration count: `MAX_ARRAY_SIZE` (existing engine config, default 1000). If the collection exceeds this, the engine returns an error. | ||
| - Maximum nesting depth: FOREACH increments `depth` for recursive evaluation, bounded by `MAX_RECURSION_DEPTH`. | ||
| - All collections originate from finite data sources. The schema does not support generators or lazy sequences. | ||
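A guard for the first two constraints might look like the Python sketch below. `MAX_ARRAY_SIZE = 1000` is the default stated above; the value used for `MAX_RECURSION_DEPTH` is a placeholder, since this RFC does not give its default, and `limit_error` is a hypothetical helper name.

```python
from typing import Optional

MAX_ARRAY_SIZE = 1000      # default stated in this RFC
MAX_RECURSION_DEPTH = 32   # placeholder value; the RFC does not state the default


def limit_error(elements: list, depth: int) -> Optional[str]:
    """Return an error message if a limit is exceeded, otherwise None."""
    if len(elements) > MAX_ARRAY_SIZE:
        return f"collection has {len(elements)} elements, limit is {MAX_ARRAY_SIZE}"
    if depth > MAX_RECURSION_DEPTH:
        return f"nesting depth {depth} exceeds limit {MAX_RECURSION_DEPTH}"
    return None


print(limit_error([0] * 1000, depth=1))  # exactly at the limit: allowed
print(limit_error([0] * 1001, depth=1))  # one over the limit: rejected
```

Checking the size before iterating means an oversized collection fails without evaluating any element, which is consistent with the rule that partial results are never returned.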
|
|
||
| ## Why | ||
|
|
||
| ### Benefits | ||
|
|
||
| **Legal logic stays in law YAML.** Age thresholds, status checks, and permission rules are legal decisions. The data source provides raw facts (list of children with birth dates); the law determines what to do with them. | ||
|
|
||
| **Matches legal language.** Legislators write *"voor elk kind"*, *"alle actieve registraties"*, *"medebewoners van 21 jaar of ouder"*. These are not separate filter-map-reduce steps in legal text - they are single clauses that combine selection and transformation. FOREACH with `filter` and `combine` maps to this integrated phrasing. Splitting into separate MAP, FILTER, REDUCE operations would force a decomposition that the law text does not make. | ||
|
|
||
| **Engine infrastructure exists.** The engine already has child context creation, local variable binding, and scoped variable resolution. The execution machinery is in place; only the operation dispatch is missing. | ||
|
|
||
| **Concrete use cases.** Multiple Dutch laws across toeslagen, BW delegaties, AWB procedures, and BRP household rules require per-element evaluation over variable-length collections. These are not hypothetical needs. | ||
|
|
||
| ### Tradeoffs | ||
|
|
||
| **Variable binding is a new concept.** Every other operation in the schema is purely referential - it reads existing variables but never creates them. `as` introduces a definition point. This makes FOREACH fundamentally different from arithmetic or logical operations. | ||
|
|
||
| **Non-termination risk.** Mitigated by `MAX_ARRAY_SIZE` (collection size limit) and `MAX_RECURSION_DEPTH` (nesting limit). Both are existing engine configuration values. | ||
|
|
||
| **Pre-aggregation works for simple cases.** When the pattern is purely "count items in categories," pre-aggregation in the data source is simpler. FOREACH is needed when per-element logic involves legal conditions, or when the output is a transformed collection rather than a single aggregate. | ||
|
|
||
| ### Alternatives Considered | ||
|
|
||
| **Pre-aggregation in data sources.** Push all counting and filtering to the data layer. Rejected: this works for simple sums but moves legal conditions (age thresholds, status definitions) out of law YAML. The boundary between data and law becomes unclear. | ||
|
|
||
| **Fixed maximum with unrolled operations.** Generate N branches for up to N items. Rejected: arbitrary limits, verbose YAML, does not handle filter-and-transform patterns, and breaks when the real count exceeds N. | ||
|
|
||
| **Separate MAP, FILTER, REDUCE operations.** Three operations following functional programming conventions. Rejected: Dutch legal text does not decompose collection logic into separate functional steps. A clause like *"de som van de bedragen voor elk kind dat 12 jaar of ouder is"* combines filtering (12 jaar of ouder), transformation (het bedrag), and aggregation (de som) in a single sentence. Three operations would require intermediate outputs (`filtered_children`, `child_amounts`, `total`) that exist nowhere in the law. FOREACH with `filter` and `combine` keeps the YAML close to the legal text. | ||
|
|
||
| **No iteration, restructure all laws.** Accept that laws needing iteration must be restructured to avoid it. Rejected: this is possible for simple aggregation cases but not for filter-and-transform patterns. It also forces legal knowledge into the data layer, which conflicts with regelrecht's design principle of keeping legal logic in law YAML. | ||
|
|
||
| ### Implementation Notes | ||
|
|
||
| **Engine changes:** | ||
| - Add `ForEach` variant to `ActionOperation` enum in `article.rs` with fields: `collection`, `as_name`, `body`, `filter`, `combine` | ||
| - Add `execute_foreach()` in `operations.rs`: | ||
|   1. Evaluate `collection` to `Value::Array` | ||
|   2. For each element: `ctx.create_child()`, `ctx.set_local(as_name, element)`, optionally evaluate `filter`, evaluate `body` | ||
|   3. Apply `combine` aggregation or return array | ||
| - Error propagation: any element error aborts the entire FOREACH | ||
| - Trace: add `PathNodeType::ForEachIteration` with element index for execution tracing | ||
| - Untranslatable propagation: if any element produces `Value::Untranslatable`, the combined result is `Value::Untranslatable` (per RFC-012) | ||
|
|
||
| **Schema changes:** | ||
| - Add `foreachOperation` to `definitions` in `schema/v0.5.x/schema.json` | ||
| - Add `FOREACH` to `operationType` enum | ||
| - Add `foreachOperation` to the `operation` oneOf discriminator | ||
|
|
||
| **Conformance tests:** | ||
| - `foreach_basic.json`: iterate over number array, combine with ADD | ||
| - `foreach_filter.json`: iterate with `filter` clause, verify skipped elements | ||
| - `foreach_objects.json`: iterate over object array, access properties via dot notation | ||
| - `foreach_nested.json`: nested FOREACH with independent scopes, verify outer variable accessible in inner `collection` but not inner `body` | ||
| - `foreach_empty.json`: empty and null collection handling per combine type | ||
| - `foreach_no_combine.json`: collect results as array (no combine) | ||
| - `foreach_string_combine.json`: combine with ADD on string results (concatenation) | ||
| - `foreach_error.json`: error in body propagates, partial results not returned | ||
|
|
||
| ## References | ||
|
|
||
| - RFC-004: Uniform Operation Syntax (property naming conventions) | ||
| - RFC-007: Cross-Law Execution Model (operation set, polymorphic ADD) | ||
| - RFC-012: Untranslatables (current workaround for laws needing iteration) | ||
| - Wet op het kindgebonden budget, artikel 2: https://wetten.overheid.nl/BWBR0022751/2025-01-01#Artikel2 | ||
| - Wet op de huurtoeslag, artikel 7: https://wetten.overheid.nl/BWBR0008659/2025-01-01#Artikel7 | ||
| - AWB, artikel 6:7: https://wetten.overheid.nl/BWBR0005537/2024-01-01#Artikel6:7 | ||
| - Burgerlijk Wetboek Boek 1, titel 16 (curatele): https://wetten.overheid.nl/BWBR0002656/2025-01-01 | ||
| - [Glossary of Dutch Legal Terms](/reference/glossary) | ||
Review comment: The nested scoping section contradicts itself within two paragraphs. The line "inner scope, where only the innermost `as` binding is visible" says outer loop variables are inaccessible. But the very next paragraph says "To access both, use different `as` names", which implies they are accessible. Additionally, the examples in this same RFC (`$extra_16_17_jaar`, `$kind_vrijstelling`, `$bewoner.inkomen`) reference module-level inputs inside FOREACH bodies; these are also "outer" bindings that would be invisible under the first interpretation. Pick one model and state it consistently. The intended model appears to be lexical scoping with shadowing.
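One way to picture the lexical-scoping-with-shadowing model suggested in this comment is Python's `collections.ChainMap`, where each child scope falls back to its parents on lookup. This is only an illustration of that model, not a description of how the engine's `create_child()` behaves; the `bind` helper and the value of `kind_vrijstelling` are made up for the example.

```python
from collections import ChainMap

# Module-level inputs; the value is hypothetical.
module_scope = ChainMap({"kind_vrijstelling": 500})


def bind(scope: ChainMap, as_name: str, element) -> ChainMap:
    """Create a child scope; lookups fall back to the enclosing scopes."""
    return scope.new_child({as_name: element})


household = {"members": [{"income": 1200}, {"income": 800}]}
outer = bind(module_scope, "household", household)
inner = bind(outer, "member", household["members"][0])

# The inner body sees its own binding, the outer loop variable, and module inputs.
print(inner["member"]["income"], len(inner["household"]["members"]), inner["kind_vrijstelling"])

# Reusing the same `as` name shadows the outer binding without changing it.
shadow = bind(outer, "household", "inner value")
print(shadow["household"], outer["household"] is household)
```

Under this model the two statements the comment contrasts become compatible: different `as` names give access to both loop variables, and identical names shadow.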