Skip to content

docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata#510

Open
anneschuth wants to merge 5 commits intomainfrom
feat/foreach-operation
Open

docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata#510
anneschuth wants to merge 5 commits intomainfrom
feat/foreach-operation

Conversation

@anneschuth
Copy link
Copy Markdown
Member

@anneschuth anneschuth commented Apr 7, 2026

Summary

Two RFCs that, taken together, restore expressive power to v0.5.1 laws so the engine can resolve everything natively without host-side pre-resolution glue.

RFC-016: Collection Operations (FOREACH)

Adds a FOREACH operation to the schema and engine for iterating over dynamic-length collections (children, registrations, employment periods, events, …). FOREACH was present in v0.2.0–v0.4.0 but removed in v0.5.0; this RFC restores it with cleaner semantics.

  • Single operation with optional where (filter) and combine (aggregate) instead of separate MAP/FILTER/REDUCE
  • Uses existing child-context infrastructure for scoping
  • Bounded by MAX_ARRAY_SIZE and MAX_RECURSION_DEPTH
  • Dot notation for property access on collection elements

RFC-017: Native Data Source Metadata

Restores inline source.{table, field, fields, select_on, source_type, service} metadata on inputs in v0.5.1 YAML. Schema v0.5.1 had reduced source to just regulation/output/parameters, forcing host applications to maintain external mapping files (e.g. source_ref_mapping.json) and pre-resolve every data source input in Python before calling the engine.

With this RFC the engine reads the metadata directly from the YAML and resolves data source inputs natively via its DataSourceRegistry. Hosts only register raw records; the engine handles select_on filtering, field aliasing, and array projection.

The two RFCs compose: a FOREACH body can iterate over an array input that was sourced via the new native data source path.

Implementation

Implementation lands in PR #511 (feat/foreach-implementation branch). It:

  • Adds the FOREACH operation to operations.rs with where/combine support and full execution trace
  • Extends the Source struct in article.rs with the new optional fields
  • Adds RecordSetDataSource (multi-criteria filter, field aliases, array projection) to data_source.rs
  • Updates service.rs to query the registry from YAML metadata, falling back to legacy lookups for laws that don't migrate
  • Extends schema/v0.5.1/schema.json with the new optional source properties (backwards compatible)
  • Adds 14 new unit tests covering the new code paths

Test plan

  • All existing tests pass: 375/375 in regelrecht
  • poc-machine-law BDD suite passes: 340/340 scenarios across 80 laws
  • Schema v0.5.1 backwards compatibility verified (existing source: {} and source: {regulation, ...} still parse)

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Preview Deployment — harvester-admin

Your changes have been deployed to a preview environment:

URL: https://harvester-admin-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Preview Deployment — harvester-worker

Your changes have been deployed to a preview environment:

URL: https://harvester-worker-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Preview Deployment — editor — editor

Your changes have been deployed to a preview environment:

URL: https://editor-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

- Rename subject→collection, value→body, where→filter (avoid overloading existing property names)
- Add property naming rationale table
- Document variable binding with 'as' as a new concept in the schema
- Describe nested FOREACH scoping rules and shadowing
- Explain polymorphic ADD covers string concatenation (no separate CONCAT)
- Remove references to specific implementation projects
- Remove history of FOREACH in previous schema versions
- Expand alternatives considered with stronger arguments
- Add foreach_string_combine conformance test
- Add lowercase pattern constraint on 'as' variable name
@github-actions github-actions bot deleted a comment from claude bot Apr 7, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Preview Deployment — docs — docs

Your changes have been deployed to a preview environment:

URL: https://docs-pr510-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC-016: Collection Operations — Review

Semantics

🟠 Significant — Nested FOREACH scoping: collection expression contradicts scope description

The nested example (lines 120–131) shows collection: $household.members in the inner FOREACH, which requires $household to be visible when that expression is evaluated. But the comment immediately below says:

body: $member.income  # inner scope only sees $member, not $household

The comment implies the inner scope is opaque to outer variables, but $household.members depends on $household being in scope. The semantics section does not clarify when the collection expression is evaluated relative to scope creation.

The RFC needs to explicitly state: collection and filter are evaluated in the calling scope (before the child context is created); only body is evaluated in the child scope. Without this, implementers will disagree — one plausible reading of the semantics section (step 2a: "create child context", then 2b: "bind element", then 2c: "evaluate filter") suggests filter is evaluated in the child scope, but step 2a creates the child before any evaluation happens, which means collection must be pre-evaluated in the outer scope. None of this is stated.

The sentence "To access the outer variable in the inner body, extract it to a separate output first" also conflicts with the example that successfully does access $household from the inner level. This needs to be resolved.


🟠 Significant — ADD empty-collection identity is wrong for non-numeric types

The combine semantics table (line ~175) states:

| ADD | Sum numeric results (polymorphic...) | 0 |

But the RFC explicitly states ADD is polymorphic: strings concatenate, arrays flatten. For an empty string-body collection, the identity for string-ADD is "" (empty string), not 0. Returning 0 would produce a type mismatch whenever the result is used in a string context downstream. Same problem for array-body collections (identity should be [], not 0).

The empty-collection identity should be type-dependent:

  • numeric: 0
  • string: ""
  • array: []

Or the RFC should explicitly restrict combine: ADD to numeric body expressions only and introduce CONCAT for strings.


Minor

🟡 Scoping note "unless passed explicitly" is misleading

Line ~115: "Inner iterations cannot see outer iteration variables unless they are passed explicitly."

No mechanism for "passing explicitly" exists in the schema — there is no way to forward a variable into a child scope. The qualifier should be removed; the sentence should read "Inner iterations cannot see outer iteration variables." The example showing $household.members working in the inner collection should be the documented escape hatch (access the outer variable in the collection expression, not in the body).


🟡 Untranslatable propagation does not address filtered elements

Implementation notes state: "if any element produces Value::Untranslatable, the combined result is Value::Untranslatable." This does not address the interaction with filter: if an element would produce Untranslatable but filter skips it, should the untranslatable still propagate? The filter expression itself might also be untranslatable (distinct from the body). Both cases should be specified here to avoid divergent implementations.

- Fix nested scoping: collection evaluated in outer scope, body/filter
  in inner scope. Made explicit with clear explanation.
- Add error handling section: errors propagate immediately, no partial
  results (legal computations must be complete).
- Explain why combine enum excludes SUBTRACT/MULTIPLY/DIVIDE (not
  associative over collections in legal sense).
- Explain empty collection semantics per combine type (identity values
  vs null).
- Motivate 'item' as default for 'as' (neutral, no domain collision).
- Clarify filter runs in child scope where element is already bound.
- Strengthen MAP/FILTER/REDUCE rejection with linguistic argument
  (legal text combines selection+transformation+aggregation in single
  clauses, not separate functional steps).
- Add AWB event counting and huurtoeslag household examples.
- Add foreach_error conformance test.
- Add legal references for huurtoeslag and AWB.
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC-016: Collection Operations (FOREACH) — Review

This RFC is well-motivated and the design rationale is sound. The single-operation approach (vs separate MAP/FILTER/REDUCE) is justified, the legal use cases are concrete, and the property naming rationale is clear. The following issues need to be resolved before implementation begins.


🔴 Critical — Nested FOREACH scoping is wrong

File: docs/rfcs/rfc-016.md, lines 141–153 (nested FOREACH example)

The RFC states:

collection: $household.members # evaluated in OUTER scope → can see $household

This is incorrect given the current engine implementation. create_child() in context.rs explicitly starts with an empty local scope:

local: BTreeMap::new(), // Child starts with empty local scope

$household is bound as a local variable in the outer FOREACH child context. When the inner FOREACH evaluates its own collection expression, it runs inside the inner child context (which inherits parameters, outputs, resolved_inputs, and definitions — but not locals from the parent). $household is a local, so it is invisible in the inner context. The expression $household.members would resolve to "variable not found" and the FOREACH would error.

The semantics section (step 1) says "Evaluate collection in the current scope" — but "current scope" for the inner FOREACH is the inner child scope, not the outer. The RFC needs to either:

  1. Define that collection is evaluated before creating the child context (in the caller's scope, not a new child), making it consistent with the current create_child() behaviour where locals don't carry over; or
  2. Specify that FOREACH child contexts inherit parent local variables (which contradicts the existing engine behaviour and the explicit design rationale in context.rs).

The implementation notes must also be updated accordingly — this affects execute_foreach() design.


🟠 Significant — MAX_RECURSION_DEPTH does not exist

File: docs/rfcs/rfc-016.md, lines 249–250

The RFC refers to MAX_RECURSION_DEPTH as an "existing engine configuration value." This constant does not exist. The relevant constants in config.rs are:

  • MAX_OPERATION_DEPTH (100) — nesting depth for operations during evaluation
  • MAX_RESOLUTION_DEPTH (50) — internal reference resolution
  • MAX_CROSS_LAW_DEPTH (20) — cross-law call depth

The implementation notes (lines 286–289) say "FOREACH increments depth for recursive evaluation, bounded by MAX_RECURSION_DEPTH." Clarify which constant is meant (almost certainly MAX_OPERATION_DEPTH) and update accordingly. This needs to be correct before the Rust implementation starts, to avoid the implementer introducing a new unnecessary constant or misusing an existing one.


🟠 Significant — RFC-016 is missing from the RFC index

File: docs/rfcs/index.md

RFC-016 is not listed in the index table. All other accepted/proposed/draft RFCs appear there. This PR should add the row:

| [RFC-016](./rfc-016) | Collection Operations | Proposed |

🟡 Minor — Inconsistency between PR description and RFC on subject vs collection

The PR description (summary section) uses subject: $kinderen_leeftijden (consistent with other operations that use subject), while the RFC itself uses collection: $kinderen_leeftijden. The PR description is stale/out of sync with the RFC text. The RFC's property naming rationale (lines 119–128) explicitly argues for collection over subject. The PR description should match the RFC.


🟡 Minor — AND empty-collection identity is true — needs a usage note

File: docs/rfcs/rfc-016.md, lines 243–244

AND over an empty collection returning true (standard logical identity) is correct mathematically, but in legal contexts it can produce surprising outcomes: a FOREACH that filters for a matching registration and combines with AND would return true when no registrations exist — meaning "all conditions are met." This is the vacuous truth trap. The RFC should note that callers using combine: AND must separately verify the collection is non-empty if the all-empty case represents "no applicable items" rather than "all applicable items satisfy the condition."


🟡 Minor — Schema version reference is inconsistent

File: docs/rfcs/rfc-016.md, line 155 (schema/v0.5.x/schema.json)

The implementation notes reference schema/v0.5.x/schema.json with a wildcard. The existing schema lives at a specific tagged version (schema-v0.5.1). Use the specific version or note explicitly that a new schema version will be required, and that this RFC targets schema v0.5.2 (or v0.6.0 if the change is considered breaking).

Companion to RFC-016. Schema v0.5.1 reduced source: blocks to just
regulation/output/parameters, dropping the inline metadata that
described how to query a tabular data source. Hosts had to maintain
external mapping files to bridge the gap.

RFC-017 restores inline native data source binding via additive
optional properties: source_type, service, table, field, fields,
select_on. Existing v0.5.1 source forms remain valid; the new
properties compose with cross-law references and FOREACH (RFC-016).

Implementation lands in PR #511 (feat/foreach-implementation):
- ExternalSource struct extended in article.rs
- RecordSetDataSource added in data_source.rs
- Native YAML resolution in service.rs
- Schema v0.5.1 additions
- PyO3 register_record_set_source binding

Eliminates ~600 lines of pre-resolution glue in poc-machine-law's
Python wrapper.
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of RFC-016 and RFC-017

RFC-016 — FOREACH

🟠 Nested scoping model is self-contradictory

The RFC states two things that cannot both be true:

"the body and filter expressions are evaluated in the inner scope, where only the innermost as binding is visible"

"If an inner FOREACH uses the same as name as an outer one, the inner binding shadows the outer within its body. To access both, use different as names."

The second sentence implies that outer loop variables are accessible (just shadowed if the name collides, and reachable by using a different name). The first sentence says outer loop variables are not accessible at all. These are mutually exclusive.

Additionally, the examples inside FOREACH bodies reference module-level inputs like $extra_16_17_jaar, $kind_vrijstelling, $bewoner.inkomen. If "only the innermost as binding is visible" were literally true, those references would all fail. The examples imply the intended design is standard lexical scoping (child context inherits parent + adds the loop variable), but the prose spec says the opposite.

This inconsistency will cause the implementor to choose one model or the other, and law authors will get surprised. The scoping section needs one coherent statement, e.g.:

"The child context inherits all variables from the parent scope (including module-level inputs and outer loop variables). The as binding is added to the child context; if a variable with the same name exists in an outer scope it is shadowed within this FOREACH's body and filter."


🟡 Untranslatable propagation in array mode unspecified

The error/untranslatable section only covers the combine case:

"if any element produces Value::Untranslatable, the combined result is Value::Untranslatable"

It does not specify what happens when combine is omitted and results are collected as an array. Should the array contain the untranslatable element, or should the entire FOREACH produce Untranslatable? Implementations will diverge. Suggest adding a line: "When combine is omitted, the same rule applies: a single Untranslatable element causes the collected array to resolve as Untranslatable." (or the alternative if partial arrays are acceptable here).


RFC-017 — Native Data Source Metadata

🟡 AND semantics for multi-criteria select_on is implicit

The "Resolution semantics" say "filter records where each name column equals the resolved value", and the multi-criteria example uses three select_on entries. The conjunction rule (all criteria must match, i.e. AND) is shown by example but never stated. Since OR semantics would give a very different result, this should be made explicit: "All select_on criteria are combined with AND — a record is included only if every criterion matches."

Comment thread docs/rfcs/rfc-016.md
operation: FOREACH
collection: $household.members # evaluated in OUTER scope → can see $household
as: member
body: $member.income # evaluated in INNER scope → sees $member, not $household
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested scoping section contradicts itself within two paragraphs. Line: "inner scope, where only the innermost as binding is visible" says outer loop variables are inaccessible. But the very next paragraph says "To access both, use different as names", which implies they are accessible. Additionally, the examples in this same RFC ($extra_16_17_jaar, $kind_vrijstelling, $bewoner.inkomen) reference module-level inputs inside FOREACH bodies — these are also "outer" bindings that would be invisible under the first interpretation. Pick one model and state it consistently. The intended model appears to be lexical scoping with shadowing.

Comment thread docs/rfcs/rfc-016.md

1. Evaluate `collection` in the current scope to get an array. If the result is not an array, wrap it in a single-element array. If null, treat as empty array.
2. For each element in the array:
a. Create a child execution context (isolated local scope).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untranslatable propagation is only specified for the combine case. For combine omitted (array output), behavior is unspecified: does a single Untranslatable element taint the whole array, or is it included as-is? Needs a sentence.

Comment thread docs/rfcs/rfc-017.md
# new properties (native data source binding)
source_type: <kind tag> # optional, e.g. "RvIG", "claim", "cases"
service: <service code> # optional, e.g. "BELASTINGDIENST"
table: <data source name> # the data source to query
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple select_on criteria are shown in examples but the conjunction rule is never stated. Make it explicit: "All select_on criteria are combined with AND — a record is included only if every criterion matches."

@anneschuth anneschuth changed the title docs: add RFC-016 Collection Operations (FOREACH) docs: RFC-016 FOREACH + RFC-017 Native Data Source Metadata Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant