You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
HDMF has two independent implementations of the same problem: pair builders against the candidate subspecs in a parent's spec.
Build/read matcher: ObjectMapper.__get_subspec_values and __get_sub_builders in src/hdmf/build/objectmapper.py. Used during mapper.build (write) and mapper.construct (read) to pair sub-builders to subspecs and recurse.
Validator matcher: SpecMatcher class in src/hdmf/validate/validator.py:741. Used during validation to pair builders against expected specs.
The two diverged organically. They handle the same fundamental problem with different algorithms and different bug profiles.
The validator already notes its algorithm has issues: # TODO future improvements to this matching algorithm should resolve these discrepancies (validator.py:794-797). The build/read matcher has the same kind of issues but no such note.
Proposed shared module
Lift the pairing logic into a pure helper, e.g. hdmf.build.matcher or similar, with a signature roughly like:
Pure: no callbacks into manager.construct, no side effects on builders. Returns a mapping; consumers act on it.
Quantity-aware: when multiple candidates fit, prefer unsatisfied required slots first (the validator's algorithm).
Tracks unmatched: returns a list of sub-builders that didn't match any subspec, so consumers can warn or error.
Hierarchy-aware: walks the namespace inheritance hierarchy when matching by data_type (the build/read matcher does this; the validator doesn't).
Consumers:
ObjectMapper.__get_subspec_values calls the helper, then iterates matches to call manager.construct(builder) (read) or already-pre-created builders (write). After Phase 4 of Refactor: matched specs on builders (replace spec_ext plumbing) #1448, the matcher also sets builder.matched_spec = subspec — that side-effect can stay in the consumer or move into the shared helper as an option.
SpecMatcher in the validator becomes a thin wrapper around the helper, plus the existing per-builder validation.
Improvements that would propagate
After consolidation:
Unmatched-builder warnings on read/write. Currently if a parent has an unexpected child (typo in a column name, stale data from an older spec version), the build/read matcher silently drops it. The validator already flags these. Sharing the matcher gives the same warning to read/write paths, at minimum as a debug log.
Quantity-aware tie-breaking everywhere. When a parent has multiple candidate subspecs of the same type with different quantities (e.g., one required and one optional), and there's exactly one builder of that type, today's build/read matcher arbitrarily assigns to the first matching subspec. The validator's algorithm prefers unsatisfied required slots. Real correctness improvement, propagates to read/write.
Hierarchy-aware matching in the validator. The validator's matcher doesn't walk the namespace inheritance hierarchy, so a builder of type MySpecificThing may not match a subspec calling for the parent type MyGenericThing. The build/read matcher's __get_sub_builders does walk namespace_catalog.get_hierarchy(...). Sharing fixes the validator's gap.
What's not in the shared matcher
Per-builder content validation (dtype, shape, required attributes, refs). Those are post-match concerns; the matcher's job is just pairing.
Construction (manager.construct). Stays in ObjectMapper.
Phase A: introduce the shared helper; have both ObjectMapper.__get_sub_builders and SpecMatcher delegate to it. Behavior should be identical to whichever consumer-side algorithm we standardize on (probably the validator's, with the build/read matcher's hierarchy walk added).
Phase B: remove duplicate code in SpecMatcher and __get_sub_builders.
Phase C: turn on the new behavior (quantity-aware tie-breaking, unmatched-builder logging) in build/read paths. May require updating tests that depend on the old arbitrary-tie behavior.
Why now (context)
This came up while reviewing the matched-specs refactor in #1448. The build/read matcher is the natural place to attach builder.matched_spec, and looking at its responsibilities made the duplication with the validator's matcher visible. Worth doing during validator integration, not before — but worth tracking now so we don't quietly let the two diverge further.
Problem
HDMF has two independent implementations of the same problem: pair builders against the candidate subspecs in a parent's spec.
ObjectMapper.__get_subspec_valuesand__get_sub_buildersinsrc/hdmf/build/objectmapper.py. Used duringmapper.build(write) andmapper.construct(read) to pair sub-builders to subspecs and recurse.SpecMatcherclass insrc/hdmf/validate/validator.py:741. Used during validation to pair builders against expected specs.The two diverged organically. They handle the same fundamental problem with different algorithms and different bug profiles.
How they differ today
manager.construct(...)calls; setsbuilder.matched_specThe validator already notes its algorithm has issues:
# TODO future improvements to this matching algorithm should resolve these discrepancies(validator.py:794-797). The build/read matcher has the same kind of issues but no such note.Proposed shared module
Lift the pairing logic into a pure helper, e.g.
hdmf.build.matcheror similar, with a signature roughly like:Properties of the shared helper:
manager.construct, no side effects on builders. Returns a mapping; consumers act on it.Consumers:
ObjectMapper.__get_subspec_valuescalls the helper, then iterates matches to callmanager.construct(builder)(read) or already-pre-created builders (write). After Phase 4 of Refactor: matched specs on builders (replace spec_ext plumbing) #1448, the matcher also setsbuilder.matched_spec = subspec— that side-effect can stay in the consumer or move into the shared helper as an option.SpecMatcherin the validator becomes a thin wrapper around the helper, plus the existing per-builder validation.Improvements that would propagate
After consolidation:
MySpecificThingmay not match a subspec calling for the parent typeMyGenericThing. The build/read matcher's__get_sub_buildersdoes walknamespace_catalog.get_hierarchy(...). Sharing fixes the validator's gap.What's not in the shared matcher
manager.construct). Stays inObjectMapper.builder.matched_specside-effect from Refactor: matched specs on builders (replace spec_ext plumbing) #1448's Phase 2/4 work could live in the matcher (as an opt-in argument) or stay inObjectMapper. Probably cleanest in the consumer.Migration
ObjectMapper.__get_sub_buildersandSpecMatcherdelegate to it. Behavior should be identical to whichever consumer-side algorithm we standardize on (probably the validator's, with the build/read matcher's hierarchy walk added).SpecMatcherand__get_sub_builders.Why now (context)
This came up while reviewing the matched-specs refactor in #1448. The build/read matcher is the natural place to attach
builder.matched_spec, and looking at its responsibilities made the duplication with the validator's matcher visible. Worth doing during validator integration, not before — but worth tracking now so we don't quietly let the two diverge further.References
src/hdmf/build/objectmapper.py—__get_subspec_values(~line 1380),__get_sub_builders(~line 1448)src/hdmf/validate/validator.py—SpecMatcher(line 741),SpecMatches(line 730)