Problem
ObjectMapper.__spec2attr (src/hdmf/build/objectmapper.py:482, 553-563) is a dict[Spec, str] keyed by Python object identity, mapping spec objects to the corresponding container attribute name. It is populated at mapper init time from self.__spec.attributes — the def-site type's AttributeSpec objects.
When the matcher pairs a builder to a position-resolved subspec (the matched_spec slot from epic #1448), the inc-site refined AttributeSpec objects on matched_spec.attributes have different Python identities than the def-site ones in __spec2attr:
- An inc-site refinement (e.g., a parent's subspec adding
dtype=int8 to attribute attr2) is a separate AttributeSpec object that lives on the inc-site spec. After resolve_inc_spec (src/hdmf/spec/spec.py:551-563) merges the def-site fields into it in place, the resulting refined object is still the inc-site's object, not the def-site's.
- An inherited (non-refined) attribute on the inc-site is created via
set_attribute(inc_spec_attribute) at spec.py:562, which produces a fresh copy.
So mapper.get_attribute(refined_spec_obj) does __spec2attr.get(refined_spec_obj, None) → None. Both the write and read paths interpret that as a missing mapping and silently drop the attribute.
Write-path symptom
In ObjectMapper.build, the extra-attribute lookup (after epic #1448) currently does:
ext_attrs = getattr(builder.matched_spec, 'attributes', tuple())
attrs_by_name = {a.name: a for a in ext_attrs}
for a in self.__spec.attributes:
attrs_by_name[a.name] = a # def-site overwrites refinement to keep the mapper lookup valid
all_attrs = list(attrs_by_name.values())
self.__add_attributes(builder, all_attrs, container, manager)
The def-site overwrite is a workaround: it discards the refinement and uses the def-site spec object so get_attribute(spec) succeeds. The result is that inc-site refinements like dtype=int8 are silently validation-only at write time — the build writes whatever the def-site declared, not the refinement.
This contradicts the more useful design where refinements bind the write (so that, e.g., a parent declaring its column's dtype=int8 actually produces an int8 dataset rather than the def-site int).
Write-path repro
tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attr exercises an inc-site refining attr2 from int to int64. The test passes today only because the workaround above keeps the def-site spec; if the workaround is removed (use matched_spec directly), the build emits MissingRequiredBuildWarning: ... missing required value for attribute 'None' and the attribute is dropped from the builder.
Read-path symptom
The same identity-coupling blocks a symmetric fix on the read side. In ObjectMapper.__get_subspec_values (src/hdmf/build/objectmapper.py:1385), the attribute loop currently iterates the def-site spec's attributes:
for attr_spec in spec.attributes:
...
ret[attr_spec] = self.__parse_if_datetime(attr_val, attr_spec)
The natural symmetric fix to let inc-site attribute refinements (e.g., a parent declaring dtype=isodatetime on an inherited attribute) flow into read-time decoding would be:
attr_spec_source = builder.matched_spec or spec
for attr_spec in attr_spec_source.attributes:
...
That fix breaks for the same __spec2attr identity reason. attr_spec keys flow through ret into ObjectMapper.construct, which does:
for subspec, value in subspecs.items():
const_arg = self.get_const_arg(subspec)
get_const_arg (and get_attribute) look up by Python object identity against __spec2attr. An inc-site refined AttributeSpec is a different object — id(refined_attr) != id(def_site_attr) — so the lookup returns None and the parsed value is silently dropped from const_args. The container is then constructed without the value, with no error or warning.
So today, switching the read-path attribute loop to matched_spec.attributes would:
- Decode inc-site-refined attribute dtypes correctly. ✓
- Strand the decoded value because the constructor argument can't be resolved. ✗
The read path therefore also has to use the def-site spec.attributes, and inc-site attribute refinements remain validation-only on read just as they are on write.
Proposed fix
Two general directions, both of which unblock read and write simultaneously:
-
Look up by name. Add a name-keyed map (or a fallback in get_attribute / get_const_arg) so that when a refined spec object isn't in __spec2attr, lookup falls back to spec.name. Cheap, contained.
-
Register inc-site refined specs in the per-build mapper. When the matcher sets builder.matched_spec, also register any refined attribute specs into a per-builder map that consumers consult before falling back to __spec2attr. More invasive, but cleaner.
Option 1 is the smaller change and likely sufficient.
Related
References
src/hdmf/build/objectmapper.py:482, 553-563 — __spec2attr definition and population.
src/hdmf/build/objectmapper.py — get_attribute, get_const_arg, the build-path attribute lookup, and the read-path __get_subspec_values attribute loop that all work around this.
src/hdmf/spec/spec.py:540-563 — BaseStorageSpec.resolve_inc_spec.
tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attr — write-side repro.
Problem
ObjectMapper.__spec2attr(src/hdmf/build/objectmapper.py:482, 553-563) is adict[Spec, str]keyed by Python object identity, mapping spec objects to the corresponding container attribute name. It is populated at mapper init time fromself.__spec.attributes— the def-site type's AttributeSpec objects.When the matcher pairs a builder to a position-resolved subspec (the
matched_specslot from epic #1448), the inc-site refined AttributeSpec objects onmatched_spec.attributeshave different Python identities than the def-site ones in__spec2attr:dtype=int8to attributeattr2) is a separate AttributeSpec object that lives on the inc-site spec. Afterresolve_inc_spec(src/hdmf/spec/spec.py:551-563) merges the def-site fields into it in place, the resulting refined object is still the inc-site's object, not the def-site's.set_attribute(inc_spec_attribute)atspec.py:562, which produces a fresh copy.So
mapper.get_attribute(refined_spec_obj)does__spec2attr.get(refined_spec_obj, None)→None. Both the write and read paths interpret that as a missing mapping and silently drop the attribute.Write-path symptom
In
ObjectMapper.build, the extra-attribute lookup (after epic #1448) currently does:The def-site overwrite is a workaround: it discards the refinement and uses the def-site spec object so
get_attribute(spec)succeeds. The result is that inc-site refinements likedtype=int8are silently validation-only at write time — the build writes whatever the def-site declared, not the refinement.This contradicts the more useful design where refinements bind the write (so that, e.g., a parent declaring its column's
dtype=int8actually produces an int8 dataset rather than the def-siteint).Write-path repro
tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attrexercises an inc-site refiningattr2frominttoint64. The test passes today only because the workaround above keeps the def-site spec; if the workaround is removed (usematched_specdirectly), the build emitsMissingRequiredBuildWarning: ... missing required value for attribute 'None'and the attribute is dropped from the builder.Read-path symptom
The same identity-coupling blocks a symmetric fix on the read side. In
ObjectMapper.__get_subspec_values(src/hdmf/build/objectmapper.py:1385), the attribute loop currently iterates the def-site spec's attributes:The natural symmetric fix to let inc-site attribute refinements (e.g., a parent declaring
dtype=isodatetimeon an inherited attribute) flow into read-time decoding would be:That fix breaks for the same
__spec2attridentity reason.attr_speckeys flow throughretintoObjectMapper.construct, which does:get_const_arg(andget_attribute) look up by Python object identity against__spec2attr. An inc-site refined AttributeSpec is a different object —id(refined_attr) != id(def_site_attr)— so the lookup returnsNoneand the parsed value is silently dropped fromconst_args. The container is then constructed without the value, with no error or warning.So today, switching the read-path attribute loop to
matched_spec.attributeswould:The read path therefore also has to use the def-site
spec.attributes, and inc-site attribute refinements remain validation-only on read just as they are on write.Proposed fix
Two general directions, both of which unblock read and write simultaneously:
Look up by name. Add a name-keyed map (or a fallback in
get_attribute/get_const_arg) so that when a refined spec object isn't in__spec2attr, lookup falls back tospec.name. Cheap, contained.Register inc-site refined specs in the per-build mapper. When the matcher sets
builder.matched_spec, also register any refined attribute specs into a per-builder map that consumers consult before falling back to__spec2attr. More invasive, but cleaner.Option 1 is the smaller change and likely sufficient.
Related
__spec2attrsemantics.matched_spec.attributesis incomplete for nested-inside-untyped typed subspecs, so even with a fix here, the def-site fallback is still needed.ObjectMapper.build's attribute lookup can be replaced withall_attrs = builder.matched_spec.attributes if builder.matched_spec is not None else self.__spec.attributes, and refinements will bind the write.__get_subspec_valuescan be switched to(builder.matched_spec or spec).attributes, and refinements (e.g., inc-sitedtype: isodatetimeon an attribute) will bind the read.References
src/hdmf/build/objectmapper.py:482, 553-563—__spec2attrdefinition and population.src/hdmf/build/objectmapper.py—get_attribute,get_const_arg, the build-path attribute lookup, and the read-path__get_subspec_valuesattribute loop that all work around this.src/hdmf/spec/spec.py:540-563—BaseStorageSpec.resolve_inc_spec.tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attr— write-side repro.