Skip to content

ObjectMapper.__spec2attr lookup fails for inc-site refined AttributeSpec objects (refinements don't bind read or write) #1462

@rly

Description

@rly

Problem

ObjectMapper.__spec2attr (src/hdmf/build/objectmapper.py:482, 553-563) is a dict[Spec, str] keyed by Python object identity, mapping spec objects to the corresponding container attribute name. It is populated at mapper init time from self.__spec.attributes — the def-site type's AttributeSpec objects.

When the matcher pairs a builder to a position-resolved subspec (the matched_spec slot from epic #1448), the inc-site refined AttributeSpec objects on matched_spec.attributes have different Python identities than the def-site ones in __spec2attr:

  • An inc-site refinement (e.g., a parent's subspec adding dtype=int8 to attribute attr2) is a separate AttributeSpec object that lives on the inc-site spec. After resolve_inc_spec (src/hdmf/spec/spec.py:551-563) merges the def-site fields into it in place, the resulting refined object is still the inc-site's object, not the def-site's.
  • An inherited (non-refined) attribute on the inc-site is created via set_attribute(inc_spec_attribute) at spec.py:562, which produces a fresh copy.

So mapper.get_attribute(refined_spec_obj) does __spec2attr.get(refined_spec_obj, None)None. Both the write and read paths interpret that as a missing mapping and silently drop the attribute.

Write-path symptom

In ObjectMapper.build, the extra-attribute lookup (after epic #1448) currently does:

ext_attrs = getattr(builder.matched_spec, 'attributes', tuple())
attrs_by_name = {a.name: a for a in ext_attrs}
for a in self.__spec.attributes:
    attrs_by_name[a.name] = a  # def-site overwrites refinement to keep the mapper lookup valid
all_attrs = list(attrs_by_name.values())
self.__add_attributes(builder, all_attrs, container, manager)

The def-site overwrite is a workaround: it discards the refinement and uses the def-site spec object so get_attribute(spec) succeeds. The result is that inc-site refinements like dtype=int8 are silently validation-only at write time — the build writes whatever the def-site declared, not the refinement.

This contradicts the more useful design where refinements bind the write (so that, e.g., a parent declaring its column's dtype=int8 actually produces an int8 dataset rather than the def-site int).

Write-path repro

tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attr exercises an inc-site refining attr2 from int to int64. The test passes today only because the workaround above keeps the def-site spec; if the workaround is removed (use matched_spec directly), the build emits MissingRequiredBuildWarning: ... missing required value for attribute 'None' and the attribute is dropped from the builder.

Read-path symptom

The same identity-coupling blocks a symmetric fix on the read side. In ObjectMapper.__get_subspec_values (src/hdmf/build/objectmapper.py:1385), the attribute loop currently iterates the def-site spec's attributes:

for attr_spec in spec.attributes:
    ...
    ret[attr_spec] = self.__parse_if_datetime(attr_val, attr_spec)

The natural symmetric fix to let inc-site attribute refinements (e.g., a parent declaring dtype=isodatetime on an inherited attribute) flow into read-time decoding would be:

attr_spec_source = builder.matched_spec or spec
for attr_spec in attr_spec_source.attributes:
    ...

That fix breaks for the same __spec2attr identity reason. attr_spec keys flow through ret into ObjectMapper.construct, which does:

for subspec, value in subspecs.items():
    const_arg = self.get_const_arg(subspec)

get_const_arg (and get_attribute) look up by Python object identity against __spec2attr. An inc-site refined AttributeSpec is a different object — id(refined_attr) != id(def_site_attr) — so the lookup returns None and the parsed value is silently dropped from const_args. The container is then constructed without the value, with no error or warning.

So today, switching the read-path attribute loop to matched_spec.attributes would:

  • Decode inc-site-refined attribute dtypes correctly. ✓
  • Strand the decoded value because the constructor argument can't be resolved. ✗

The read path therefore also has to use the def-site spec.attributes, and inc-site attribute refinements remain validation-only on read just as they are on write.

Proposed fix

Two general directions, both of which unblock read and write simultaneously:

  1. Look up by name. Add a name-keyed map (or a fallback in get_attribute / get_const_arg) so that when a refined spec object isn't in __spec2attr, lookup falls back to spec.name. Cheap, contained.

  2. Register inc-site refined specs in the per-build mapper. When the matcher sets builder.matched_spec, also register any refined attribute specs into a per-builder map that consumers consult before falling back to __spec2attr. More invasive, but cleaner.

Option 1 is the smaller change and likely sufficient.

Related

References

  • src/hdmf/build/objectmapper.py:482, 553-563__spec2attr definition and population.
  • src/hdmf/build/objectmapper.pyget_attribute, get_const_arg, the build-path attribute lookup, and the read-path __get_subspec_values attribute loop that all work around this.
  • src/hdmf/spec/spec.py:540-563BaseStorageSpec.resolve_inc_spec.
  • tests/unit/build_tests/mapper_tests/test_build.py::TestBuildGroupRefinedAttr::test_build_refined_attr — write-side repro.

Metadata

Metadata

Assignees

No one assigned

    Labels

    category: bugerrors in the code or code behaviorpriority: lowalternative solution already working and/or relevant to only specific user(s)topic: maintenanceIssues related to tech debt / code maintainability

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions