Skip to content

Conversation

@rly
Copy link
Contributor

@rly rly commented May 8, 2025

Summary of changes

From #604: the new BaseImage type is a dataset where the data has no constraints on dtype or shape but is required. The new ExternalImage has an attribute for file_path but because it inherits from BaseImage, it is also a dataset and therefore has a required data value. @oruebel and I think that ExternalImage.data should be used to store the scalar string file path instead of the attribute.

We cannot yet specify that ExternalImage.data must be a scalar, so we leave the shape to be Any shape. shape: scalar will be supported in the new 2.0 schema language.

Checklist

For all schema changes:

  • Add release notes for the PR to docs/format/source/format_release_notes.rst.
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.
  • Make sure that hdmf-common-schema points to the latest release and not the latest commit on the main branch.

If this is the first schema change after a schema release (i.e., the version string in core/nwb.namespace.yaml does not
end in "-alpha"), then:

  • Update the version string in core/nwb.namespace.yaml and core/nwb.file.yaml to the next major/minor/patch
    version with the suffix "-alpha". For example, if the current version is 2.4.0 and this is a minor change, then the
    new version string should be "2.5.0-alpha".
  • Update the value of the version variable in docs/format/source/conf.py to the next version without the
    suffix "-alpha", e.g., "2.5.0".
  • Update the value of the release variable in docs/format/source/conf.py to the next version with the suffix
    "-alpha", e.g., "2.5.0-alpha".
  • Add a new section in the release notes docs/format/source/format_release_notes.rst for the new version
    with the date "Upcoming" in parentheses.

@rly rly requested review from h-mayorquin and stephprince May 8, 2025 18:55
@h-mayorquin
Copy link
Contributor

I see.

From #604: the new BaseImage type is a dataset where the data has no constraints on dtype or shape but is required.

Question:
Where in the schema is the constraine that a DataSet must have a required data attribute? I am looking at the NWBData here:

datasets:
- neurodata_type_def: NWBData
neurodata_type_inc: Data
doc: An abstract data type for a dataset.

And Data in the common schema:

https://github.com/hdmf-dev/hdmf-common-schema/blob/5b4cbb31dbafcff51ca70bf218f464b186568151/common/base.yaml#L2-L4

And I am not sure where, when and how is the constrained declared.

@rly
Copy link
Contributor Author

rly commented May 8, 2025

Because the type is a dataset spec and not a group spec, the type automatically has a required data element as well as dtype and shape as top-level keys.

@rly
Copy link
Contributor Author

rly commented May 8, 2025

A different way to say this is that a dataset spec defines a data array

@h-mayorquin
Copy link
Contributor

Thanks, would it be fair that is enforced by the schema language?

It is just that I don't see anywhere where this is written explicitly:

https://hdmf-schema-language.readthedocs.io/en/latest/description.html#sec-dataset-spec

Not questioning this of course, just wondering if we should better document this.

h-mayorquin
h-mayorquin previously approved these changes May 8, 2025
Copy link
Contributor

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving this forward. I still think that file_path is a better name for this but this should work.

I guess another option would to leave data as an empty string and still use the attribute somehow? but maybe that has problems as well and some redundancy.

@rly
Copy link
Contributor Author

rly commented May 8, 2025

would it be fair that is enforced by the schema language?

What do you mean by this?

I agree - we could be more clear in the schema docs. I created hdmf-dev/hdmf-schema-language#46 to track that improvement.

I guess another option would to leave data as an empty string and still use the attribute somehow? but maybe that has problems as well and some redundancy.

Yes that is an option but it feels a bit redundant / messy.

@rly rly requested a review from bendichter May 8, 2025 19:30
@h-mayorquin
Copy link
Contributor

Would it be fair to say that this is enforced by the schema language?

What do you mean by this?

I might be a bit confused, but I think what I'm trying to say is the following:

Here:

attributes:
- name: description
dtype: text
doc: Description of the image.
required: false

the schema enforces that there must be a name attribute.

And here:

datasets:
- name: data
dtype: numeric
dims:
- - num_times
- - num_times
- num_channels
- - num_times
- num_channels
- num_samples
shape:

the schema enforces that there must be a dataset named data, including its dtype and shape.

So I think my confusion comes from expecting something like this:
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/5105b0082be2b91aac77b0c635efc9e72b8b6a4d/core/nwb.base.yaml#L1C1-L5
to define a data attribute as well. But in this case, data isn’t actually an attribute—it’s just an (implicit?) property that all datasets have. That’s why I said maybe it's something that’s enforced or declared by the schema language itself.

Yes, that is an option but it feels a bit redundant/messy.

Right, I trust your instinct on this kind of thing.

@rly
Copy link
Contributor Author

rly commented May 8, 2025

I see. Yes, this is enforced or declared by the schema language itself.

In the same way that this yaml within a group spec declares a dataset named "data" with a specific dtype and shape,

datasets:
- name: data
dtype: numeric
dims:
- - num_times
- - num_times
- num_channels
- - num_times
- num_channels
- num_samples
shape:

This yaml declares a dataset without a name but with a neurodata type "BaseImage" with default dtype and shape:

datasets:
- neurodata_type_def: BaseImage
  neurodata_type_inc: NWBData
  doc: An abstract base type for image data. Parent type for Image and ExternalImage types.
  attributes:
  - name: description
    dtype: text
    doc: Description of the image.
    required: false

Dataset data types are not used often (mostly Image and VectorData subtypes). I admit that how they work sometimes confuses me too.

Co-authored-by: Steph Prince <[email protected]>
@rly rly merged commit 3299a2a into dev May 8, 2025
3 of 4 checks passed
@rly rly deleted the fix-external-image branch May 8, 2025 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants