[Discussion] How should APIs make use of default_value? #45

@rly

My current interpretation of default_value in the schema language is that, when reading data, if there is no value provided for a particular optional field, then the API or person reading the data should set the field to the default_value.

However, during the HDMF build process, if the user (data writer) does not provide a value for a particular field, HDMF sets the field value to the default_value. As a result, all fields with a default_value will always be written with a value. On read, if a field with a default_value has no value, I believe HDMF does not set the value to the default_value.
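To make the two behaviors concrete, here is a minimal sketch using plain dictionaries. It does not use or reproduce HDMF's actual code; `SPEC`, `build_write_time_default`, `build_without_default`, and `read_with_default` are hypothetical names invented for illustration, and the "file" is just a dict.

```python
# Hypothetical schema fragment: one optional field with a default_value.
# This is NOT the HDMF spec API, only a stand-in for illustration.
SPEC = {"unit": {"required": False, "default_value": "volts"}}


def build_write_time_default(user_values, spec=SPEC):
    """Strategy A: fill in default_value at build/write time,
    so the stored data always contains a value for the field."""
    stored = dict(user_values)
    for name, field in spec.items():
        if name not in stored and "default_value" in field:
            stored[name] = field["default_value"]
    return stored


def build_without_default(user_values, spec=SPEC):
    """Strategy B (write side): store only what the writer supplied."""
    return dict(user_values)


def read_with_default(stored, spec=SPEC):
    """Strategy B (read side): substitute default_value for absent fields."""
    result = dict(stored)
    for name, field in spec.items():
        if name not in result and "default_value" in field:
            result[name] = field["default_value"]
    return result


# Under strategy A, an omitted value and an explicit "volts" produce the same stored data.
implicit_a = build_write_time_default({})
explicit_a = build_write_time_default({"unit": "volts"})

# Under strategy B, the stored data differs, but the value seen on read is the same.
implicit_b = build_without_default({})
explicit_b = build_without_default({"unit": "volts"})
```

With strategy A, `implicit_a` and `explicit_a` are identical, so the information about what the writer actually provided is lost at write time. With strategy B, the stored data keeps that distinction even though `read_with_default` returns the same value in both cases.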

The HDMF class generator and the custom API classes for PyNWB also commonly set the default value of arguments in the __init__ docval to the default_value in the schema.

I believe MatNWB sets the field value to the default_value on initialization of a type.

As a result, a data reuser cannot tell whether a field was explicitly set to the default value or was not set at all. I think there is value in differentiating between the two. If the value was set automatically by the schema and API, then as a data reuser I do not 100% trust that it is correct: the data writer may not have fully understood the documentation. I think this can happen easily, and it has happened for data with fixed values, e.g., NWB electrical series data not being stored in volts.

LinkML's schema language supports "ifabsent" which I believe follows my interpretation of default_value. This is different from how Pydantic handles default values, but that

There are a handful of places where default_value is used in the NWB schema (none in the HDMF Common Schema): https://github.com/search?q=repo%3ANeurodataWithoutBorders%2Fnwb-schema%20default_value&type=code . NWB extensions may also use default_value. There are a couple of places where a TimeSeries.unit is required and has a default value, which is inconsistent with my interpretation above, since a required field always has a value on read and so never needs a default.

If we change the APIs to not write the default_value and to use the default_value on read, then the API user still does not know whether no value was explicitly set in the data unless they manually inspect the data. So changing the API does not fully help the situation. But at least there is a way to differentiate the explicit and implicit cases.
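As one possible illustration of that last point, the sketch below shows how a reader could surface the explicit/implicit distinction instead of requiring manual inspection of the data. This is an assumption about what an API might do, not something HDMF, PyNWB, or MatNWB provide; `FieldValue` and `read_field` are hypothetical names, and the stored data is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FieldValue:
    """A value plus a flag recording whether it came from the stored data."""
    value: Any
    explicit: bool


def read_field(stored: Mapping[str, Any], name: str, default_value: Any) -> FieldValue:
    """Return the stored value if present, otherwise the schema default.

    The `explicit` flag is False when the default was substituted on read.
    """
    if name in stored:
        return FieldValue(stored[name], explicit=True)
    return FieldValue(default_value, explicit=False)


# The writer omitted the field: the default is used and flagged as implicit.
omitted = read_field({}, "unit", "volts")

# The writer set the field to the same value: flagged as explicit.
provided = read_field({"unit": "volts"}, "unit", "volts")
```

Both results carry the value "volts", but only `provided.explicit` is True, so a data reuser could decide how much to trust the value without opening the file by hand. This only works if the default is not written at build time.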

If we eventually want to use LinkML as a schema language, then we will need to resolve this inconsistency between how the APIs currently handle default_value and how LinkML defines default values using ifabsent.


Labels

category: proposal (discussion of proposed enhancements or new features)
priority: low (alternative solution already working and/or relevant to only specific user(s))
