
Where should validation be done, in the Validator or in the construction of ChemKED/DataPoint instances? #103

Description

@bryanwweber

I'm working on adding the time-histories feature now, and I've realized that we do validation all over the place. For instance, we check that the units are appropriate in the Validator class:

    def _validate_isvalid_unit(self, isvalid_unit, field, value):
        """Checks for appropriate units using Pint unit registry.

        Args:
            isvalid_unit (`bool`): flag from schema indicating units to be checked.
            field (`str`): property associated with units in question.
            value (`dict`): dictionary of values from file associated with this property.

        The rule's arguments are validated against this schema:
            {'isvalid_unit': {'type': 'bool'}, 'field': {'type': 'str'},
             'value': {'type': 'dict'}}
        """
        quantity = 1.0 * units(value['units'])
        try:
            quantity.to(property_units[field])
        except pint.DimensionalityError:
            self._error(field, 'incompatible units; should be consistent '
                        'with ' + property_units[field]
                        )

but we check that all the uncertainty fields are present in the DataPoint class:

raise ValueError('Either "uncertainty" or "upper-uncertainty" and '
                 '"lower-uncertainty" need to be specified.')

Should we try to do all of the validation in the Validator class and assume that the input is well-formed by the time it reaches, e.g., the DataPoint class? Or should we keep this (sort of duplicate) validation?
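
If we do decide to centralize, one option is to push the uncertainty requirement above into the schema itself using Cerberus's required/excludes/dependencies rules. A minimal sketch, assuming flat top-level fields (the real schema nests these inside each property's value entry), just to show the shape of it:

    from cerberus import Validator

    # Sketch only: express "either 'uncertainty', or both 'upper-uncertainty'
    # and 'lower-uncertainty'" declaratively, so DataPoint could assume
    # well-formed input. Field names follow ChemKED; everything else is assumed.
    uncertainty_schema = {
        'uncertainty': {
            'required': True,
            'excludes': ['upper-uncertainty', 'lower-uncertainty'],
        },
        'upper-uncertainty': {
            'required': True,
            'excludes': 'uncertainty',
            'dependencies': ['lower-uncertainty'],
        },
        'lower-uncertainty': {
            'required': True,
            'excludes': 'uncertainty',
            'dependencies': ['upper-uncertainty'],
        },
    }

    v = Validator(uncertainty_schema)
    v.validate({'uncertainty': 0.1})                                   # True
    v.validate({'upper-uncertainty': 0.1, 'lower-uncertainty': 0.05})  # True
    v.validate({'upper-uncertainty': 0.1})                             # False

Because each field is required but excluded by the others, one of the two forms has to be present, which is essentially the same constraint the ValueError in DataPoint enforces.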

Relatedly, we also do a bunch of checks for, e.g., allowed values, which are really just tests of Cerberus. Should we keep these, or rely on Cerberus to be correct?
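
To make that concrete (the field name and values here are just illustrative), a test like this mostly exercises Cerberus's own 'allowed' rule rather than any PyKED logic:

    from cerberus import Validator

    # Illustrative only: checking that a disallowed value is rejected is really
    # a test of Cerberus's 'allowed' rule, not of our schema or code.
    v = Validator({'experiment-type': {'allowed': ['ignition delay']}})
    assert v.validate({'experiment-type': 'ignition delay'})
    assert not v.validate({'experiment-type': 'laminar flame speed'})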

I'm wondering about this from a few perspectives. One is that as the test suite gets bigger, it's taking longer to run (I'm up over a minute and a half right now). Another is that having these checks scattered all over the place makes it harder to reason about what the code is doing. Finally, these kinds of checks "require" small tests to hit the other half of the branch (to keep 100% coverage) that don't really add much information: all they show is that we can write a test to exercise the other branch, not whether actual user behavior will ever trigger it properly.

I tagged this discussion because I think we should let this simmer for a while and think/talk about what to do, if anything 😄
