
Where should validation be done, in the Validator or in the construction of ChemKED/DataPoint instances? #103

Description

@bryanwweber

I'm working on adding the time-histories feature now, and I've realized that we do validation all over the place. For instance, we check that the units are appropriate in the Validator class:

    def _validate_isvalid_unit(self, isvalid_unit, field, value):
        """Checks for appropriate units using Pint unit registry.

        Args:
            isvalid_unit (`bool`): flag from schema indicating units to be checked.
            field (`str`): property associated with units in question.
            value (`dict`): dictionary of values from file associated with this property.

        The rule's arguments are validated against this schema:
            {'isvalid_unit': {'type': 'bool'}, 'field': {'type': 'str'},
             'value': {'type': 'dict'}}
        """
        quantity = 1.0 * units(value['units'])
        try:
            quantity.to(property_units[field])
        except pint.DimensionalityError:
            self._error(field, 'incompatible units; should be consistent '
                        'with ' + property_units[field]
                        )

but we check that all the uncertainty fields are present in the DataPoint class:

raise ValueError('Either "uncertainty" or "upper-uncertainty" and '
                 '"lower-uncertainty" need to be specified.')

Should we try to do all of the validation in the Validator class and assume that the input is well-formed by the time it reaches, e.g., the DataPoint class? Or should we keep this (sort of duplicate) validation?
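
If we do decide to centralize, one option is to push the uncertainty requirement above into the schema itself using Cerberus's required/excludes/dependencies rules. A minimal sketch, assuming flat top-level fields (the real schema nests these inside each property's value entry), just to show the shape of it:

    from cerberus import Validator

    # Sketch only: express "either 'uncertainty', or both 'upper-uncertainty'
    # and 'lower-uncertainty'" declaratively, so DataPoint could assume
    # well-formed input. Field names follow ChemKED; everything else is assumed.
    uncertainty_schema = {
        'uncertainty': {
            'required': True,
            'excludes': ['upper-uncertainty', 'lower-uncertainty'],
        },
        'upper-uncertainty': {
            'required': True,
            'excludes': 'uncertainty',
            'dependencies': ['lower-uncertainty'],
        },
        'lower-uncertainty': {
            'required': True,
            'excludes': 'uncertainty',
            'dependencies': ['upper-uncertainty'],
        },
    }

    v = Validator(uncertainty_schema)
    v.validate({'uncertainty': 0.1})                                   # True
    v.validate({'upper-uncertainty': 0.1, 'lower-uncertainty': 0.05})  # True
    v.validate({'upper-uncertainty': 0.1})                             # False

Because each field is required but excluded by the others, one of the two forms has to be present, which is essentially the same constraint the ValueError in DataPoint enforces.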

Relatedly, we also do a bunch of checks for, e.g., allowed values, which are really just tests of Cerberus. Should we keep these, or rely on Cerberus to be correct?
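
To make that concrete (the field name and values here are just illustrative), a test like this mostly exercises Cerberus's own 'allowed' rule rather than any PyKED logic:

    from cerberus import Validator

    # Illustrative only: checking that a disallowed value is rejected is really
    # a test of Cerberus's 'allowed' rule, not of our schema or code.
    v = Validator({'experiment-type': {'allowed': ['ignition delay']}})
    assert v.validate({'experiment-type': 'ignition delay'})
    assert not v.validate({'experiment-type': 'laminar flame speed'})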

I'm wondering about this from a few perspectives. One is that as the test suite gets bigger, it's taking longer to run (I'm up over a minute and a half right now). Another is that having these checks scattered all over the place makes it harder to reason about what the code is doing. Finally, these kinds of checks "require" small tests to hit the other half of the branch (to keep 100% coverage) that don't really add much information: all they show is that we can write a test to exercise the other branch, not whether actual user behavior will ever trigger it properly.

I tagged this discussion because I think we should let this simmer for a while and think/talk about what to do, if anything 😄
