Automatic sanity check of data, flagging out-of-range or suspiciously large changes #452

Proposal: a system that, on each pipeline run, sanity-checks data for values outside a reasonable range or with a suspiciously large change from one run to the next. Ideally the checks would cover data at every stage of the pipeline (input sources, intermediate data, and generated output cells indexed by table/variable/key/date), but we could start wherever this is easiest to implement. The results would be reported in a pipeline status report and/or stored for future reference, either appended to a log or kept in a more structured format or database. For instance, when suspicious data is discovered manually, one could look up when it was introduced.
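
To make the two kinds of check concrete, here is a minimal sketch, not a proposed final design. It assumes each run can be represented as a mapping from a (table, variable, key, date) cell index to a numeric value. The names `Flag`, `check_cells`, `write_flags`, the `bounds` mapping, and the 50% change threshold are all hypothetical and only for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, TextIO, Tuple

# Hypothetical cell index: (table, variable, key, date).
CellKey = Tuple[str, str, str, str]


@dataclass
class Flag:
    cell: CellKey
    kind: str      # "out_of_range" or "large_change"
    value: float
    detail: str


def check_cells(
    current: Dict[CellKey, float],
    previous: Dict[CellKey, float],
    bounds: Dict[str, Tuple[float, float]],
    max_rel_change: float = 0.5,  # assumed threshold, would need tuning per variable
) -> List[Flag]:
    """Flag values outside per-variable bounds or far from the previous run."""
    flags = []
    for cell, value in current.items():
        variable = cell[1]
        if variable in bounds:
            lo, hi = bounds[variable]
            # Written as "not (lo <= value <= hi)" so that NaN is flagged too.
            if not (lo <= value <= hi):
                flags.append(Flag(cell, "out_of_range", value,
                                  f"expected [{lo}, {hi}]"))
        old = previous.get(cell)
        # Cells that are new, or were zero last run, have no relative change.
        if old is not None and old != 0:
            rel = abs(value - old) / abs(old)
            if rel > max_rel_change:
                flags.append(Flag(cell, "large_change", value,
                                  f"previous run {old}, relative change {rel:.2f}"))
    return flags


def write_flags(flags: List[Flag], stream: TextIO, run_id: str) -> None:
    """Append one JSON object per flag, so the log can be searched later."""
    for flag in flags:
        record = {"run_id": run_id, **asdict(flag)}
        stream.write(json.dumps(record) + "\n")
```

Writing one JSON line per flag, tagged with a run identifier, is one simple way to answer "when was this value introduced" later. A database table with the same fields would serve the same purpose.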

Some errors that have come up that would likely be caught by such a system:
- Regions with confirmed cases > population
- Regions with area > area of earth
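
The two errors above could be expressed as simple per-region rules. The sketch below assumes each region is a dict with hypothetical field names `confirmed`, `population`, and `area_km2`. These names are illustrative and are not taken from the pipeline's actual schema.

```python
from typing import Dict, List, Optional

# Approximate total surface area of Earth in square kilometres.
EARTH_SURFACE_AREA_KM2 = 510_072_000


def check_region(region: Dict[str, Optional[float]]) -> List[str]:
    """Return a list of human-readable problems found for one region."""
    problems = []
    confirmed = region.get("confirmed")
    population = region.get("population")
    area = region.get("area_km2")

    # Confirmed cases cannot exceed the number of people in the region.
    if confirmed is not None and population is not None and confirmed > population:
        problems.append(f"confirmed ({confirmed}) > population ({population})")

    # No region can be larger than the planet.
    if area is not None and area > EARTH_SURFACE_AREA_KM2:
        problems.append(f"area_km2 ({area}) > area of Earth")

    return problems
```

Missing fields are skipped rather than flagged here. Whether a missing value should itself count as an error is a separate decision.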
