This repository was archived by the owner on Nov 12, 2024. It is now read-only.

Validate numerical output produced by individual data sources #453

@geening


As far as data source processing goes, we currently test each component of a DataPipeline object.

We also have a dry run in https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/src/test/test_source_run.py that checks, for each individual data source, that there is at least one output whose location key matches a defined regex.

But we do not validate that the individual subclasses of DataSource (stored in src/pipelines//.py) actually produce the correct numerical output for particular inputs.

I propose unit testing the parse_dataframes method of each data source. To make this easier, perhaps we could have a framework that accepts the input and expected output dataframes as CSV files, so that test cases are simple to specify.
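
To make the proposal concrete, here is a rough sketch of what such a CSV-driven check could look like. Everything in it is illustrative: `ToyCumulativeSource`, `read_csv_text`, `assert_parse_output`, the `"data"` key and the column names are hypothetical, and the `parse_dataframes(dataframes, aux, **parse_opts)` signature is an assumption about the DataSource interface, not a copy of it. The CSV text is inlined here only to keep the sketch self-contained; in practice it would presumably live in fixture files next to each source.

```python
import io
from typing import Dict

import pandas as pd
from pandas.testing import assert_frame_equal


class ToyCumulativeSource:
    """Hypothetical stand-in for a DataSource subclass (not taken from the repo)."""

    def parse_dataframes(
        self, dataframes: Dict[str, pd.DataFrame], aux: Dict[str, pd.DataFrame], **parse_opts
    ) -> pd.DataFrame:
        # Assumed behavior for illustration: derive running totals per location key.
        data = dataframes["data"].sort_values(["key", "date"]).reset_index(drop=True)
        data["total_confirmed"] = data.groupby("key")["new_confirmed"].cumsum()
        return data


def read_csv_text(text: str) -> pd.DataFrame:
    """Parse CSV content held in a string; a real framework might take a file path."""
    return pd.read_csv(io.StringIO(text.strip()))


def assert_parse_output(source, input_csv: str, expected_csv: str) -> None:
    """Run parse_dataframes on the input CSV and compare against the expected CSV."""
    actual = source.parse_dataframes({"data": read_csv_text(input_csv)}, aux={})
    expected = read_csv_text(expected_csv)
    # Compare only the columns listed in the expected CSV, in that order,
    # and ignore dtype differences introduced by the CSV round trip.
    actual = actual[list(expected.columns)].reset_index(drop=True)
    assert_frame_equal(actual, expected, check_dtype=False)


INPUT_CSV = """
date,key,new_confirmed
2020-03-01,US,2
2020-03-02,US,3
2020-03-01,ES,1
2020-03-02,ES,4
"""

EXPECTED_CSV = """
date,key,new_confirmed,total_confirmed
2020-03-01,ES,1,1
2020-03-02,ES,4,5
2020-03-01,US,2,2
2020-03-02,US,3,5
"""

if __name__ == "__main__":
    assert_parse_output(ToyCumulativeSource(), INPUT_CSV, EXPECTED_CSV)
```

Comparing only the columns named in the expected CSV would let a test pin down the numerical columns it cares about without having to restate every column a source emits. Whether that is the right default is open for discussion.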


Labels: enhancement (New feature or request)
