Commit ad3cb6f

patricktnast, rmudambi, and stevebachmeier authored and committed
Allow Age Group reconciliation between disparate dataframes (#38)
* Generate stubs for Automated V&V Phase 1 (#22)
* first pass at stubbing
* lint
* refine
* Update src/vivarium_testing_utils/automated_validation/interface.py
  Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com>
* change pass to notimplementederror
---------
Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com>

* Load data from simulation (#23)
* first pass at stubbing
* lint
* refine
* rudimentary data loading from sim
* lint
* add tests and test data
* adjust to epic
* fix a few names
* rename back to dataloader
* modify tests
* lint
* use set instead
* remove build.yml
* use pathlib
* lint
* make dataloader private
* add typing
* add enum
* make methods private
* fix error message
* undo add_comparison, which we will fix later
* punt DataSource to interface
* adjust tests

* Load From Artifact (#24)
* load key from artifact
* minor fixes
* add stub tests
* add test artifact
* lint
* remove top level import
* private attrs

* Add custom data upload (#25)
* add custom upload
* add series type
* adjust error message
* fix missing DataSource
* move raise test to get dataset
* add test for show raw dataset
* lint
* fix test
* copy going into and out of the cache
* lint
* adjust args
* add docstrings to "real" interface methods
* don't copy on cache miss
* wrap in new public method
* add error messages

* Basic Transformations (#26)
* add initial set of basis transformations
* use typevar
* lint
* add missing typing
* remove validation for now based on discussion
* delete validation
* remove import

* change LCT to dicts (#28)

* Reorganize Calculations (#30)
* add data_transformation folder
* add missing folder

* stub out pandera types (#31)

* Refactor test files (#29)
* add code changes
* add binary file changes
* add artifact parsing
* extract DRAW_PREFIX
* lint

* Add Measures for Incidence, Prevalence, Remission (#27)
* add pandera
* add basic calc for index alignment and data filtering
* stub schemas
* add Measures
* start interface for measures
* rename some schema types
* refactor measures
* organize
* specify FuzzyComparison
* add comparison base class
* revert to stub
* add datasets for sim and artifact
* add formatting tests
* bugfixes
* add tests
* make test data formatting consistent
* add tests of measure functions
* cleanup
* transform artifact data
* add artifact fixture
* remove old test dfs
* add tests for comparison
* minor fixes
* lint
* typing
* add docstring
* typing/cleanup
* add docstring
* isort
* use normal dicts instead of LCT
* refactor formatters
* address comments
* beef up tests
* add todos
* make ratio inherit from ABC

* Add Schemas to Data loading and calculation (#32)
* add code changes
* add binary file changes
* add artifact parsing
* add start of schema
* do simdata schema
* add schemas and schema tests
* unstash commit
* change to floats
* add new checks
* new typing
* change to simple float coerce
* revert calculations
* add checks to measures
* missed one
* Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py
  Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
* change name
* add missing test
* simplify
* ensure index levels are correctly sorted
* address artifact data
* lint
* DRY code
---------
Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>

* Refactor pandera to use SchemaModel decorators (#36)
* simplest fix
* remove some unused
* wrap decorator so I don't have to do to_schema() all the time
* remove unused imports
* add series to dataframe method
* add specific check for DrawData
* lint
* lint again
* refactor helpers into utils
* remove dupe comment
* move future import
* add shunt for other comparison methods
* basic untested sketch
* add age group draft
* actually raise
* add tests
* remove repr
* remove some unnecessary stuff
* Revert "basic untested sketch"
  This reverts commit b3aae1f.
* Revert "add shunt for other comparison methods"
  This reverts commit 06d6792.
* rebin magic
* lint

* VTU Mypy (#34)
* type hint for calculations
* in flux
* in progress
* do approximately half of typing
* removoe more series types
* lint
* data_loader
* type measures.py
* fix strats
* type test_interface
* type test_formatting
* type test_data_schema
* refactor patch
* lint
* remove fuzzy checker ignores
* "delete unused types.py file"
* merge changes from refactor and fix
* address comments
* remve ref ro 'fuzzy'
* move error up
* remove trailing commas
* change to collection
* empty commit
* lint
* remove unused import

* Wrapper for Comparison methods (#37)
* add shunt for other comparison methods
* basic untested sketch
* big fixes
* bug fix / rename
* Revert "bug fix / rename"
  This reverts commit 9f048bd.
* Revert "big fixes"
  This reverts commit 4cc09bb.
* Revert "basic untested sketch"
  This reverts commit b3aae1f.
* add skipped tests
* add typing
* add default
* get initial age bins from artifact
* create age schemas in interface
* rename age group col
* refactor test
* do rebin with unstack and dot
* fix age group dunders
* try to load from artifact first
* refactor alignment
* lint
* refactor
* cleanup
* lint
* consolidate naming
* cleanup
* remove check given inconvenient dataframe
* clean up interface tests
* update comment
* stub tests
* remove configurable age groups
* refactor dataframe handling
* add to_dataframe
* replace with format_Dataframe
* special case pop.age_bins
* linting typing
* add more unit tests for age groups
* remove hard-coding of column names
* clean up age group tests
* add test add age groups
* remove align datasets
* typing linting
* add fixtures
* test calcs
* add age bins to artifact
* separate out vivarium deps into separate requirements
* refactor methods to be functions
* add logging statement
* add nonstandard artifact data
* add comments about standard cases
* lint
* change fn mocks
* improve docstrings
* add to test age docstrings
* add transformation log statement
* fix typo
* local import
* change mocking
---------
Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com>
Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
1 parent 1d05518 commit ad3cb6f

12 files changed: +1038 -16 lines changed

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -41,5 +41,6 @@ module = [
     "py._path.local",
     "scipy.*",
     # "sklearn.*",
+    "vivarium_inputs.*",
 ]
 ignore_missing_imports = true

setup.py

Lines changed: 9 additions & 2 deletions
@@ -47,13 +47,18 @@
     "vivarium_dependencies[pandas,numpy,pyyaml,scipy,click,tables,loguru,networkx]",
     "vivarium_build_utils>=2.0.1,<3.0.0",
     "pyarrow",
-    "vivarium",
     # Type stubs
     "types-setuptools",
 ]

 setup_requires = ["setuptools_scm"]

+validation_requirements = [
+    "vivarium",
+    "vivarium-inputs",
+    "pandera",
+]
+
 interactive_requirements = [
     "vivarium_dependencies[interactive]",
 ]
@@ -108,10 +113,12 @@
         "docs": doc_requirements,
         "test": test_requirements,
         "interactive": interactive_requirements,
+        "validation": validation_requirements,
         "dev": doc_requirements
         + test_requirements
         + interactive_requirements
-        + lint_requirements,
+        + lint_requirements
+        + validation_requirements,
     },
     zip_safe=False,
     use_scm_version={

src/vivarium_testing_utils/automated_validation/data_loader.py

Lines changed: 14 additions & 1 deletion
@@ -31,6 +31,9 @@ def from_str(cls, source: str) -> DataSource:
         raise ValueError(f"Source {source} not recognized. Must be one of {DataSource}")


+NONSTANDARD_ARTIFACT_KEYS = {"population.age_bins"}
+
+
 class DataLoader:
     def __init__(self, sim_output_dir: Path, cache_size_mb: int = 1000):
         self._sim_output_dir = sim_output_dir
@@ -74,6 +77,9 @@ def upload_custom_data(self, dataset_key: str, data: pd.DataFrame) -> None:

     def _load_from_source(self, dataset_key: str, source: DataSource) -> pd.DataFrame:
         """Load the data from the given source via the loader mapping."""
+        if source == DataSource.ARTIFACT and dataset_key in NONSTANDARD_ARTIFACT_KEYS:
+            # Load nonstandard artifact keys from the artifact
+            return self._load_nonstandard_artifact(dataset_key)
         return self._loader_mapping[source](dataset_key)

     def _add_to_cache(self, dataset_key: str, source: DataSource, data: pd.DataFrame) -> None:
@@ -115,9 +121,16 @@ def _load_artifact(results_dir: Path) -> Artifact:
         ]["artifact_path"]
         return Artifact(artifact_path)

+    def _load_nonstandard_artifact(self, dataset_key: str) -> pd.DataFrame:
+        """Load artifact data for nonstandard (e.g. not draw or single numeric) keys."""
+        data: pd.DataFrame = self._artifact.load(dataset_key)
+        self._artifact.clear_cache()
+        return data
+
     @check_io(out=SingleNumericColumn)
     def _load_from_artifact(self, dataset_key: str) -> pd.DataFrame:
-        data = self._artifact.load(dataset_key)
+        """Load data directly from artifact, assuming correctly formatted data."""
+        data: pd.DataFrame = self._artifact.load(dataset_key)
         self._artifact.clear_cache()
         return clean_artifact_data(dataset_key, data)
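
The data_loader.py change routes keys listed in NONSTANDARD_ARTIFACT_KEYS around the schema-checked `_load_from_artifact` path, presumably because `population.age_bins` does not fit the `SingleNumericColumn` output schema. The sketch below illustrates only that dispatch pattern. It is not the repository's code: `SketchLoader`, the `SIM` enum member, the dict-backed artifact, the `"validated"` wrapper, and the `cause.x.incidence_rate` key are all hypothetical stand-ins so the example runs on its own.

```python
from enum import Enum
from typing import Callable, Dict

# Same constant name as in the diff; everything else here is a stand-in.
NONSTANDARD_ARTIFACT_KEYS = {"population.age_bins"}


class DataSource(Enum):
    SIM = "sim"  # assumed member, not shown in the diff
    ARTIFACT = "artifact"


class SketchLoader:
    """Hypothetical loader backed by a plain dict instead of a vivarium Artifact."""

    def __init__(self, artifact: Dict[str, object]) -> None:
        self._artifact = artifact
        self._loader_mapping: Dict[DataSource, Callable[[str], object]] = {
            DataSource.ARTIFACT: self._load_from_artifact,
        }

    def _load_from_artifact(self, dataset_key: str) -> object:
        # Standard path: the wrapper stands in for schema checking and cleaning.
        return {"validated": self._artifact[dataset_key]}

    def _load_nonstandard_artifact(self, dataset_key: str) -> object:
        # Nonstandard path: hand back the raw data without the schema step.
        return self._artifact[dataset_key]

    def load_from_source(self, dataset_key: str, source: DataSource) -> object:
        # Check the special case first, then fall back to the loader mapping.
        if source == DataSource.ARTIFACT and dataset_key in NONSTANDARD_ARTIFACT_KEYS:
            return self._load_nonstandard_artifact(dataset_key)
        return self._loader_mapping[source](dataset_key)


loader = SketchLoader(
    {"population.age_bins": ["0-5", "5-10"], "cause.x.incidence_rate": 0.1}
)
raw = loader.load_from_source("population.age_bins", DataSource.ARTIFACT)
checked = loader.load_from_source("cause.x.incidence_rate", DataSource.ARTIFACT)
```

Placing the special case ahead of the mapping lookup leaves the standard loaders and their output schema untouched, at the cost of a module-level set that has to be updated whenever another nonstandard key appears.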
