Load data from simulation #23
…/feature/mic-5908-dataloader
| @pytest.fixture | ||
| def sim_result_dir(): | ||
| return "tests/automated_validation/data/sim_outputs" |
Would using something like `Path(__file__).resolve().parent` be safer here?
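A minimal sketch of that suggestion, assuming the test data sits in a `data/sim_outputs` folder next to the test module. The helper name `sim_result_dir_from` is hypothetical and not part of this PR. Anchoring on the module's own location keeps the path valid regardless of the directory pytest is launched from.

```python
from pathlib import Path


def sim_result_dir_from(test_file: str) -> Path:
    """Return the sim output directory relative to the given test module."""
    # Assumption: outputs live in data/sim_outputs beside the test module.
    return Path(test_file).resolve().parent / "data" / "sim_outputs"
```

Inside the fixture this would be called as `sim_result_dir_from(__file__)`.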
rmudambi
left a comment
Looks really good. Just had a few questions on the margins.
| self.sim_output_dir = self.results_dir / "results" | ||
| self.cache_size_mb = cache_size_mb | ||
| self.raw_datasets = LayeredConfigTree() | ||
| self.raw_datasets = LayeredConfigTree( |
Why do we want this to be a LayeredConfigTree rather than just a dict? I see we're chaining .get() calls in a test. Was that the reason?
I think I got a suggestion to use LCT for the 'dot' key access. There's no fundamental reason not to use a dict.
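For comparison, a rough sketch of the plain-dict alternative. The top-level keys mirror the ones in this diff; the `"deaths"` dataset key and the placeholder value are made up for illustration. Chained `.get()` calls still work as long as each level supplies a default.

```python
# Nested plain dicts keyed by source, then by dataset key.
raw_datasets: dict[str, dict[str, object]] = {
    "sim": {},
    "gbd": {},
    "artifact": {},
    "custom": {},
}

# Store a placeholder under a hypothetical dataset key.
raw_datasets["sim"]["deaths"] = "placeholder for a DataFrame"

# The empty-dict default keeps the chain safe even for an unknown source.
hit = raw_datasets.get("sim", {}).get("deaths")
miss = raw_datasets.get("gbd", {}).get("deaths")
```

The trade-off is losing attribute-style access (`raw_datasets.sim`) in exchange for one fewer dependency in this class.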
| self.cache_size_mb = cache_size_mb | ||
| self.raw_datasets = LayeredConfigTree() | ||
| self.raw_datasets = LayeredConfigTree( | ||
| {"sim": {}, "gbd": {}, "artifact": {}, "custom": {}} |
What do you imagine custom representing? I'm not sure how we could handle custom data unless we had bespoke transformation functions for them.
I was thinking users could just essentially upload a dataframe satisfying some minimal formatting requirements (i.e., has a 'value' column).
Yes, they would probably then have to create their own Metric Calculation function or something similar to tell the VC what to do with it. I don't think I have tickets for that. Maybe the best thing to do is ticket those separately and call them lower priority?
I think that's a good way to handle it
| def sim_outputs(self) -> list[str]: | ||
| raise NotImplementedError | ||
| def get_dataset(self, dataset_key: str, source: str) -> pd.DataFrame: |
Should source be a string or an enum-like object? We only accept a few precise values, and I don't think the plan is for the DataLoader to be exposed to the end-user, so we don't have to worry about it being annoying for them to have to import and use these objects.
The complication here is that the user does actually specify the data sources via the interface (when they decide what test data and reference data to compare against), and they should be able to pass a string there.
The strategy I took is to still define the data sources through enums and just check against the string in get_dataset(). I'm not sure if there's an easier way to do the string check; I'm not super familiar with using enums.
But the user will only call context.add_comparison() which can have a string argument to define the types of data being compared. This method will be hidden from them, so it seems it should be able to take the enum as input?
Right, but then I think you have to import DataSource into Context and pass the string through DataSource there before passing it into this method. It seemed to me like encapsulating all that into DataLoader / dataloader.py was preferable to keeping the signature consistent, but maybe we'll just need to use this enum in multiple places anyway and it won't matter so much.
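One way the two positions could be reconciled, sketched under assumptions: the `DataSource` members are inferred from the `raw_datasets` keys in this diff, `from_str` is a hypothetical helper, and `add_comparison` here is only a stand-in for the user-facing method. The string is converted once at the boundary, so internal methods can take the enum.

```python
from enum import Enum


class DataSource(Enum):
    # Members are assumed from the keys of raw_datasets shown in this PR.
    SIM = "sim"
    GBD = "gbd"
    ARTIFACT = "artifact"
    CUSTOM = "custom"

    @classmethod
    def from_str(cls, source: str) -> "DataSource":
        """Convert a user-supplied string, with a readable error on bad input."""
        try:
            # Enum lookup by value raises ValueError for unknown strings.
            return cls(source)
        except ValueError:
            valid = ", ".join(member.value for member in cls)
            raise ValueError(
                f"Source '{source}' not recognized. Must be one of: {valid}"
            ) from None


def add_comparison(test_source: str, ref_source: str) -> tuple[DataSource, DataSource]:
    """Hypothetical stand-in for the user-facing method: strings in, enums out."""
    return DataSource.from_str(test_source), DataSource.from_str(ref_source)
```

With this shape, a bad source string fails in the interface layer, and `DataLoader` never sees raw strings.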
| def artifact_keys(self) -> list[str]: | ||
| raise NotImplementedError | ||
| def load_from_source(self, dataset_key: str, source: str) -> None: |
As is, this will throw a KeyError if an unexpected source is provided.
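A small sketch of how the lookup could fail more explicitly. The class below is a trimmed-down stand-in, not the PR's `DataLoader`: the loader functions are placeholder lambdas returning strings, where the real ones would return DataFrames.

```python
from typing import Callable


class DataLoader:
    """Trimmed-down stand-in; the real loaders would return DataFrames."""

    def __init__(self) -> None:
        # Hypothetical loader functions keyed by source name.
        self.loader_mapping: dict[str, Callable[[str], str]] = {
            "sim": lambda key: f"sim:{key}",
            "artifact": lambda key: f"artifact:{key}",
        }

    def load_from_source(self, dataset_key: str, source: str) -> str:
        """Load data via the loader mapping, rejecting unknown sources explicitly."""
        if source not in self.loader_mapping:
            valid = sorted(self.loader_mapping)
            raise ValueError(f"Unknown source '{source}'. Expected one of {valid}.")
        return self.loader_mapping[source](dataset_key)
```

Raising `ValueError` with the list of valid sources tells the caller what went wrong, where a bare `KeyError` only echoes the bad key.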
| """Load the data from the given source via the loader mapping.""" | ||
| return self.loader_mapping[source](dataset_key) | ||
| def add_to_datasets(self, dataset_key: str, source: str, data: pd.DataFrame) -> None: |
Should this be a private method?
| """Load the data from the simulation output directory and set the non-value columns as indices.""" | ||
| sim_data = pd.read_parquet(self.sim_output_dir / f"{dataset_key}.parquet") | ||
| if "value" not in sim_data.columns: | ||
| raise ValueError(f"Value column not found in {dataset_key}.parquet") |
Nit: this error message makes it look like a generic value column is missing, when in fact it is the column "value".
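For illustration, one possible wording that names the column literally. The helper `check_value_column` is hypothetical; in the loader it would presumably be called with `sim_data.columns`.

```python
from typing import Iterable


def check_value_column(columns: Iterable[str], dataset_key: str) -> None:
    """Raise if the required column named 'value' is absent."""
    if "value" not in set(columns):
        # Quoting the name makes clear that a specific column is expected.
        raise ValueError(
            f"Required column 'value' not found in {dataset_key}.parquet"
        )
```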
| from vivarium_testing_utils.automated_validation.data_loader import DataLoader | ||
| def test_get_sim_outputs(sim_result_dir): |
Since we're writing new code, can you include type hints?
| "transition_count_cause", | ||
| } | ||
If we stick with source being defined as a string, we need a test where a bad source string is provided.
| def test_get_dataset(sim_result_dir): | ||
| def test_get_dataset_bad_source(sim_result_dir: Path) -> None: |
If you take my suggestion above, this error would be thrown by context.add_comparison()
I did this and added a stub test for later, since add_comparison() isn't implemented yet.
* first pass at stubbing
* lint
* refine
* rudimentary data loading from sim
* lint
* add tests and test data
* adjust to epic
* fix a few names
* rename back to dataloader
* modify tests
* lint
* use set instead
* remove build.yml
* use pathlib
* lint
* make dataloader private
* add typing
* add enum
* make methods private
* fix error message
* undo add_comparison, which we will fix later
* punt DataSource to interface
* adjust tests
backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * remove unneeded copy * remove unneeded copy * fix type error * Fix typing (#54) * add yaml types * pin vivarium version * Risk Exposure (#49) * first draft * remove unused import * fix how we total up in-category * add test * add entity type to vague-ified names * add formatter test * fix tests * replace_align_datasets method * actually remove the unused fn * add typing * fix format_dataset * lint * fix fixtures * add yaml types * import from collections.abc * fix tests in line with new format * pin vivarium version * Correctly handle Scenario indexes and draw/seed indexes (#53) * refactor to tuple * split a bit more * make comparisons do split as well * make ratio not need single dataframe * remove some checks * remove unused strict criteria * add check * adjust docstring * remove ratiodata * fix bug * fix data error and add backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * remove unneeded copy * fix one-off warnings * move age group resolution before measure use * adjust how ratio takes out num/denom * remove index alignment * adjust for input draw * adjust plot until for missing data * add scenario cols to comparison (hack) * concat better * remove append source * simplify set logic * add missing index levels * put scenario columns in with a default * put scenario cols in interface * make scenarios more flexible * marginalize within the comparison * put scenarios in comparison plot * add test of new plotutils helper * fix tests for comparison * cleanup * move filter scenarios to method * simplify init * reverse order * add filter_data tests * remove append source * add difference by set * fix typing * add fill with placeholder * add init unit test * add test for raise * 
add tests for fill with placeholder * fix tests * add calculation tests * add plot utils tests * adjust test comparison * add reorder levels * remove some too-clever helper functions * add yaml types * pin vivarium version * Population Structure (#52) * add input draw to art if needed * partse the measure key better * add broken pop structre * filter baseline * working with several hacks * document hacks * refactor to tuple * split a bit more * make comparisons do split as well * make ratio not need single dataframe * remove some checks * remove unused strict criteria * add check * adjust docstring * remove ratiodata * fix bug * fix data error and add backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * utilize non-indentical index * remove unneeded copy * fix plot utils test * move input draw resolution to plot utils * fix warnings * fix broken tests * fix one-off warnings * move age group resolution before measure use * adjust how ratio takes out num/denom * remove index alignment * adjust for input draw * adjust plot until for missing data * add scenario cols to comparison (hack) * concat better * remove append source * simplify set logic * add missing index levels * put scenario columns in with a default * put scenario cols in interface * make scenarios more flexible * marginalize within the comparison * put scenarios in comparison plot * add test of new plotutils helper * fix tests for comparison * cleanup * move filter scenarios to method * simplify init * reverse order * add filter_data tests * remove append source * add difference by set * fix typing * add fill with placeholder * add init unit test * add test for raise * add tests for fill with placeholder * fix tests * add calculation tests * add plot utils tests * adjust test comparison * add reorder levels * remove some too-clever helper functions * add yaml types * pin vivarium 
version * move draw to constants * pull out input draws and random seed to constants * remove unused import * remove unused copy * remove unused * add ratiomeasure check * type hinting * tests in draft form * add tests * add unit tests for get_measure_from_key * type hint * condense expected dfs * add spaces * fix docstrings * rename * add valid format * formatting * Add unique stratification levels to test fixtures (#55) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * Categorical Relative Risks (#57) * gitignore copilot instructions * add categorical RR measure * add interface function * realtive risk scaffolding * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * add new fixtures and categorical rr test * lint * change measure_key to measure_name and add artifact name * add risk state mapping * modify name * fix title * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * revert change to measure key * minor fixes to allow test pass * generalize RatioMeasure to fix typing * remove default ratiomeasure imp * lint * move args * renames * lint * fix typing * weird merge issue * add default for categories * refactor measures for (hopefully) simplification * Update src/vivarium_testing_utils/automated_validation/data_transformation/measures.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Update src/vivarium_testing_utils/automated_validation/data_transformation/measures.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * bugfix --------- Co-authored-by: Rajan Mudambi 
<11376379+rmudambi@users.noreply.github.com> * Refactor Module Imports (#58) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * change function import strategy * lint * lint * rename with "test" * add module names * remove accidental copy * revert interface change * fix typo * Allow any artifact key to be loaded into DataLoader (#60) * add tests (woo TDD!) * unrestrict artifact loading * add to gitignore * lint * Add comparison view that aggregates over draws. (#59) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * change function import strategy * lint * lint * rename with "test" * draft * simplify * add test * missed one * Change docstring * rename and adjust interface fn * don't not dropna * make default sort by "" in abstract * do the sory by default thing in interface instead * condense line * Rename dataset to data in places (#61) * add tests (woo TDD!) * unrestrict artifact loading * add to gitignore * lint * rename things from dataset to data in dataloader and interface * adjust regex * pin pandas above 2.0.0 * re-pin pandas after rebase * lint * remove art --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
* first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests
* Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method * add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start 
interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * add shunt for other comparison methods * basic untested sketch * add age group draft * actually raise * add tests * remove repr * remove some unnecessary stuff * Revert "basic untested sketch" This reverts commit b3aae1f. 
* Revert "add shunt for other comparison methods" This reverts commit 06d6792. * rebin magic * lint * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint * remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. * add skipped tests * add typing * add default * get initial age bins from artifact * create age schemas in interface * rename age group col * refactor test * do rebin with unstack and dot * fix age group dunders * try to load from artifact first * refactor alignment * lint * refactor * cleanup * lint * consolidate naming * cleanup * remove check given inconvenient dataframe * clean up interface tests * update comment * stub tests * remove configurable age groups * refactor dataframe handling * add to_dataframe * replace with format_Dataframe * special case pop.age_bins * linting typing * add more unit tests for age groups * remove hard-coding of column names * clean up age group tests * add test add age groups * remove align datasets * typing linting * add fixtures * test calcs * add age bins to artifact * separate out vivarium deps into separate requirements * refactor methods to be functions * add logging statement * add nonstandard artifact data * add comments about standard cases * lint * change fn mocks * improve docstrings * add to test age 
docstrings * add transformation log statement * fix typo * local import * change mocking --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
* Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method * add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start 
interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint 
* remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. * add skipped tests * add typing * Allow Age Group reconciliation between disparate dataframes (#38) * Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method 
* add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier 
<23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * add shunt for other comparison methods * basic untested sketch * add age group draft * actually raise * add tests * remove repr * remove some unnecessary stuff * Revert "basic untested sketch" This reverts commit b3aae1f. * Revert "add shunt for other comparison methods" This reverts commit 06d6792. * rebin magic * lint * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint * remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. 
* add skipped tests * add typing * add default * get initial age bins from artifact * create age schemas in interface * rename age group col * refactor test * do rebin with unstack and dot * fix age group dunders * try to load from artifact first * refactor alignment * lint * refactor * cleanup * lint * consolidate naming * cleanup * remove check given inconvenient dataframe * clean up interface tests * update comment * stub tests * remove configurable age groups * refactor dataframe handling * add to_dataframe * replace with format_Dataframe * special case pop.age_bins * linting typing * add more unit tests for age groups * remove hard-coding of column names * clean up age group tests * add test add age groups * remove align datasets * typing linting * add fixtures * test calcs * add age bins to artifact * separate out vivarium deps into separate requirements * refactor methods to be functions * add logging statement * add nonstandard artifact data * add comments about standard cases * lint * change fn mocks * improve docstrings * add to test age docstrings * add transformation log statement * fix typo * local import * change mocking --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Comparison Public Methods (#40) * basic untested sketch * bug fix / rename * remove ref to age groups * add tests back * change to underscore, no space * add pandas syling * fix API * docstring * add mean agg method * fix bug in aggregation * change name to diff * refactor formatting to viz utils * hack out tests with AI * make sure indexes are aligned * move verify down * rename method * flesh out comparison tests * fix dataframe utils tests * fix comparison test * remove styler * lint * docstrings comparison * df utils docstrings * df utils docstrings * test docstrings * deal with zero denominator * drop age range cols that mess us up * fix tests for previous commit 
fixes * fixes * undo agg * more agg undo * add notimplemented * dont have extra things in artifact * only rebin on count data * fix public API * add singular indices function * API change with TODO * align datasets with more complicated logic * fix mocking * lint * revert new nonstandard key * add back measure key * adjust metadata logic * adjust df utils * adjust joining of datasets * abs error * abs error test * set index of metadata * typing * refactor draw formatting and add tests * lint * add more test coverage * mypy * add ignores for coverage report * simplify a test * literl typing "all" * fix docstring * make required keys tuple * move metadata parser to comparison * metadata to comparison * add docstring * add err * change error type * don't drop nans * renaming * rename abs percent err * finish move of get_dataset * fix tests around NaN * lint * fix tests * fix sorting * add pytest-check * Fix mypy failures (#42) * remove ignore for comparison * add ignore for plot utils * fix err ignore * Don't sort in aggregation (#41) * add age schema to comparison and sort by age bins * underscore age schema * make sorting method more general * move tests * use subset logic * remove age schema from comparison * remove sorting function * remove unused * add todo * docstring * add unit test * address PR comments * Line Plots (#43) * basic untested sketch * bug fix / rename * remove ref to age groups * add tests back * change to underscore, no space * add pandas syling * fix API * docstring * add mean agg method * fix bug in aggregation * change name to diff * refactor formatting to viz utils * hack out tests with AI * make sure indexes are aligned * move verify down * rename method * flesh out comparison tests * fix dataframe utils tests * fix comparison test * remove styler * lint * docstrings comparison * df utils docstrings * df utils docstrings * test docstrings * rough cut * deal with zero denominator * drop age range cols that mess us up * fix tests for previous 
commit fixes * fixes * undo agg * more agg undo * add notimplemented * dont have extra things in artifact * only rebin on count data * fix public API * add singular indices function * API change with TODO * align datasets with more complicated logic * fix mocking * lint * revert new nonstandard key * add back measure key * adjust metadata logic * adjust df utils * adjust joining of datasets * abs error * abs error test * set index of metadata * typing * refactor draw formatting and add tests * lint * add more test coverage * mypy * add ignores for coverage report * simplify a test * move plot_utils * adjust API * Revert "Merge branch 'pnast/feature/mic-5928-comp-pub' into feature/pnast/mic-5929-plotting" This reverts commit f47d7c6, reversing changes made to 3fb93d2. * use percent interval * adjust data formatting * fix test with rename move * pass in optional age schema and sort * test sorting * try plot with unconditionalized indices * draft * add not subplot * refactor * fix conditionalization * change the legend placement * draft new tests using copilot * remove some bits of obviated PR * remove ref to age schema * remove unused bits * fix typing * add docstring * consolidate * consolidate tests * rename fn * isort * add more specific asserts * reorganize tests * fix typing and lint * cleanup * remove some comments * lint * add seaborn req * fix docstrings * underscore privates * make sure no weird kws * change error type * lint * fix underlines in test fns * Aggregate instead of check and drop during formatting (#47) * remove redundant col fn and just marginalize instead * rename * clean up docstring * expand filter functionality * Refactor Test fixtures to write to tmp_dir (#45) * refactor * change pa back * lint * make fixtures and tmpdir call same fn * typing * add back check io * Create total person time dataset (#46) * make a simdataformatter for total person time * add totalpersontime * light formatting refactor * create derived dataset * start to work 
on tests * refactor * change pa back * lint * make fixtures and tmpdir call same fn * typing * make mocking more precise * fix bug in set logic * set custom data back to normal * remove unused * adjsut test * compress logic * add test coverage for person time dataset * formatting * add comment * refactor convert to total as helper * make formatter simple subclass * add tests * add fixture for total pt data * fix test * fix fixtures * fix awful formatting * remove unused entity_type * address comments * lint * Mortality Measures (#48) * make a simdataformatter for total person time * add totalpersontime * light formatting refactor * create derived dataset * start to work on tests * refactor * change pa back * lint * make fixtures and tmpdir call same fn * typing * make mocking more precise * fix bug in set logic * set custom data back to normal * remove unused * adjsut test * compress logic * add test coverage for person time dataset * formatting * add comment * add formatter and measures * make death formatter correct * put in correct tests for formatter * adjust deaths fixture * first pass mort tests * bugfix * fix tests * refactor convert to total as helper * make formatter simple subclass * add tests * add fixture for total pt data * use total pt * make ACMR just a particular cause-specific mortality rate * remove redundant col fn and just marginalize instead * remove bespoke format fn * change to unused cols * fix unused column rules * update syntax * remove None * fix test * fix fixtures * fix awful formatting * remove unused entity_type * add back schema validation * address comments * lint * make value nullable * remove slop comments * fix decorator order (#51) * Refactor ratiodata to just two dataframes (#50) * refactor to tuple * split a bit more * make comparisons do split as well * make ratio not need single dataframe * remove some checks * remove unused strict criteria * add check * adjust docstring * remove ratiodata * fix bug * fix data error and add 
backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * remove unneeded copy * remove unneeded copy * fix type error * Fix typing (#54) * add yaml types * pin vivarium version * Risk Exposure (#49) * first draft * remove unused import * fix how we total up in-category * add test * add entity type to vague-ified names * add formatter test * fix tests * replace_align_datasets method * actually remove the unused fn * add typing * fix format_dataset * lint * fix fixtures * add yaml types * import from collections.abc * fix tests in line with new format * pin vivarium version * Correctly handle Scenario indexes and draw/seed indexes (#53) * refactor to tuple * split a bit more * make comparisons do split as well * make ratio not need single dataframe * remove some checks * remove unused strict criteria * add check * adjust docstring * remove ratiodata * fix bug * fix data error and add backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * remove unneeded copy * fix one-off warnings * move age group resolution before measure use * adjust how ratio takes out num/denom * remove index alignment * adjust for input draw * adjust plot until for missing data * add scenario cols to comparison (hack) * concat better * remove append source * simplify set logic * add missing index levels * put scenario columns in with a default * put scenario cols in interface * make scenarios more flexible * marginalize within the comparison * put scenarios in comparison plot * add test of new plotutils helper * fix tests for comparison * cleanup * move filter scenarios to method * simplify init * reverse order * add filter_data tests * remove append source * add difference by set * fix typing * add fill with placeholder * add init unit test * add test for raise * 
add tests for fill with placeholder * fix tests * add calculation tests * add plot utils tests * adjust test comparison * add reorder levels * remove some too-clever helper functions * add yaml types * pin vivarium version * Population Structure (#52) * add input draw to art if needed * partse the measure key better * add broken pop structre * filter baseline * working with several hacks * document hacks * refactor to tuple * split a bit more * make comparisons do split as well * make ratio not need single dataframe * remove some checks * remove unused strict criteria * add check * adjust docstring * remove ratiodata * fix bug * fix data error and add backtest * rename * make sure we have numerator and denominator keys * make a simplification * rearrange * remove check * refactor to dict * pass dict instead * fix typing * utilize non-indentical index * remove unneeded copy * fix plot utils test * move input draw resolution to plot utils * fix warnings * fix broken tests * fix one-off warnings * move age group resolution before measure use * adjust how ratio takes out num/denom * remove index alignment * adjust for input draw * adjust plot until for missing data * add scenario cols to comparison (hack) * concat better * remove append source * simplify set logic * add missing index levels * put scenario columns in with a default * put scenario cols in interface * make scenarios more flexible * marginalize within the comparison * put scenarios in comparison plot * add test of new plotutils helper * fix tests for comparison * cleanup * move filter scenarios to method * simplify init * reverse order * add filter_data tests * remove append source * add difference by set * fix typing * add fill with placeholder * add init unit test * add test for raise * add tests for fill with placeholder * fix tests * add calculation tests * add plot utils tests * adjust test comparison * add reorder levels * remove some too-clever helper functions * add yaml types * pin vivarium 
version * move draw to constants * pull out input draws and random seed to constants * remove unused import * remove unused copy * remove unused * add ratiomeasure check * type hinting * tests in draft form * add tests * add unit tests for get_measure_from_key * type hint * condense expected dfs * add spaces * fix docstrings * rename * add valid format * formatting * Add unique stratification levels to test fixtures (#55) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * Categorical Relative Risks (#57) * gitignore copilot instructions * add categorical RR measure * add interface function * realtive risk scaffolding * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * add new fixtures and categorical rr test * lint * change measure_key to measure_name and add artifact name * add risk state mapping * modify name * fix title * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * revert change to measure key * minor fixes to allow test pass * generalize RatioMeasure to fix typing * remove default ratiomeasure imp * lint * move args * renames * lint * fix typing * weird merge issue * add default for categories * refactor measures for (hopefully) simplification * Update src/vivarium_testing_utils/automated_validation/data_transformation/measures.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Update src/vivarium_testing_utils/automated_validation/data_transformation/measures.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * bugfix --------- Co-authored-by: Rajan Mudambi 
<11376379+rmudambi@users.noreply.github.com> * Refactor Module Imports (#58) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * change function import strategy * lint * lint * rename with "test" * add module names * remove accidental copy * revert interface change * fix typo * Allow any artifact key to be loaded into DataLoader (#60) * add tests (woo TDD!) * unrestrict artifact loading * add to gitignore * lint * Add comparison view that aggregates over draws. (#59) * gitignore * change to common stratify index * make some dfs use from_product instead * more products * align indexes but not for pop structure * add unique strat columns to fixtures * cleanup * lint * move title to measures.py (#56) * move title to measures.py * lint * add type * fix test * change function import strategy * lint * lint * rename with "test" * draft * simplify * add test * missed one * Change docstring * rename and adjust interface fn * don't not dropna * make default sort by "" in abstract * do the sory by default thing in interface instead * condense line * Rename dataset to data in places (#61) * add tests (woo TDD!) * unrestrict artifact loading * add to gitignore * lint * rename things from dataset to data in dataloader and interface * adjust regex * pin pandas above 2.0.0 * re-pin pandas after rebase * lint * remove art --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
* first pass at stubbing
* lint
* refine
* rudimentary data loading from sim
* lint
* add tests and test data
* adjust to epic
* fix a few names
* rename back to dataloader
* modify tests
* lint
* use set instead
* remove build.yml
* use pathlib
* lint
* make dataloader private
* add typing
* add enum
* make methods private
* fix error message
* undo add_comparison, which we will fix later
* punt DataSource to interface
* adjust tests
* Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method * add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start 
interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * add shunt for other comparison methods * basic untested sketch * add age group draft * actually raise * add tests * remove repr * remove some unnecessary stuff * Revert "basic untested sketch" This reverts commit b3aae1f. 
* Revert "add shunt for other comparison methods" This reverts commit 06d6792. * rebin magic * lint * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint * remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. * add skipped tests * add typing * add default * get initial age bins from artifact * create age schemas in interface * rename age group col * refactor test * do rebin with unstack and dot * fix age group dunders * try to load from artifact first * refactor alignment * lint * refactor * cleanup * lint * consolidate naming * cleanup * remove check given inconvenient dataframe * clean up interface tests * update comment * stub tests * remove configurable age groups * refactor dataframe handling * add to_dataframe * replace with format_Dataframe * special case pop.age_bins * linting typing * add more unit tests for age groups * remove hard-coding of column names * clean up age group tests * add test add age groups * remove align datasets * typing linting * add fixtures * test calcs * add age bins to artifact * separate out vivarium deps into separate requirements * refactor methods to be functions * add logging statement * add nonstandard artifact data * add comments about standard cases * lint * change fn mocks * improve docstrings * add to test age 
docstrings * add transformation log statement * fix typo * local import * change mocking
---------
Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com>
Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com>
* Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method * add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start 
interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint 
* remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. * add skipped tests * add typing * Allow Age Group reconciliation between disparate dataframes (#38) * Generate stubs for Automated V&V Phase 1 (#22) * first pass at stubbing * lint * refine * Update src/vivarium_testing_utils/automated_validation/interface.py Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * change pass to notimplementederror --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> * Load data from simulation (#23) * first pass at stubbing * lint * refine * rudimentary data loading from sim * lint * add tests and test data * adjust to epic * fix a few names * rename back to dataloader * modify tests * lint * use set instead * remove build.yml * use pathlib * lint * make dataloader private * add typing * add enum * make methods private * fix error message * undo add_comparison, which we will fix later * punt DataSource to interface * adjust tests * Load From Artifact (#24) * load key from artifact * minor fixes * add stub tests * add test artifact * lint * remove top level import * private attrs * Add custom data upload (#25) * add custom upload * add series type * adjust error message * fix missing DataSource * move raise test to get dataset * add test for show raw dataset * lint * fix test * copy going into and out of the cache * lint * adjust args * add docstrings to "real" interface methods * don't copy on cache miss * wrap in new public method 
* add error messages * Basic Transformations (#26) * add initial set of basis transformations * use typevar * lint * add missing typing * remove validation for now based on discussion * delete validation * remove import * change LCT to dicts (#28) * Reorganize Calculations (#30) * add data_transformation folder * add missing folder * stub out pandera types (#31) * Refactor test files (#29) * add code changes * add binary file changes * add artifact parsing * extract DRAW_PREFIX * lint * Add Measures for Incidence, Prevalence, Remission (#27) * add pandera * add basic calc for index alignment and data filtering * stub schemas * add Measures * start interface for measures * rename some schema types * refactor measures * organize * specify FuzzyComparison * add comparison base class * revert to stub * add datasets for sim and artifact * add formatting tests * bugfixes * add tests * make test data formatting consistent * add tests of measure functions * cleanup * transform artifact data * add artifact fixture * remove old test dfs * add tests for comparison * minor fixes * lint * typing * add docstring * typing/cleanup * add docstring * isort * use normal dicts instead of LCT * refactor formatters * address comments * beef up tests * add todos * make ratio inherit from ABC * Add Schemas to Data loading and calculation (#32) * add code changes * add binary file changes * add artifact parsing * add start of schema * do simdata schema * add schemas and schema tests * unstash commit * change to floats * add new checks * new typing * change to simple float coerce * revert calculations * add checks to measures * missed one * Update src/vivarium_testing_utils/automated_validation/data_transformation/data_schema.py Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * change name * add missing test * simplify * ensure index levels are correctly sorted * address artifact data * lint * DRY code --------- Co-authored-by: Steve Bachmeier 
<23350991+stevebachmeier@users.noreply.github.com> * Refactor pandera to use SchemaModel decorators (#36) * simplest fix * remove some unused * wrap decorator so I don't have to do to_schema() all the time * remove unused imports * add series to dataframe method * add specific check for DrawData * lint * lint again * refactor helpers into utils * remove dupe comment * move future import * add shunt for other comparison methods * basic untested sketch * add age group draft * actually raise * add tests * remove repr * remove some unnecessary stuff * Revert "basic untested sketch" This reverts commit b3aae1f. * Revert "add shunt for other comparison methods" This reverts commit 06d6792. * rebin magic * lint * VTU Mypy (#34) * type hint for calculations * in flux * in progress * do approximately half of typing * removoe more series types * lint * data_loader * type measures.py * fix strats * type test_interface * type test_formatting * type test_data_schema * refactor patch * lint * remove fuzzy checker ignores * "delete unused types.py file" * merge changes from refactor and fix * address comments * remve ref ro 'fuzzy' * move error up * remove trailing commas * change to collection * empty commit * lint * remove unused import * Wrapper for Comparison methods (#37) * add shunt for other comparison methods * basic untested sketch * big fixes * bug fix / rename * Revert "bug fix / rename" This reverts commit 9f048bd. * Revert "big fixes" This reverts commit 4cc09bb. * Revert "basic untested sketch" This reverts commit b3aae1f. 
* add skipped tests * add typing * add default * get initial age bins from artifact * create age schemas in interface * rename age group col * refactor test * do rebin with unstack and dot * fix age group dunders * try to load from artifact first * refactor alignment * lint * refactor * cleanup * lint * consolidate naming * cleanup * remove check given inconvenient dataframe * clean up interface tests * update comment * stub tests * remove configurable age groups * refactor dataframe handling * add to_dataframe * replace with format_Dataframe * special case pop.age_bins * linting typing * add more unit tests for age groups * remove hard-coding of column names * clean up age group tests * add test add age groups * remove align datasets * typing linting * add fixtures * test calcs * add age bins to artifact * separate out vivarium deps into separate requirements * refactor methods to be functions * add logging statement * add nonstandard artifact data * add comments about standard cases * lint * change fn mocks * improve docstrings * add to test age docstrings * add transformation log statement * fix typo * local import * change mocking --------- Co-authored-by: Rajan Mudambi <11376379+rmudambi@users.noreply.github.com> Co-authored-by: Steve Bachmeier <23350991+stevebachmeier@users.noreply.github.com> * Comparison Public Methods (#40) * basic untested sketch * bug fix / rename * remove ref to age groups * add tests back * change to underscore, no space * add pandas syling * fix API * docstring * add mean agg method * fix bug in aggregation * change name to diff * refactor formatting to viz utils * hack out tests with AI * make sure indexes are aligned * move verify down * rename method * flesh out comparison tests * fix dataframe utils tests * fix comparison test * remove styler * lint * docstrings comparison * df utils docstrings * df utils docstrings * test docstrings * deal with zero denominator * drop age range cols that mess us up * fix tests for previous commit 
fixes * fixes * undo agg * more agg undo * add notimplemented * dont have extra things in artifact * only rebin on count data * fix public API * add singular indices function * API change with TODO * align datasets with more complicated logic * fix mocking * lint * revert new nonstandard key * add back measure key * adjust metadata logic * adjust df utils * adjust joining of datasets * abs error * abs error test * set index of metadata * typing * refactor draw formatting and add tests * lint * add more test coverage * mypy * add ignores for coverage report * simplify a test * literl typing "all" * fix docstring * make required keys tuple * move metadata parser to comparison * metadata to comparison * add docstring * add err * change error type * don't drop nans * renaming * rename abs percent err * finish move of get_dataset * fix tests around NaN * lint * fix tests * fix sorting * add pytest-check * Fix mypy failures (#42) * remove ignore for comparison * add ignore for plot utils * fix err ignore * Don't sort in aggregation (#41) * add age schema to comparison and sort by age bins * underscore age schema * make sorting method more general * move tests * use subset logic * remove age schema from comparison * remove sorting function * remove unused * add todo * docstring * add unit test * address PR comments * Line Plots (#43) * basic untested sketch * bug fix / rename * remove ref to age groups * add tests back * change to underscore, no space * add pandas syling * fix API * docstring * add mean agg method * fix bug in aggregation * change name to diff * refactor formatting to viz utils * hack out tests with AI * make sure indexes are aligned * move verify down * rename method * flesh out comparison tests * fix dataframe utils tests * fix comparison test * remove styler * lint * docstrings comparison * df utils docstrings * df utils docstrings * test docstrings * rough cut * deal with zero denominator * drop age range cols that mess us up * fix tests for previous 
commit fixes * fixes * undo agg * more agg undo * add notimplemented * dont have extra things in artifact * only rebin on count data * fix public API * add singular indices function * API change with TODO * align datasets with more complicated logic * fix mocking * lint * revert new nonstandard key * add back measure key * adjust metadata logic * adjust df utils * adjust joining of datasets * abs error * abs error test * set index of metadata * typing * refactor draw formatting and add tests * lint * add more test coverage * mypy * add ignores for coverage report * simplify a test * move plot_utils * adjust API * Revert "Merge branch 'pnast/feature/mic-5928-comp-pub' into feature/pnast/mic-5929-plotting" This reverts commit f47d7c6, reversing changes made to 3fb93d2. * use percent interval * adjust data formatting * fix test with rename move * pass in optional age schema and sort * test sorting * try plot with unconditionalized indices * draft * add not subplot * refactor * fix conditionalization * change the legend placement * draft new tests using copilot * remove some bits of obviated PR * remove ref to age schema * remove unused bits * fix typing * add docstring * consolidate * consolidate tests * rename fn * isort * add more specific asserts * reorganize tests * fix typing and lint * cleanup * remove some comments * lint * add seaborn req * fix docstrings * underscore privates * make sure no weird kws * change error type * lint * fix underlines in test fns * Aggregate instead of check and drop during formatting (#47) * remove redundant col fn and just marginalize instead * rename * clean up docstring * expand filter functionality * Refactor Test fixtures to write to tmp_dir (#45) * refactor * change pa back * lint * make fixtures and tmpdir call same fn * typing * add back check io * Create total person time dataset (#46) * make a simdataformatter for total person time * add totalpersontime * light formatting refactor * create derived dataset * start to work 
Load data from simulation
Description
feature
Changes and notes
Adds a simple way to load data from a simulation results directory. If the requested dataset is not already in the datasets dictionary, it is loaded and added; the dataset for the given key is then returned.
I also added some test data with random values to check against in future tests.
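For readers skimming the thread, the load-then-cache behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchDataLoader` and `_load` are hypothetical names, and the sketch reads CSV files into plain lists of dicts (rather than simulation output files into `pd.DataFrame`s) only so that it runs without third-party dependencies. The `results_dir`, `sim_output_dir`, `raw_datasets`, and `get_dataset` names follow the diff under review.

```python
import csv
from pathlib import Path

# Simplified stand-in for a DataFrame: one dict per row.
Rows = list[dict[str, str]]


class SketchDataLoader:
    """Hypothetical stand-in for the PR's DataLoader; illustration only."""

    def __init__(self, results_dir: Path) -> None:
        self.results_dir = Path(results_dir)
        # As in the diff, simulation outputs live under <results_dir>/results.
        self.sim_output_dir = self.results_dir / "results"
        # One cache per data source, keyed by dataset key.
        self.raw_datasets: dict[str, dict[str, Rows]] = {
            "sim": {},
            "gbd": {},
            "artifact": {},
            "custom": {},
        }

    def get_dataset(self, dataset_key: str, source: str) -> Rows:
        """Return the dataset for a key, loading and caching it on first use."""
        if source not in self.raw_datasets:
            raise ValueError(f"Unknown source: {source!r}")
        cache = self.raw_datasets[source]
        if dataset_key not in cache:
            cache[dataset_key] = self._load(dataset_key, source)
        return cache[dataset_key]

    def _load(self, dataset_key: str, source: str) -> Rows:
        # Assumption: only the "sim" source is loadable in this sketch.
        if source != "sim":
            raise NotImplementedError(f"No loader for source {source!r}")
        # Assumption: one CSV file per dataset key (the file format is made up here).
        path = self.sim_output_dir / f"{dataset_key}.csv"
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
```

With this shape, a second `get_dataset` call for the same key and source returns the cached object without touching the disk again, which is what makes the test data useful for checking values later.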
Testing