full scale test missingness #501
From https://mypy.readthedocs.io/en/stable/more_types.html#function-overloading:
The default values of a function’s arguments don’t affect its signature – only the absence or presence of a default value does. So in order to reduce redundancy, it’s possible to replace default values in overload definitions with ... as a placeholder
Good to know. I guess we should replace the defaults in all of our overloads with ... then.
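A minimal sketch of what the mypy docs describe, in case it helps. The function `generate_rows` and its parameters are hypothetical and are not the project's dask overloads. Only the implementation carries real default values; the overloads use `...` to say "a default exists" without repeating it.

```python
from typing import Iterator, List, Literal, Union, overload


# Overload 1: `lazy` omitted or False, so a list is returned.
@overload
def generate_rows(count: int = ..., lazy: Literal[False] = ...) -> List[int]: ...


# Overload 2: `lazy=True` must be passed explicitly, so it has no default here.
@overload
def generate_rows(count: int = ..., *, lazy: Literal[True]) -> Iterator[int]: ...


# Implementation: the only place where the real defaults are written down.
def generate_rows(
    count: int = 3, lazy: bool = False
) -> Union[List[int], Iterator[int]]:
    values = range(count)
    return iter(values) if lazy else list(values)
```

Changing a default later then means editing one line in the implementation rather than every overload.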
kwargs["state"] = state
unnoised_data = dataset_func(**kwargs)

# We must manually clean the data for noising since we are recreating our main noising loop
Longer explanation in one comment at top
config = get_configuration()

# NOTE: This is recreating Dataset._noise_dataset but adding assertions for missingness
for noise_type in NOISE_TYPES:
Do we need to test missingness for ALL noise types?
Will be resolved by putting missingness test in refactor loop
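For reference, the kind of missingness assertion discussed in this thread could look roughly like the sketch below. It is written under assumptions: the helper `assert_missingness_synced` is hypothetical, and it assumes a value counts as missing when it is NaN or an empty string. The actual test in this PR may define missingness differently.

```python
import numpy as np
import pandas as pd


def assert_missingness_synced(data: pd.DataFrame, missingness: pd.DataFrame) -> None:
    """Fail if the boolean missingness frame does not match the data.

    Assumption for this sketch: missing means NaN or an empty string.
    """
    expected = data.isna() | data.eq("")
    pd.testing.assert_frame_equal(missingness, expected)


# Toy stand-ins for dataset.data and dataset.missingness after a noising step.
data = pd.DataFrame(
    {"first_name": ["Ann", np.nan, ""], "age": ["34", "7", np.nan]}
)
missingness = pd.DataFrame(
    {"first_name": [False, True, True], "age": [False, False, True]}
)

# Passes silently when the two frames agree; raises AssertionError otherwise.
assert_missingness_synced(data, missingness)
```

Calling a check like this after each noise function is applied would catch a noise function that changes the data without updating the stored missingness.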
    if isinstance(noise_type, RowNoiseType):
        if config.has_noise_type(dataset.dataset_schema.name, noise_type.name):
            noise_type(dataset, config)
            # Check missingness is synced with data
"Check that dataset.missingness was updated correctly by noising function to match noised data"
  # Get dataframe for each dependent group to merge with guardians
  in_households_under_18 = dataset.data.loc[
-     (dataset.data["age"] < 18)
+     (dataset.data["age"].astype(int) < 18)
So cols are all strings, right? How was this ever working?
This column's dtype is int when it is first read in and while noising runs during data generation, but it is str/object in our tests, which noise post-processed unnoised data.
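A small illustration of the dtype difference described above, using made-up values. The helper `under_18_mask` is hypothetical and only mirrors the `.astype(int) < 18` expression from the diff.

```python
import pandas as pd


def under_18_mask(age: pd.Series) -> pd.Series:
    """Boolean mask that works whether `age` holds ints or numeric strings."""
    return age.astype(int) < 18


# During data generation the column is already integer typed.
ages_generated = pd.Series([5, 17, 42])

# In the tests the same column arrives as strings (post-processed data).
ages_in_tests = pd.Series(["5", "17", "42"])

mask_generated = under_18_mask(ages_generated)
mask_in_tests = under_18_mask(ages_in_tests)
```

Note that `astype(int)` raises if the column contains missing or non-numeric values, so this cast is only safe where the ages are known to be clean.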
Category: feature
JIRA issue: MIC-5515
full scale test missingness
Description
Add full scale test_dataset_missingness.
Typing.
Add default values to overload functions for generating data with dask.
Testing
Ran tests on acs, cps, and wic for RI and USA.