Hjafari/feature/mic 5517 guardian test by hussain-jafari · Pull Request #498 · ihmeuw/pseudopeople

hussain-jafari · 2025-03-17T21:17:25Z

full scale guardian duplication test

Description

Category: test
JIRA issue: MIC-5517

Expand duplicate guardian test to full scale.

Testing

Ran the test on the census dataset using RI and US data.

hussain-jafari · 2025-03-17T21:18:23Z

src/pseudopeople/dataset.py


        self.data = data

+    def keep_schema_columns(self, data, dataset_schema) -> pd.DataFrame:


Rename and change to static method

hussain-jafari · 2025-03-17T21:20:18Z

tests/integration/release/test_release.py

+) -> None:
+    if dataset_name != DatasetNames.CENSUS:
+        return
+


Add comments about why you're patching

hussain-jafari · 2025-03-17T21:22:15Z

tests/integration/release/test_release.py

+        "pseudopeople.dataset.Dataset.keep_schema_columns", side_effect=lambda df, _: df
+    )
+    mocker.patch(
+        "pseudopeople.configuration.generator.validate_overrides",


Check whether these validations are being tested

Why do you need to patch over these?

Does validate_overrides not allow for 0 probabilities?

For age differences, by default we can't have a non-zero probability of keeping the ages the same.

I'd say you should set that noise type to not noise then
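For anyone reading this thread later, here is a minimal sketch of what patching a method with `side_effect=lambda df, _: df` does. The `Dataset` class below is a hypothetical stand-in, not pseudopeople's real class, and the sketch uses `unittest.mock` directly instead of the `mocker` fixture. A mock installed on the class is not bound to the instance, so it receives only the two explicit arguments; that is why the lambda takes `(df, _)` and no `self`.

```python
from unittest import mock


class Dataset:
    """Hypothetical stand-in for illustration; not pseudopeople's Dataset."""

    def keep_schema_columns(self, data: dict, dataset_schema: list) -> dict:
        # Drop every column that the schema does not list.
        return {col: vals for col, vals in data.items() if col in dataset_schema}


# Toy "dataframe" as a dict of columns; old_housing_type is a test-only column.
data = {"age": [3, 40], "old_housing_type": ["Household", "Household"]}
schema = ["age"]
dataset = Dataset()

# Unpatched: the test-only column is dropped.
unpatched = dataset.keep_schema_columns(data, schema)

# Patched: the mock's side_effect returns the data untouched.
with mock.patch.object(
    Dataset, "keep_schema_columns", side_effect=lambda df, _: df
) as patched:
    passthrough = dataset.keep_schema_columns(data, schema)

# Outside the context manager the original method is restored.
restored = dataset.keep_schema_columns(data, schema)
```

A comment next to each patch saying which behavior is being bypassed, and why, would make this easier to follow in the real test.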

hussain-jafari · 2025-03-17T21:25:24Z

src/pseudopeople/noise_functions.py

 #     return dataset


+# Helper function to format group dataframe and merging with their dependents


This whole function was copied without changes

stevebachmeier

Looks good! I'd like clarification on a few things before I approve

src/pseudopeople/dataset.py

stevebachmeier · 2025-03-19T15:26:35Z

tests/utilities.py

                        new_probability = [0.0 for x in probability]
                    elif isinstance(probability, dict):
-                        new_probability = {key: 0.0 for key in probability.keys()}
+                        # NOTE: this will fail default config validations


Can you explain this? Why would the key be an integer? I don't really understand your note, either.

This structure is for "possible age differences": each key (e.g. -2 or 1) is the integer to add to a simulant's actual age, and each value is the probability of picking that age difference.
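As a rough illustration of the structure being discussed (a sketch, not the code in `tests/utilities.py`): the helper name `zero_out` and the example values are hypothetical. It mirrors the list and dict branches visible in the hunk above and shows a dictionary keyed by integer age differences.

```python
def zero_out(probability):
    """Return a copy of a probability setting with every probability set to 0.0.

    Hypothetical helper: handles a single float, a list of floats, or a
    dict mapping an option (e.g. an integer age difference) to a probability.
    """
    if isinstance(probability, list):
        return [0.0 for _ in probability]
    if isinstance(probability, dict):
        return {key: 0.0 for key in probability}
    return 0.0


# Keys are the integers added to a simulant's actual age;
# values are the probabilities of picking each difference.
age_differences = {-2: 0.1, -1: 0.4, 1: 0.4, 2: 0.1}

zeroed = zero_out(age_differences)
```

Note that an all-zero dictionary no longer sums to 1. Whether that is what the NOTE about failing default config validations refers to is an assumption here, not something this thread confirms.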

stevebachmeier · 2025-03-19T15:27:51Z

tests/integration/release/test_release.py

+        "pseudopeople.dataset.Dataset.keep_schema_columns", side_effect=lambda df, _: df
+    )
+    mocker.patch(
+        "pseudopeople.configuration.generator.validate_overrides",


Does validate_overrides not allow for 0 probabilities?

stevebachmeier · 2025-03-19T15:34:16Z

tests/integration/release/test_release.py

+        group_data = unnoised.loc[
+            (unnoised["age"].astype(int) < age)
+            & (unnoised["housing_type"] == housing_type)
+            & (unnoised["guardian_1"].notna())


because guardian_1 is never nan, right? But guardian_2 might be?

Yes, if "guardian_1" is not NA then there is at least one guardian, but not necessarily a guardian_2.

stevebachmeier · 2025-03-19T15:43:03Z

src/pseudopeople/noise_functions.py

            if index_to_copy.empty:
                continue
            noised_group_df = group_df.loc[index_to_copy]
+            noised_group_df["old_housing_type"] = noised_group_df["housing_type"]


I don't love adding this column in the "real" data when it's only used for testing. But it also seems too big of a pain to refactor this somehow so that you can make a test fixture of the call and add the col there.

Instead, couldn't you get the "old_housing_type" from the unnoised_data in the tests?

+1. If this is just for testing, you could save a copy of the old_housing_type series and then map that to verify your test, or something.
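A minimal sketch of that suggestion, using toy data: look up each noised row's original housing type from the unnoised data inside the test, so the production code does not need the extra column. The `simulant_id` key and the example values are assumptions made for illustration; the real test may need a different join key.

```python
import pandas as pd

# Toy unnoised data: one row per simulant.
unnoised = pd.DataFrame(
    {
        "simulant_id": ["a", "b", "c"],
        "housing_type": ["Household", "Household", "College"],
    }
)

# Toy noised data: simulant "a" has been duplicated into a different housing type.
noised = pd.DataFrame(
    {
        "simulant_id": ["a", "a", "c"],
        "housing_type": ["College", "Household", "College"],
    }
)

# Build the lookup once from the unnoised data, indexed by the join key.
old_housing_type = unnoised.set_index("simulant_id")["housing_type"]

# Map every noised row back to its original housing type, in the test only.
noised["old_housing_type"] = noised["simulant_id"].map(old_housing_type)

# Rows whose housing type changed are the candidate guardian duplicates.
changed = noised[noised["housing_type"] != noised["old_housing_type"]]
```

This keeps `old_housing_type` out of the "real" data, at the cost of requiring a unique key in the unnoised frame.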

Hussain Jafari added 10 commits March 10, 2025 09:13

intermediate push

ccd619b

Merge branch 'epic/full_scale_testing' into hjafari/feature/MIC-5517_…

e0f4307

…guardian_test

save file changes before committing

ced0250

intermediate testing push

a1cfc65

working push

a806f2c

minor cleanup

3727a55

more cleanup

b15ee9d

lint

670d5f7

type

f174ab7

lint again

01ecac7

hussain-jafari requested review from albrja, patricktnast, rmudambi and stevebachmeier as code owners March 17, 2025 21:17

hussain-jafari commented Mar 17, 2025

PR feedback

5023d98

stevebachmeier reviewed Mar 19, 2025

hussain-jafari and others added 3 commits March 20, 2025 12:24

Trigger Build

2d7607c

Merge branch 'epic/full_scale_testing' into hjafari/feature/MIC-5517_…

3e04940

…guardian_test

PR feedback

0745388

albrja approved these changes Mar 21, 2025

stevebachmeier approved these changes Mar 25, 2025

hussain-jafari merged commit d3faa25 into epic/full_scale_testing Mar 25, 2025
8 checks passed

hussain-jafari deleted the hjafari/feature/MIC-5517_guardian_test branch March 25, 2025 16:59

hussain-jafari added a commit that referenced this pull request May 7, 2025

Hjafari/feature/mic 5517 guardian test (#498)

de0405e

hussain-jafari added a commit that referenced this pull request May 7, 2025

Hjafari/feature/mic 5517 guardian test (#498)

3698633

hussain-jafari added a commit that referenced this pull request Jul 24, 2025

Hjafari/feature/mic 5517 guardian test (#498)

6c17636

