[GEN-1892] redact pediatric YEAR_CONTACT and YEAR_DEATH #611
Changes from 2 commits
```diff
@@ -149,6 +149,22 @@ def _redact_year(df_col):
     return df_col
 
 
+def _redact_ped_year(df_col):
+    """Redacts year values that have <
+
+    Args:
+        df_col: Dataframe column/pandas.Series of a year column
+
+    Returns:
+        pandas.Series: Redacted series
+
+    """
+    year = df_col.astype(str)
+    contain_lessthan = year.str.contains("<", na=False)
+    df_col[contain_lessthan] = "withheld"
+    return df_col
+
+
 # TODO: Add to transform.py
 def _to_redact_difference(df_col_year1, df_col_year2):
     """Determine if difference between year2 and year1 is > 89
```
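The new helper can be exercised on its own. The sketch below copies `_redact_ped_year` from the diff unchanged and runs it on a small series of year values; the sample values are made up for illustration and are not data from this PR. Because the helper assigns into `df_col` directly, the series passed in is modified as well as returned.

```python
import pandas as pd


def _redact_ped_year(df_col):
    """Replace year values containing "<" with "withheld" (copied from the PR)."""
    year = df_col.astype(str)
    # na=False makes missing values count as "no match" instead of NaN.
    contain_lessthan = year.str.contains("<", na=False)
    df_col[contain_lessthan] = "withheld"
    return df_col


# Hypothetical pediatric year values: center-masked ranges contain "<".
years = pd.Series(["<2015", "2018", None, "<1995"], dtype=object)
redacted = _redact_ped_year(years)

print(redacted.tolist())  # ['withheld', '2018', None, 'withheld']
print(redacted is years)  # True: the input series itself was modified
```

Plain years and missing values pass through untouched; only entries whose string form contains `<` are standardized to `"withheld"`.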
```diff
@@ -205,6 +221,10 @@ def redact_phi(
     )
     clinicaldf.loc[to_redact, "BIRTH_YEAR"] = "cannotReleaseHIPAA"
 
+    # redact range year for pediatric data
```
Contributor: I might have missed this in the discussion, but to confirm,

Contributor (Author): I think Xindi was referring to the ticket's AC (acceptance criteria), 'Keep standardizing the center uploaded masked data as "withheld"', and confirmed that we need to redact these two columns. I believe these are PHI, but the reason to redact them is more about standardization.

Contributor: Ah, I see. Sounds good as long as that's clearly documented; see the JIRA comment.
```diff
+    clinicaldf["YEAR_CONTACT"] = _redact_ped_year(clinicaldf["YEAR_CONTACT"])
+    clinicaldf["YEAR_DEATH"] = _redact_ped_year(clinicaldf["YEAR_DEATH"])
+
     return clinicaldf
 
 
```
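Because `_redact_ped_year` writes into the series it receives, `redact_phi` depends on assigning the returned column back to `clinicaldf`. A non-mutating alternative, which is not what this PR implements, could build a new series with `Series.mask` instead. In the sketch below, the helper name `redact_ped_year_copy`, the sample frame, and the column-existence check are all hypothetical additions.

```python
import pandas as pd


def redact_ped_year_copy(df_col: pd.Series) -> pd.Series:
    """Hypothetical non-mutating variant of _redact_ped_year.

    Values whose string form contains "<" become "withheld"; all other
    values (including missing ones) are kept, in a new Series.
    """
    contains_lessthan = df_col.astype(str).str.contains("<", na=False)
    # Series.mask replaces entries where the condition is True.
    return df_col.mask(contains_lessthan, "withheld")


# The input series is left unchanged; a new series is returned.
years = pd.Series(["<2016", "2020"], dtype=object)
redacted = redact_ped_year_copy(years)
print(redacted.tolist())  # ['withheld', '2020']
print(years.tolist())  # ['<2016', '2020']

# Hypothetical clinical rows; in the PR these columns come from clinicaldf.
clinicaldf = pd.DataFrame(
    {"YEAR_CONTACT": ["<2016", "2020"], "YEAR_DEATH": [None, "<2019"]},
    dtype=object,
)

# Assumption: skip a column if a given clinical file does not include it.
for col in ["YEAR_CONTACT", "YEAR_DEATH"]:
    if col in clinicaldf.columns:
        clinicaldf[col] = redact_ped_year_copy(clinicaldf[col])

print(clinicaldf["YEAR_CONTACT"].tolist())  # ['withheld', '2020']
```

The trade-off is one extra copy per column. The PR's in-place version produces the same final frame because its result is assigned back to `clinicaldf`; the copy-based form simply avoids relying on that.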