Commit 7a7f4b3

sueoglu and eroell authored
Fix censor_flg encoding in mimic_2() (#227)
* fixed mimic_2 dataset loader so that censor_flg is encoded correctly
* handle the case if censor_flg passed in the columns_obs_only argument

---------

Co-authored-by: Eljas Roellin <65244425+eroell@users.noreply.github.com>
1 parent e4d3cf4 commit 7a7f4b3

1 file changed

Lines changed: 9 additions & 0 deletions

src/ehrdata/dt/datasets.py

@@ -917,6 +917,15 @@ def mimic_2(
         columns_obs_only=columns_obs_only,
     )
 
+    # In the raw dataset, the variable censor_flg is encoded inversely (0=death, 1=censored)
+    # We flip it here so it follows the standard convention (0=censored, 1=event happened)
+
+    censor_col = "censor_flg"
+    if censor_col in edata.var.index:
+        edata[:, [censor_col]].X = np.where(edata[:, [censor_col]].X == 0, 1, 0)
+    elif censor_col in edata.obs.columns:
+        edata.obs[censor_col] = np.where(edata.obs[censor_col] == 0, 1, 0)
+
     return edata
 
 
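As a rough illustration of the flip in the diff above, the following standalone sketch applies the same `np.where(... == 0, 1, 0)` expression to a plain NumPy array. The sample values and the variable names `raw_censor_flg`, `flipped`, `with_missing`, and `flipped_missing` are made up for illustration and do not come from the MIMIC-II data or from the ehrdata code base.

```python
import numpy as np

# Hypothetical values in the raw encoding (0=death, 1=censored).
raw_censor_flg = np.array([0, 1, 1, 0, 1])

# Same expression as in the diff: 0 becomes 1, everything else becomes 0.
flipped = np.where(raw_censor_flg == 0, 1, 0)
print(flipped.tolist())  # [1, 0, 0, 1, 0]

# For strictly binary input this is equivalent to 1 - x.
assert np.array_equal(flipped, 1 - raw_censor_flg)

# Caveat: any value that is not exactly 0, including NaN, maps to 0 (censored).
with_missing = np.array([0.0, 1.0, np.nan])
flipped_missing = np.where(with_missing == 0, 1, 0)
print(flipped_missing.tolist())  # [1, 0, 0]
```

One possible consequence of this expression, if the column ever contained missing values, is that they would silently be treated as censored; whether that can happen in this dataset is not stated in the commit.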
