Commit ea4dc95

eroell and Zethson authored
ehrapy transition to ehrdata: take over ehrapys data functionalities (#136)
* add basic dataframe support and test
* more tests, modified to_dataframe
* read_csv
* save commit
* more on csv, h5ad
* save commit
* updates, expanded from_pandas
* expand to_pandas
* type inference and missing value as strings
* iter commit, tests fail
* fix to_pandas and feature types, add tests
* add tests and doc for four new datasets
* save commit
* basic draft for read_zarr
* h5ad, zarr io draft
* harmonize to filename as anndata
* remove commented out code
* changelog, refactoring
* unreleased statement in changelog
* fix refactoring
* include feature type functions in doc:
* be more clear about file formats in download internally
* Apply suggestions from code review
  Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
* more intersphinx, more pandas examples, fixes
* iter commit
* address more review comments:
* sparse io test files
* cleaning round, reduce redundancy
* add missing h5ad, argument cleaning
* pandas description in tutorial
* remove backed comment zarr
* more explicit arguments, cleanup
* test nonnumeric layer
* utilzz refactor
* fix doc? fix typos
* refactor .tl, .io, fix typos
* fix display of first level functions?
* Apply suggestions from code review
  Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
* address comments
* split lines in csv
* split lins for more docstrings in io
* hdf5, fastarrayutils dependency
* csv read description like pandas
* more io like pandas wording

---------

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
1 parent c5ac12e commit ea4dc95

222 files changed

Lines changed: 4461 additions & 150 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ __pycache__/
 /node_modules/

 # docs
+/docs/tutorials/data/
 /docs/generated/
 /docs/_build/
 /docs/generated/

CHANGELOG.md

Lines changed: 26 additions & 1 deletion
@@ -8,11 +8,36 @@ and this project adheres to [Semantic Versioning][].
 [keep a changelog]: https://keepachangelog.com/en/1.0.0/
 [semantic versioning]: https://semver.org/spec/v2.0.0.html

+## [0.0.6] Not yet released
+
+### Fixed
+- Cleaned up and updated tutorial notebooks ([#140](https://github.com/theislab/ehrdata/pull/140)) @agerardy
+
+### Added
+- {func}`~ehrdata.io.read_csv` Reads a csv file ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.read_h5ad` Reads an h5ad file ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.read_zarr` Reads a zarr file ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.write_h5ad` Writes an h5ad file ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.write_zarr` Writes a zarr file ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.from_pandas` Transform a given {class}`~pandas.DataFrame` into an {class}`~ehrdata.EHRData` object ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.io.to_pandas` Transform an {class}`~ehrdata.EHRData` object into a {class}`~pandas.DataFrame` ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.dt.mimic_2` Loads the MIMIC-II dataset ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.dt.mimic_2_preprocessed` Loads the preprocessed MIMIC-II dataset ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.dt.diabetes_130_raw` Loads the raw diabetes-130 dataset ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.dt.diabetes_130_fairlearn` Loads the preprocessed diabetes-130 dataset by fairlearn ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.infer_feature_types` Infer feature types in an {class}`~ehrdata.EHRData` object ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.feature_type_overview` Overview of inferred feature types ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.replace_feature_types` Replacing inferred feature types ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+- {func}`~ehrdata.harmonize_missing_values` Harmonize missing values in an {class}`~ehrdata.EHRData` object ([#136](https://github.com/theislab/ehrdata/pull/136)) @eroell
+
+
+### Modified
+
 ## [0.0.5]

 ### Fixed

-- Initialize EHRData with X and layers
+- Initialize EHRData with X and layers ([#132](https://github.com/theislab/ehrdata/pull/132)) @eroell

 ### Added

docs/api/datasets_index.md

Lines changed: 4 additions & 0 deletions
@@ -15,4 +15,8 @@
 dt.gibleed_omop
 dt.synthea27nj_omop
 dt.physionet2012
+dt.mimic_2
+dt.mimic_2_preprocessed
+dt.diabetes_130_raw
+dt.diabetes_130_fairlearn
 ```

docs/api/ehrdata_index.md

Lines changed: 5 additions & 0 deletions
@@ -12,4 +12,9 @@

 EHRData

+infer_feature_types
+feature_type_overview
+replace_feature_types
+harmonize_missing_values
+
 ```

docs/api/io_index.md

Lines changed: 7 additions & 0 deletions
@@ -10,6 +10,13 @@
 :toctree: io
 :nosignatures:

+io.read_csv
+io.read_h5ad
+io.read_zarr
+io.write_h5ad
+io.write_zarr
+io.from_pandas
+io.to_pandas
 io.omop.setup_connection
 io.omop.setup_obs
 io.omop.setup_variables

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -148,5 +148,6 @@
 # Redirect broken parameter annotation classes
 qualname_overrides = {
     "zarr._storage.store.Store": "zarr.storage.MemoryStore",
+    "zarr.core.group.Group": "zarr.group.Group",
     "lnschema_core.models.Artifact": "lamindb.Artifact",
 }
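
Going by its comment, the table in this hunk redirects class paths that fail to resolve in parameter annotations to paths the documentation can link to. The diff does not show the code that consumes the table, so the following is only a minimal sketch of one plausible lookup: it assumes a plain dictionary with a fall-through to the original name, and `resolve_qualname` is a hypothetical helper, not a function from this repository or from any Sphinx extension.

```python
# Hypothetical illustration of applying a qualified-name override table.
# The entries are copied from the docs/conf.py hunk above; the lookup
# function itself is an assumption made for this sketch.
qualname_overrides = {
    "zarr._storage.store.Store": "zarr.storage.MemoryStore",
    "zarr.core.group.Group": "zarr.group.Group",
    "lnschema_core.models.Artifact": "lamindb.Artifact",
}


def resolve_qualname(qualname: str, overrides: dict) -> str:
    """Return the overriding name if one is registered, else the name unchanged."""
    return overrides.get(qualname, qualname)


if __name__ == "__main__":
    # A path listed in the table is redirected to its documented location.
    print(resolve_qualname("zarr.core.group.Group", qualname_overrides))
    # A path that is not listed passes through untouched.
    print(resolve_qualname("pandas.DataFrame", qualname_overrides))
```

Under this assumption, the entry added in this commit would send annotations naming `zarr.core.group.Group` to `zarr.group.Group`, while unlisted names are left alone.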

docs/references.bib

Lines changed: 44 additions & 0 deletions
@@ -1,3 +1,19 @@
+@article{strack2014impact,
+title={Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records},
+author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
+journal={BioMed research international},
+volume={2014},
+number={1},
+pages={781670},
+year={2014},
+publisher={Wiley Online Library}
+}
+@article{bird2020fairlearn,
+title={Fairlearn: A toolkit for assessing and improving fairness in AI},
+author={Bird, Sarah and Dud{\'\i}k, Miro and Edgar, Richard and Horn, Brandon and Lutz, Roman and Milan, Vanessa and Sameki, Mehrnoosh and Wallach, Hanna and Walker, Kathleen},
+journal={Microsoft, Tech. Rep. MSR-TR-2020-32},
+year={2020}
+}
 @article{Virshup_2023,
 doi = {10.1038/s41587-023-01733-8},
 url = {https://doi.org/10.1038%2Fs41587-023-01733-8},
@@ -8,3 +24,31 @@ @article{Virshup_2023
 title = {The scverse project provides a computational ecosystem for single-cell omics data analysis},
 journal = {Nature Biotechnology}
 }
+@book{critical2016secondary,
+title={Secondary analysis of electronic health records},
+author={Critical Data, MIT},
+year={2016},
+publisher={Springer Nature}
+}
+@article{du2023saits,
+title={Saits: Self-attention-based imputation for time series},
+author={Du, Wenjie and C{\^o}t{\'e}, David and Liu, Yan},
+journal={Expert Systems with Applications},
+volume={219},
+pages={119619},
+year={2023},
+publisher={Elsevier}
+}
+@article{du2023pypots,
+title={PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series},
+author={Du, Wenjie},
+journal={arXiv preprint arXiv:2305.18811},
+year={2023}
+}
+@article{kallfelz2021mimic,
+title={MIMIC-IV demo data in the OMOP Common Data Model},
+author={Kallfelz, Michael and Tsvetkova, Anna and Pollard, Tom and Kwong, Manlik and Lipori, Gigi and Huser, Vojtech and Osborn, Jeffrey and Hao, Sicheng and Williams, Andrew},
+journal={object Object]. doi},
+volume={10},
+year={2021}
+}
