All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Compatibility with `anndata>=0.12.13` (#240) @eroell
- Assigning `.X` to a view of an X-less {class}`~ehrdata.EHRData` (e.g. one created with `layers=` only) no longer raises `TypeError: 'NoneType' object does not support item assignment`. The view is now materialised before the assignment, consistent with how AnnData handles other field modifications on views. (#233) @eroell
- {func}`~ehrdata.infer_feature_types` now considers integer columns with values 0, ..., n as numeric. It also provides a new argument `binary_as` to control whether 0/1 columns are considered numeric or categorical. (#231) @eroell
- {func}`~ehrdata.io.from_pandas` with `format='long'` provides a new keyword argument `fill_time_gaps` that fills missing time gaps in the common case of integer time steps from 0 to n_timesteps. (#229) @eroell
- {func}`~ehrdata.dt.mimic_2` column `censor_flg` switched to the lifelines convention of 1 = event, 0 = censored. Previously, this dataset loader returned the opposite encoding, because the original dataset provides the flags that way. (#227) @sueoglu
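
Code written against the previous encoding can usually be adapted by flipping the flag. The following is a minimal sketch in plain Python and not ehrdata's implementation; the helper name `to_event_indicator` and the sample values are hypothetical.

```python
def to_event_indicator(censor_flags):
    """Flip 0/1 flags from '1 = censored' to '1 = event' (or back)."""
    flipped = []
    for flag in censor_flags:
        if flag not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {flag!r}")
        flipped.append(1 - flag)
    return flipped


# Hypothetical flags under the previous encoding (1 = censored, 0 = event).
old_style = [1, 0, 0, 1]

# The same observations under the new encoding (1 = event, 0 = censored).
new_style = to_event_indicator(old_style)
print(new_style)  # [0, 1, 1, 0]
```

Applying the helper twice returns the original values, so it can also be used to recover the old encoding if downstream code still expects it.
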
- {func}`~ehrdata.io.from_pandas` with `format='long'` misordered entries in `.X`/`.layers` relative to `.obs` if the input DataFrame was not sorted by the obs id keys. This is now fixed. (#228) @eroell
- Documentation style polishing (#223) @zethson
- {func}`~ehrdata.io.omop.setup_connection` can read `.parquet` files. (#217) @eroell
- Slicing of `EHRData` objects fixed when the backing object is an `AnnData`. (#218) @eroell
- More concise messages in {func}`~ehrdata.infer_feature_types`. (#215) @zethson
- {func}`~ehrdata.move_to_obs` and {func}`~ehrdata.move_to_x` are new helpers for conveniently moving variables from central 2D arrays to the `.obs` field, and vice versa. (#199) @eroell
- {func}`~ehrdata.dt.physionet2019` as another out-of-the-box, conveniently available dataset with 40'000 ICU stays from the Physionet 2019 challenge. (#204) @eroell
- `time_precision` parameter (`"date"` or `"datetime"`) to {func}`~ehrdata.io.omop.setup_variables` and {func}`~ehrdata.io.omop.setup_interval_variables` for finer temporal granularity control. (#210) @eroell
- {func}`~ehrdata.io.read_h5ad` fixed issues when `backed=True`. (#199) @eroell
- {func}`~ehrdata.io.read_h5ad` fixed bug when `.X` is `None` and `harmonize_missing_features` is `True`. (#206) @eroell
- {func}`~ehrdata.io.omop.setup_obs` with `observation_table="person_visit_occurrence"` now supports multiple visits per patient, creating one row per visit with unique observation IDs, instead of failing with xarray conversion errors with non-unique indices. (#210) @eroell
- OMOP time interval boundaries now use half-open intervals `[start, end)` to prevent duplicate measurements at interval boundaries. (#210) @eroell
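
To illustrate why half-open intervals avoid duplicates, the sketch below assigns a timestamp to exactly one interval. It uses plain Python and is not ehrdata's implementation; `assign_interval` and the edge values are hypothetical.

```python
from bisect import bisect_right


def assign_interval(t, edges):
    """Return i such that edges[i] <= t < edges[i + 1], or None if t is outside."""
    if t < edges[0] or t >= edges[-1]:
        return None
    return bisect_right(edges, t) - 1


# Hypothetical interval edges in hours: [0, 24) and [24, 48).
edges = [0, 24, 48]

# A measurement exactly on the shared boundary lands in one interval only.
print(assign_interval(24, edges))  # 1

# With closed intervals [start, end], the same measurement matches both.
closed_matches = [
    i for i in range(len(edges) - 1) if edges[i] <= 24 <= edges[i + 1]
]
print(closed_matches)  # [0, 1]
```

A measurement at the final edge (here 48) falls outside all intervals under this convention, which is the trade-off of excluding each interval's end point.
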
- Support Python 3.14 (#194) @Zethson
- Address `FutureWarning`s across multiple places (#200) @eroell
- Enhanced tutorial structure (#208) @eroell
- Dataset generator function `ed.dt.ehrdata_blobs` now takes `n_cat_var` and `n_categories` arguments to generate categorical (integer encoded) time series data (#207) @sueoglu
- If `enrich_var_with_feature_info=True` in {func}`~ehrdata.io.omop.setup_variables` and {func}`~ehrdata.io.omop.setup_interval_variables`, `data_table_concept_ids` not included within the concept table are now mapped from their respective alternate `concept_id` included in the concept_relationship table to retrieve the available feature information. (#205) @KilianDahm
- {func}`~ehrdata.io.omop.setup_variables` and {func}`~ehrdata.io.omop.setup_interval_variables` with use of `"person"` now check `birth_datetime` for meaningful behaviour and error messages. (#210) @eroell
- {func}`~ehrdata.integrations.vitessce.gen_default_config` provides convenience to generate a config directly from an `EHRData` object, and should be used instead of the previous `ehrdata.integrations.vitessce.gen_config`. (#211) @eroell
{class}`~ehrdata.EHRData` drops the `.R` field, and now supports 3D data storage in any slot of `.layers`. See the {doc}`tutorials/getting_started` tutorial for an introduction to this behaviour. Support for 3D data storage in `.X` is planned for a future release.
- `EHRData` drops the `.R` field in favor of using `.layers` for any 3D data arrays (#184) @eroell
- `EHRData`'s shape property will always return a 3 dimensional shape. If an `EHRData` object has flat arrays only, the third dimension will be 1. (#184) @eroell
- The following functions now take a `layer` argument: {func}`~ehrdata.io.read_csv`, {func}`~ehrdata.io.from_pandas`, {func}`~ehrdata.io.to_pandas`, {func}`~ehrdata.io.omop.setup_variables`, {func}`~ehrdata.io.omop.setup_interval_variables`, {func}`~ehrdata.dt.ehrdata_blobs`, {func}`~ehrdata.dt.physionet2012`. If it is left at its default, `None`, the `.X` field of `EHRData` is used. Since `.X` is 2D in this release, in cases with 3D data, the `layer` argument needs to be used. (#184) @eroell
- {func}`~ehrdata.io.write_zarr` now writes an `EHRData` specific store encoding, with `anndata` as a substore. This change allows using `AnnData` with its change to consolidated Zarr metadata, and better isolates `AnnData`'s io. (#185) @eroell
- {func}`~ehrdata.io.read_zarr` is adapted to read the new store encoding, and can also deal with `AnnData` stores. (#185) @eroell
- Use custom logger & remove pydata sparse (#176) @Zethson
- Replace figshare with scverse S3 (#177) @Zethson
- Update template to v0.6.0 (#166) @Zethson
- Fix order of `var` created in `ed.io.omop.setup_variables` and `ed.io.omop.setup_interval_variables` (#179) @eroell
- Rename `ed.pl.vitessce.gen_config` to `ed.integrations.vitessce.gen_config` (#181) @eroell
- Rename `ed.tl.omop.EHRDataset` to `ed.integrations.torch.OMOPEHRDataset` (#181) @eroell
- Update duckdb imports for future (#157) @eroell
- Fix tests and Getting Started Notebook (#155) @eroell
- Update duckdb imports for future (#155) @eroell
- Cleaned up and updated tutorial notebooks (#140) @agerardy
- {func}`~ehrdata.io.read_csv` Reads a csv file (#136) @eroell
- {func}`~ehrdata.io.read_h5ad` Reads an h5ad file (#136) @eroell
- {func}`~ehrdata.io.read_zarr` Reads a zarr file (#136) @eroell
- {func}`~ehrdata.io.write_h5ad` Writes an h5ad file (#136) @eroell
- {func}`~ehrdata.io.write_zarr` Writes a zarr file (#136) @eroell
- {func}`~ehrdata.io.from_pandas` Transform a given {class}`~pandas.DataFrame` into an {class}`~ehrdata.EHRData` object (#136) @eroell
- {func}`~ehrdata.io.to_pandas` Transform an {class}`~ehrdata.EHRData` object into a {class}`~pandas.DataFrame` (#136) @eroell
- {func}`~ehrdata.dt.mimic_2` Loads the MIMIC-II dataset (#136) @eroell
- {func}`~ehrdata.dt.mimic_2_preprocessed` Loads the preprocessed MIMIC-II dataset (#136) @eroell
- {func}`~ehrdata.dt.diabetes_130_raw` Loads the raw diabetes-130 dataset (#136) @eroell
- {func}`~ehrdata.dt.diabetes_130_fairlearn` Loads the preprocessed diabetes-130 dataset by fairlearn (#136) @eroell
- {func}`~ehrdata.infer_feature_types` Infer feature types in an {class}`~ehrdata.EHRData` object (#136) @eroell
- {func}`~ehrdata.feature_type_overview` Overview of inferred feature types (#136) @eroell
- {func}`~ehrdata.replace_feature_types` Replacing inferred feature types (#136) @eroell
- {func}`~ehrdata.harmonize_missing_values` Harmonize missing values in an {class}`~ehrdata.EHRData` object (#136) @eroell
- Initialize EHRData with X and layers (#132) @eroell
- Rename `.t` attribute to `.tem`
- Zarr version to less than 3
- Added missing zarr dependency
- Expanded documentation
- Improved OMOP Extraction
- Support for COO sparse matrices for R
- A `ed.dt.ehrdata_blobs` test data generator function
- Replace -1 encoded missing values with nans in physionet2012 challenge data
- Renamed `r` to `R`
- Initial release
- Basic tool, preprocessing and plotting functions
- Tutorial notebooks updated to align with breaking changes