Releases: Bayer-Group/PhenEx

v0.7.6

10 Dec 09:44

This release fixes build issues with pip in 0.7.5.

v0.7.5

26 Nov 16:18

MAJOR ADDITIONS

  • Adding a connector to PostgreSQL database that can be used for sample database testing
  • Introducing CHADSVASCPhenotype : Computable, database-agnostic implementation of CHADS-VASc as defined in "Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the euro heart survey on atrial fibrillation", Lip et al.
  • Introducing TimeShiftPhenotype : add a fixed number of days to the event_date defined by another Phenotype.
  • Introducing TimeRange Phenotypes for HCRU: Adding two new phenotypes that work with tables with a start/end date period. This can be used for health care resource utilization or drug adherence.
    • Introducing TimeRangeCountPhenotype : count the number of distinct time periods. For example, you can count the number of times a person was hospitalized or the number of times a person had continuous medication adherence.
    • Introducing TimeRangeDayCountPhenotype : count the number of days within time periods. For example, you can count the total number of days a person was hospitalized or the number of days a person had continuous medication adherence.
  • Introducing AI Reporting: Add your OpenAI API key and generate an (editable) Word or markdown report draft based on the study findings. This combines and concatenates the outputs of other rule-based reporters (waterfall, table1) and adds further AI-generated contextual information.
  • Introducing one inpatient two outpatient phenotype
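
As a rough illustration of what the two TimeRange count phenotypes above compute, here is a sketch in plain Python rather than the PhenEx API. The function names, the sample periods, and the convention of counting both the start and end day are assumptions made for this example; it also assumes the periods do not overlap.

```python
from datetime import date

# Hypothetical hospitalization periods for one person: (start_date, end_date).
periods = [
    (date(2024, 1, 1), date(2024, 1, 5)),
    (date(2024, 3, 10), date(2024, 3, 12)),
]

def count_time_ranges(periods):
    """Number of distinct periods (the idea behind TimeRangeCountPhenotype)."""
    return len(set(periods))

def count_days_in_time_ranges(periods):
    """Total days inside the periods (the idea behind TimeRangeDayCountPhenotype).

    Assumed convention: both the start day and the end day are counted.
    """
    return sum((end - start).days + 1 for start, end in set(periods))

print(count_time_ranges(periods))          # 2 periods
print(count_days_in_time_ranges(periods))  # 5 + 3 = 8 days
```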

MINOR ADDITIONS

  • Updates to Waterfall reporter : fix for when no numeric valued phenotypes present. Previously zero counts were displayed as NaN, now showing 0. Added more quantiles in numeric value summary statistics.
  • Adding tests for return date of logic phenotype : LogicPhenotype can select a single component phenotype as return_date.
  • Adding ValueFiltering to ScorePhenotype : You can now filter a ScorePhenotype using a ValueFilter. For example, calculate a CHADSVASC score using ScorePhenotype and identify the persons with a score greater than 3. Also includes general fixes for Phenotype execution.
  • Updates to Waterfall reporter : improved display of attrition table with color formatting of Excel outputs.
  • Added VisitDetail table path for OMOP mappers

BUG FIXES

  • Improvements to LogicPhenotype when component phenotypes return different value types : previously, LogicPhenotype was unable to combine component phenotypes that returned different VALUE data types, for example, when combining a 'SexPhenotype' and an 'AgePhenotype'. Now, LogicPhenotype works as expected with mixed data type VALUE columns, filling the resulting LogicPhenotype VALUE column with nulls.
  • Fix when LogicPhenotype return date is first : previously, when the LogicPhenotype's return_date = "first" or "last" and the component phenotypes' return_date = 'all', the component phenotype with the first/last date was returned, but with multiple rows per patient. Now a single first/last date is returned.
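
The intended outcome of the return date fix can be sketched as follows. This is plain Python on made-up rows, not the LogicPhenotype implementation; the function name and the row layout are hypothetical.

```python
from datetime import date

# Hypothetical component rows: (person_id, event_date), several per person,
# as a component phenotype with return_date = 'all' might produce.
rows = [
    (1, date(2024, 2, 1)),
    (1, date(2024, 1, 15)),
    (2, date(2024, 3, 3)),
    (1, date(2024, 1, 15)),  # duplicate date for person 1
]

def select_return_date(rows, return_date="first"):
    """Keep exactly one date per person: the earliest or the latest."""
    pick = min if return_date == "first" else max
    by_person = {}
    for person_id, event_date in rows:
        by_person.setdefault(person_id, []).append(event_date)
    return {person_id: pick(dates) for person_id, dates in by_person.items()}

first_dates = select_return_date(rows, "first")
last_dates = select_return_date(rows, "last")
```

Each person ends up with a single row regardless of how many rows the components contributed.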

IMPROVEMENTS

  • Improvements to lazy execution in Node : Node is now context-aware and will trigger recomputation if the context changes. The following logic applies to lazy execution:
    1. Look up previous runs of a node by matching name and execution context.
    2. If a run with the same name and execution context is found, check the node hash to see whether the node itself has changed.
    3. If the node hash has not changed, should_rerun returns False; if the node has changed, should_rerun returns True.
    4. If no entry with the same name and execution context is found, should_rerun returns True.
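
The four steps above can be sketched in a few lines of Python. This is only an illustration of the decision logic: the in-memory `previous_runs` dictionary, the `node_hash` and `record_run` helpers, and the way a node definition is hashed are assumptions for this example, not how PhenEx stores or hashes nodes.

```python
import hashlib
import json

# Hypothetical record of previous runs: (name, context) -> node hash.
previous_runs = {}

def node_hash(definition: dict) -> str:
    """Stable hash of a node definition (illustrative only)."""
    payload = json.dumps(definition, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def should_rerun(name: str, context: str, definition: dict) -> bool:
    """Return True when the node must be recomputed."""
    key = (name, context)          # step 1: match on name and context
    if key not in previous_runs:   # step 4: no matching entry
        return True
    # steps 2-3: rerun only if the node hash differs from the stored one
    return previous_runs[key] != node_hash(definition)

def record_run(name: str, context: str, definition: dict) -> None:
    """Remember the hash of a node after it has been executed."""
    previous_runs[(name, context)] = node_hash(definition)

definition = {"type": "AgePhenotype", "min_age": 18}
first = should_rerun("AGE", "cohort_a", definition)       # nothing recorded yet
record_run("AGE", "cohort_a", definition)
unchanged = should_rerun("AGE", "cohort_a", definition)   # same name, context, hash
changed = should_rerun("AGE", "cohort_a", {**definition, "min_age": 21})
new_context = should_rerun("AGE", "cohort_b", definition)
```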

v0.7.4

24 Oct 11:31

MAJOR ADDITIONS

  • Introducing EventsToTimeRange Derived Tables : EventsToTimeRange is a derived table implementation in PhenEx that converts discrete event dates into continuous time periods. It's particularly valuable for medication adherence and discontinuation studies where you only have prescription dates without explicit durations.
  • R bindings for PhenEx : This PR enables R users to call PhenEx from within their R environment, providing a more seamless experience in hybrid workflows.
  • Introducing Stackable Regimens! : Often we want to see how drugs are utilized: are people taking a single drug or a combination, and if so, which ones? Now you can use StackableRegimens to answer exactly this question. Given a list of input phenotypes, it generates a list of phenotypes computing all possible combinations of those inputs.
  • Diagnostic plots : Added some interactive plots good for sanity checking your analysis and interpreting the results
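
To make the EventsToTimeRange idea above concrete, here is a minimal sketch in plain Python. It is not the PhenEx implementation: the function name, the `days_covered` and `max_gap_days` parameters, and the rule that each event covers a fixed number of days are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

def events_to_time_ranges(event_dates, days_covered=30, max_gap_days=0):
    """Turn discrete event dates into merged (start, end) periods.

    Assumptions: each event covers `days_covered` days starting on the event
    date, and periods separated by at most `max_gap_days` are merged.
    """
    periods = []
    for start in sorted(set(event_dates)):
        end = start + timedelta(days=days_covered - 1)
        # Days strictly between the previous period's end and this start.
        if periods and (start - periods[-1][1]).days - 1 <= max_gap_days:
            periods[-1] = (periods[-1][0], max(periods[-1][1], end))
        else:
            periods.append((start, end))
    return periods

# Hypothetical prescription dates for one person.
prescriptions = [date(2024, 1, 1), date(2024, 1, 25), date(2024, 6, 1)]
ranges = events_to_time_ranges(prescriptions)
```

With these inputs the first two prescriptions merge into one period and the June prescription starts a new one, which is the kind of gap a discontinuation study would look for.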

v0.7.3

10 Oct 13:50

MAJOR ADDITIONS

  • Introducing Table2 Reporter : You can now get a templated analysis of outcomes defined in your cohort. This Table 2 analysis includes basic reporting on all outcome phenotypes, including:
    • number of events
    • time under risk
    • incidence rates
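
For readers unfamiliar with the three quantities listed above, the relationship between them can be sketched as below. The function name, the 365.25 days-per-year conversion, and reporting per 100 person-years are assumptions for this example and may differ from what the Table2 Reporter does.

```python
def incidence_rate(n_events, person_days, per_person_years=100):
    """Events per `per_person_years` person-years of time under risk."""
    person_years = person_days / 365.25
    return n_events / person_years * per_person_years

# Hypothetical outcome: 12 events over 73,050 person-days (200 person-years).
rate = incidence_rate(n_events=12, person_days=73050)
print(round(rate, 2))  # 6.0 events per 100 person-years
```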

MINOR ADDITIONS

  • Introducing mock data : we can now generate mock/fake data that resembles the structure of real data. Fake data does not have realistic statistics, only the structure of the data. Having fake data is useful for quick testing, and will be added to our unit / integration tests!
  • Introducing data periods : specify a data period for a cohort. This is the absolute date range in which data is allowed to be accessed by all in/ex criteria, characteristics, and outcomes. This is distinct from the indexing period i.e. absolute date range that the index date can occur; set that using date_range on the entry phenotype

BUG FIXES

  • Fix to restore Derived Tables functionality
    • Implement proper exception handling in multithreaded cohort execution
    • This changes cohort to wait to build the graph until it has access to the tables. The problem is that the cohort doesn't really have full knowledge of what domains it will need at run time because things like autojoin use tables not referenced by any phenotype. Thus, we just have to wait for the user to tell us what tables are needed.

IMPROVEMENTS

  • DailyAggregators now return an EVENT_DATE : previously daily value aggregators (DailyMean, DailyMedian, DailyMax, DailyMin) were not returning an event date. Now they return the event_date.
    • If Mean and Median are used, no EVENT_DATE will be returned.
    • For daily aggregators, the daily EVENT_DATES will be returned.
    • For Min and Max, all days that have the Min/Max value are returned, resulting in multiple rows per patient (if min/max values occur on multiple days)
  • Allows the user to clear the cache and force re-execution of a particular node, optionally including all child nodes.
  • Some more cleanup on the Node class, which had too much repetitive code
  • Better exception handling in the multithreading case
  • Remove some duplicated code in Cohort and fix naming of stages
  • Adds last executed to the lazy execution table and provides a method for the user to look it up. Useful for debugging.
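
The DailyAggregators change in this release can be pictured with a small plain-Python sketch. It is one interpretation of the description, not the PhenEx code: the function names and sample measurements are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical measurements for one person: (event_date, value).
measurements = [
    (date(2024, 1, 1), 5.0),
    (date(2024, 1, 1), 7.0),
    (date(2024, 1, 2), 8.0),
    (date(2024, 1, 3), 8.0),
    (date(2024, 1, 3), 4.0),
]

def daily_mean(measurements):
    """One mean value per day, keyed by that day's event date."""
    by_day = defaultdict(list)
    for event_date, value in measurements:
        by_day[event_date].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(by_day.items())}

def max_rows(measurements):
    """All (event_date, value) rows carrying the maximum value."""
    top = max(value for _, value in measurements)
    return sorted({(d, v) for d, v in measurements if v == top})

daily = daily_mean(measurements)
maxima = max_rows(measurements)
```

Here the maximum value occurs on two different days, so the maximum yields two rows for the same person, matching the multiple-rows behavior described for Min and Max.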

v0.7.2

11 Sep 08:56

BUG FIXES

  • The update hash method was not thread-safe, leading to issues in the DAG stage with multithreading. This patch fixes that.

IMPROVEMENTS

  • Even though we were writing tables to disk, we were not really getting the benefit because we were still referencing the ibis computation graph, which appears to trigger recomputation on every reference. We now point to the table on disk instead, to prevent re-execution when in lazy_execution mode.

DOCUMENTATION UPDATES

  • updated study tutorial

v0.7.0

08 Sep 13:25

MAJOR ADDITIONS

  • Introducing lazy execution : cohort execution becomes a lot smarter with a 'lazy execution' keyword argument. When set to true, cohorts will intelligently minimize computation; only cohort components whose definition has changed since a previous run are executed. Unmodified components reuse the results of the previous run. This is great when interactively developing a cohort: you can now quickly build up a cohort and see how each change affects cohort results while minimizing wait time.

NOT BACKWARDS COMPATIBLE

  • This release is mostly compatible with 0.6.0; however, the call signature to Cohort.execute() has changed slightly.

CHANGES PRIOR BEHAVIOR

MINOR ADDITIONS

  • Implement to_list method on Codelist
  • LogicPhenotype now returns a VALUE corresponding to the returned DATE : previously, LogicPhenotype did not return a value. Prior behavior allowed for selection of a date using return_date = last, first or all. Now you can return the value associated with the returned date.
  • BinPhenotype now works with categorical values : previously, BinPhenotype only worked on numeric valued phenotypes. Now, BinPhenotype works on non-numerical VALUE columns as well, allowing value mapping of multiple categorical values to user-defined bins.
  • CodelistPhenotype now returns matching codes : previously, CodelistPhenotype only returned person ids and event dates, with all null values. Now it returns a VALUE; this value is the code that resulted in a person fulfilling the phenotype criteria, thus answering the question 'what code did person x have from codelist y'.
  • Improvements to ibis_connect : previously, SnowflakeConnector required two authentications. Now it requires only one.
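
The categorical binning described for BinPhenotype in this list amounts to mapping several categorical values onto one user-defined label. A minimal sketch of that idea follows; the `bins` dictionary layout, the drug names, and the function name are hypothetical and do not reflect the BinPhenotype interface.

```python
# Hypothetical mapping from bin label to the categorical values it groups.
bins = {
    "ANTICOAGULANT": ["warfarin", "apixaban", "rivaroxaban"],
    "ANTIPLATELET": ["aspirin", "clopidogrel"],
}

def bin_categorical(value, bins, default=None):
    """Return the label of the bin containing `value`, else `default`."""
    for label, members in bins.items():
        if value in members:
            return label
    return default

labels = [bin_categorical(v, bins) for v in ["apixaban", "aspirin", "metformin"]]
```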

BUG FIXES

  • fix to computation graph phenotype : boolean column was not being added for score and arithmetic phenotypes.

v0.6.0

11 Aug 12:40

MAJOR ADDITIONS

  • Introducing BinPhenotype : BinPhenotype converts numeric values into categorical bin labels. To use, pass it a numeric valued phenotype such as AgePhenotype, MeasurementPhenotype, ArithmeticPhenotype, or ScorePhenotype.

  • Introducing UserDefinedPhenotype!! 🎉🎉 UserDefinedPhenotype allows users of PhenEx to implement custom functionality within a single phenotype. To use, the user must pass a function that returns an ibis table. Fully implemented with tests and documentation.

    UserDefinedPhenotype is especially useful for two use cases:

    1. Hybrid workflows: If you have performed cohort extraction outside of PhenEx (e.g. in R, SQL) but would like to use PhenEx to calculate baseline characteristics and outcomes, we can set the entry criterion to a UserDefinedPhenotype and read a dataframe of PERSON_IDS and INDEX_DATES. In this way, PhenEx flexibly allows us to use multiple tools in our analysis.
    2. Custom event definitions: If you need to define events based on complex logic that is not easily expressed using the built-in PhenEx functionality, you can use UserDefinedPhenotype to implement this logic in a custom function.
  • Introducing EventCountPhenotype!! EventCountPhenotype allows users of PhenEx to

    1. count the number of distinct days on which an event defined by another phenotype occurs
    2. filter by the number of events that occur, allowing detection of e.g. 'at least three instances of AF code within 90 days prior to index date'
    3. filter by the number of days between any pair of events, allowing for detection of e.g. 'two occurrences of AF code separated by more than 90 days'.

    Full implementation, unit tests and documentation added.

  • Added complete implementation of DuckDBConnector class : previously, only the SnowflakeConnector was fully functional. This adds parallel functionality to DuckDBConnector.

  • Introducing DerivedTables and CombineOverlappingPeriods 🎉🍾 : we require ADT feeds (admission discharge transfer). An ADT feed takes overlapping and consecutive visits from the visits_occurrence table and combines them into a single time period with a single start and end date. Added here is a proposal for how derived tables can be implemented in PhenEx, as well as an initial, highly imperfect implementation of combining overlapping periods.

    • A DerivedTable is any table that is generated from the source data and does not require patient level specification (i.e. it is not a phenotype, as phenotypes subset the data using patient level criteria). ADT feeds are the example here, but one can imagine data cleaning steps implemented in this manner. Derived tables are defined by the user in a manner similar to phenotypes: the user specifies the source table domain key. The difference is that the user also defines the output destination table domain key. Derived tables are then generated during cohort execution and appended to the subset_tables_entry, and are thus also present in the subset_tables_index, for use by all phenotypes except the entry criterion, accessible by the output destination domain key.
    • CombineOverlappingPeriods is our first DerivedTable that contains a non-performant implementation as a placeholder until a more performant implementation is written. It uses pandas rather than ibis and thus will have performance issues with large cohorts. It has been executed on cohorts up to 300k patients without problems.
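
The period-combining step that CombineOverlappingPeriods performs can be illustrated with a standard interval merge. This is a sketch in plain Python, not the pandas implementation shipped in PhenEx; the function name is hypothetical, and treating "consecutive" as "starting no later than the day after the previous period ends" is an assumption.

```python
from datetime import date, timedelta

def combine_overlapping_periods(periods):
    """Merge overlapping or consecutive (start, end) periods for one person."""
    merged = []
    for start, end in sorted(periods):
        # Overlapping, or starting the day after the previous period ends.
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical visits for one person.
visits = [
    (date(2024, 1, 1), date(2024, 1, 4)),
    (date(2024, 1, 3), date(2024, 1, 6)),  # overlaps the first visit
    (date(2024, 1, 7), date(2024, 1, 9)),  # starts the day after the second ends
    (date(2024, 2, 1), date(2024, 2, 2)),  # separate stay
]
stays = combine_overlapping_periods(visits)
```

The first three visits collapse into a single period with one start and one end date, and the February visit remains its own period.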

NOT BACKWARDS COMPATIBLE

  • Change EventCountPhenotype keyword argument return_event to component_date_select : prior, EventCountPhenotype keyword for selection of date of first or second event was called 'return_event'. Now, in order to harmonize interface, it is changed to 'component_date_select', which is the term used for MeasurementChangePhenotype.
  • Updated interface for CategoricalPhenotype : Prior, CategoricalPhenotype duplicated keyword parameters of CategoricalFilter i.e. column_name and allowed_values. Now CategoricalPhenotype takes directly a CategoricalFilter as a keyword argument categorical_filter. This harmonizes interface with TimeRangePhenotype, AgePhenotype i.e. we always pass filters and do not duplicate filter keyword arguments. This also adds new functionality, allowing CategoricalPhenotype to operate on multiple columns by taking advantage of the logical operations provided by CategoricalFilter. Updated tests and added tests for time range filtering.
  • Update ContinuousCoveragePhenotype : renamed to TimeRangePhenotype!! : we have updated our interface guidelines; we no longer duplicate keyword arguments within phenotypes if they are passed by filters.
    • ContinuousCoveragePhenotype previously had a duplicate implementation of relative time range filtering. We now pass a RelativeTimeRangeFilter directly using the keyword argument relative_time_range.
    • ContinuousCoveragePhenotype has been updated to work with any table with a start_date and an end_date (either directly or provided by mappers). This allows for usage with the CombineOverlappingPeriods derived table. It has been renamed to TimeRangePhenotype to reflect this.

CHANGES PRIOR BEHAVIOR

  • Added new keyword parameter allow_null_end_date to TimeRangePhenotype : TimeRangePhenotype currently requires that the event date of interest (usually index date) is within the start_date and end_date. As this is often used to identify patients with continuous insurance coverage, we found that patients who continue to be enrolled in the data source / continue to have coverage often have a null end_date. We now allow the end_date to be null; in fact, the default value of allow_null_end_date is set to True. This may change the results of previously executed studies.
  • Update all naming of tables and columns to uppercase : ibis allows lowercase in table names and column names. This is messy because snowpark and R don't allow this (and SQL doesn't care). We now require all table and column names to be capitalized to improve compatibility with downstream and other tool usage. This implementation ensures uppercase table names and column names by :
    1. all phenotype names are now uppercase, meaning that all column names using phenotype names will also be capitalized.
    2. all tables written by the cohort will be upper case, as the table create enforces uppercase when writing
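
The effect of allow_null_end_date described in this section can be sketched with a small membership check. Only the parameter name allow_null_end_date comes from the release notes; the `covers` function and its other arguments are hypothetical, and the real TimeRangePhenotype operates on tables rather than single values.

```python
from datetime import date
from typing import Optional

def covers(event_date: date, start_date: date, end_date: Optional[date],
           allow_null_end_date: bool = True) -> bool:
    """Is `event_date` inside the period, treating a null end as open-ended?"""
    if end_date is None:
        # An open-ended period only counts when null end dates are allowed.
        return allow_null_end_date and start_date <= event_date
    return start_date <= event_date <= end_date

index_date = date(2024, 6, 1)
still_enrolled = covers(index_date, date(2024, 1, 1), None)
strict = covers(index_date, date(2024, 1, 1), None, allow_null_end_date=False)
ended_early = covers(index_date, date(2024, 1, 1), date(2024, 3, 1))
```

A patient with an open-ended enrollment is kept under the new default and dropped when the flag is switched off, which is why earlier study results may change.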

MINOR ADDITIONS

  • Add pretty display to Waterfall Reporter : Waterfall Reporter previously output a pandas DataFrame with all numeric data types. This made displays of the waterfall table rather unpleasing, with NaNs displayed where values were not applicable (i.e. all summary statistics for a binary variable). We have now added:
    • a pretty_display keyword argument, set to true by default, which casts the table to strings and fills nulls with empty strings.
    • a percentage column, showing how many patients remain after application
  • Updated TimeToEvent plotting : previously created one plot with all outcomes. Now added class methods to create :
    1. plot_multiple_kaplan_meier : a figure with multiple KM curves for selected outcomes. No risk counts displayed.
    2. plot_single_kaplan_meier : a figure with a single selected outcome with risk counts.
      Additionally, for both methods, can write figures to disk by passing the path_dir keyword argument.

BUG FIXES

  • Fixed reporting of BinPhenotype, ScorePhenotype and EventCountPhenotype in Table1 Reporter :

    • BinPhenotype is reported as a categorical value, so each bin is displayed in table1 automatically with count of patients in that bin
    • ScorePhenotype is now reported as a categorical value, so each score and count of patients with that score are automatically added to Table1
    • EventCountPhenotype is now reported as a numerical value, so the summary statistics are displayed automatically in Table1
  • Fixed strange behavior in Table1 Reporter : table 1 had strange behavior, placing counts with wrong phenotypes; the counts were correct for a phenotype, but assigned to the incorrect label. The implementation of Table1 Reporter was changed from using the baseline characteristics table to using the phenotype tables themselves. This solves the issue. Additional changes to logging to prevent multiple log statements.

  • Fix to OMOPObservationTable mapper : previously had the incorrect OMOP column name for the 'code' key defined. Now corrected the 'code' key to OBSERVATION_CONCEPT_ID. This allows CodelistPhenotype to correctly work on the OBSERVATION_CONCEPT_ID column to filter for encounter types in the ObservationTable.

  • Fixed Waterfall 'waterfall' column bug : There was an issue where the 'remaining' column could increase because it was not counting distinct patient ids. If an inclusion/exclusion phenotype was returning non-distinct patient ids, it could appear that the waterfall count was increasing, when this was by definition impossible. Now displaying distinct patient ids in the waterfall column, thus displaying the correct count of patients remaining in the cohort at each row in the attrition table.

  • Fix to TimeToEvent Reporter : TimeToEventReporter was incorrectly identifying patients who had an event due to a bug in the selection of the 'first event date'. This was due to ibis.least returning null if any column was null. The new implementation fixes this, correctly identifying the first event date.

  • Fix to Table1 Reporter : Added ScorePhenotype and ArithmeticPhenotype to Table1 Reporter's value reporting. This allows Table1 to display descriptive statistics for the value column of ScorePhenotype and ArithmeticPhenotype.

  • Fixed Waterfall reporter count of N : Previously, waterfall reporter was counting number of events and not number of distinct patient ids. N column fixed to display number of unique patient ids with a given inclusion/exclusion criteria on that ro...

v0.5.0

16 Jun 09:42

  • Added time to event reporter (KM curves)
  • Added ISTH Major Bleeding phenotype
  • Bug fixes to MeasurementChangePhenotype
  • Fixes to Table1 (ensure reporting of categorical phenotypes)
  • Updates to documentation and roadmap

v0.4.3

02 Apr 06:46

Updates:

  • Categorical Filter : Added isnull, notnull, isin, notin operators.
  • Phenotypes : Added description to phenotypes

Fixes:

  • Table One : Corrected table display issues.
  • Ibis Null Handling : Fixed date/datetime null issues.
  • General Code Cleanup : Formatted with black, removed unused print statements.

v0.4.2

03 Mar 15:44

  • Adds serialization support for Phenotype and Cohort