How to handle non-diseases in EFO? #50

Description

@dhimmel

explodes #35 (comment)

We currently ignore non-diseases by training and predicting only on terms that are diseases as per get_disease_nodes.

Our training labels only apply to diseases. Therefore, I think it makes sense to continue training only on diseases. However, there is the possibility that we could:

  1. create an is_disease marker column that is part of the output
  2. compute features for non-diseases
  3. compute predictions for non-diseases

While predictions on non-diseases would likely be of lower quality due to the lack of training coverage, many of the same concepts of grouping terms versus more specific terms would still apply. Users could then decide to discard all predictions when is_disease is False to continue with the current behavior.
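A minimal sketch of how the is_disease marker could work, assuming predictions are represented as simple records. The names Prediction, predict_all_terms, diseases_only, and score_fn are hypothetical and not part of the existing code, and the set of disease identifiers is only a stand-in for whatever get_disease_nodes returns:

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    term_id: str
    is_disease: bool
    score: float  # placeholder for the model's precision prediction


def predict_all_terms(terms, disease_ids, score_fn):
    """Score every term and record whether it is a disease."""
    return [
        Prediction(term_id=t, is_disease=t in disease_ids, score=score_fn(t))
        for t in terms
    ]


def diseases_only(predictions):
    """Discard non-disease rows to reproduce the current behavior."""
    return [p for p in predictions if p.is_disease]


# Hypothetical inputs: identifiers and the constant score are illustrative only.
terms = ["term:disease_a", "term:disease_b", "term:symptom_c"]
disease_ids = {"term:disease_a", "term:disease_b"}  # stand-in for get_disease_nodes
predictions = predict_all_terms(terms, disease_ids, score_fn=lambda t: 0.5)
filtered = diseases_only(predictions)
```

Training would still use only the disease rows; the marker just lets downstream users choose whether to keep the extra predictions.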

There could be a benefit to having precision predictions for non-diseases. For example, classifications of symptoms (one example being pain) would make sense along a precision axis:

image

@yonromai: I'll bring this up with the data team at our next meeting, so no need to do anything until then. CC @eric-czech
