How to handle non-diseases in EFO? #50

Split out from https://github.com/related-sciences/nxontology-ml/issues/35#issuecomment-1766560619

We currently ignore non-diseases by training and predicting only on terms that are diseases as per [`get_disease_nodes`](https://github.com/related-sciences/nxontology-ml/blob/4fe78192209277e56acb34ffd2ec8fccee0ce77c/nxontology_ml/model/predict.py#L23-L36).

Our training labels apply only to diseases, so I think it makes sense to continue training only on diseases. However, at prediction time we could:

1. create an `is_disease` marker column that is part of the output
2. compute features for non-diseases
3. compute predictions for non-diseases

While predictions on non-diseases would likely be of lower quality due to the lack of training coverage, many of the same concepts of grouping terms versus more specific terms would still apply. Users could then decide to discard all predictions when `is_disease` is False to continue with the current behavior.
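A minimal sketch of what the marker column and the opt-out filter could look like. This is only an illustration: the row layout, the `annotate_predictions` and `diseases_only` helpers, the placeholder identifiers, and the precision labels are all hypothetical and not the current nxontology-ml output. It assumes the set of disease identifiers has already been computed (for example from `get_disease_nodes`).

```python
from __future__ import annotations

from typing import Iterable


def annotate_predictions(
    predictions: Iterable[dict], disease_ids: set[str]
) -> list[dict]:
    """Return a copy of each prediction row with an added `is_disease` flag.

    `disease_ids` is assumed to hold the identifiers of all disease terms.
    """
    return [
        {**row, "is_disease": row["efo_id"] in disease_ids}
        for row in predictions
    ]


def diseases_only(rows: list[dict]) -> list[dict]:
    """Reproduce the current behavior by dropping non-disease rows."""
    return [row for row in rows if row["is_disease"]]


# Hypothetical rows: identifiers and precision labels are placeholders.
predictions = [
    {"efo_id": "EX:0001", "name": "example disease", "precision": "specific"},
    {"efo_id": "EX:0002", "name": "pain", "precision": "grouping"},
]
disease_ids = {"EX:0001"}

annotated = annotate_predictions(predictions, disease_ids)
filtered = diseases_only(annotated)
```

With this shape, non-disease predictions are available to anyone who wants them, while filtering on `is_disease` recovers today's disease-only output.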

There could be a benefit to having precision predictions for non-diseases. For example, classifications of symptoms (one example being pain) would make sense along a precision axis:

![image](https://github.com/related-sciences/nxontology-ml/assets/1117703/166d7209-1838-46d9-8cec-40ef03304222)

@yonromai: I'll bring this up with the data team at our next meeting, so no need to do anything until then. CC @eric-czech 

