explodes #35 (comment)
We currently ignore non-diseases by training and predicting only on terms that are diseases as per `get_disease_nodes`.
Our training labels only apply to diseases. Therefore, I think it makes sense to continue training only on diseases. However, there is the possibility that we could:
- create an `is_disease` marker column that is part of the output
- compute features for non-diseases
- compute predictions for non-diseases
While predictions on non-diseases would likely be of lower quality due to the lack of training coverage, many of the same concepts of grouping terms versus more specific terms would still apply. Users could then discard all predictions where `is_disease` is `False` to keep the current behavior.
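To make the proposal concrete, here is a rough sketch of what the output could look like. Everything in it is hypothetical: the `Prediction` record, `predict_all`, `diseases_only`, the `EX:` identifiers, and the constant placeholder model are made up for illustration and are not taken from the codebase. It only assumes we have the set of disease identifiers (which in practice would come from something like `get_disease_nodes`).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Prediction:
    """One output row (hypothetical schema)."""

    term_id: str
    precision: float  # placeholder for whatever the model predicts
    is_disease: bool  # marker column proposed above


def predict_all(
    term_ids: Iterable[str],
    disease_ids: Iterable[str],
    predict_fn: Callable[[str], float],
) -> List[Prediction]:
    """Predict for every term, diseases or not, and flag the diseases."""
    disease_set = set(disease_ids)
    return [
        Prediction(
            term_id=term_id,
            precision=predict_fn(term_id),
            is_disease=term_id in disease_set,
        )
        for term_id in term_ids
    ]


def diseases_only(predictions: Iterable[Prediction]) -> List[Prediction]:
    """Recover the current behavior by dropping non-disease rows."""
    return [p for p in predictions if p.is_disease]


# Toy data: made-up identifiers and a constant stand-in for the model.
terms = ["EX:disease_a", "EX:symptom_pain", "EX:disease_b"]
diseases = {"EX:disease_a", "EX:disease_b"}

predictions = predict_all(terms, diseases, predict_fn=lambda term_id: 0.5)
filtered = diseases_only(predictions)
```

Training would stay restricted to diseases; only the prediction step widens. A user who wants today's output just calls the filter.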
There could be a benefit to having precision predictions for non-diseases. For example, classifications of symptoms (one example being pain) would make sense along a precision axis:

@yonromai: I'll bring this up with the data team at our next meeting, so no need to do anything until then. CC @eric-czech