Problem
When using a dynamic enum to validate term IDs in historical annotation data (e.g. reviewing existing GO annotations), the enum should accept both current and obsolete terms from the target ontology branch. Currently, reachable_from uses OAK's descendants(), which excludes obsolete terms because obsoletion removes the rdfs:subClassOf edge.
Concrete use case: ai4curation/ai-gene-review
Our schema has an ExistingAnnotation class representing GO annotations from the GOA database that a curator is reviewing. These may use terms that are now obsolete — the whole point of the review workflow is to mark such annotations for REMOVE or MODIFY action.
We'd like to bind a dynamic enum to ExistingAnnotation.term.id that:
- Accepts all current descendants of GO:0003674, GO:0008150, GO:0005575
- Also accepts obsolete GO terms (since historical annotations may reference them)
With the current schema, this isn't possible:
GOTermEnum:
  reachable_from:
    source_nodes: [GO:0003674, GO:0008150, GO:0005575]
    is_direct: false
    relationship_types: [rdfs:subClassOf]
This returns ~40K current terms, but the ~13K obsolete GO terms are silently excluded, so historical annotations that use them are flagged as invalid.
For now we've removed the binding entirely (see ai4curation/ai-gene-review#249), but this loses a useful check — we can no longer verify that ExistingAnnotation.term.id is a valid GO CURIE at all.
Proposal
Add an optional include_obsoletes field (or similar name) to the reachable_from construct:
GOTermEnum:
  reachable_from:
    source_nodes: [GO:0003674, GO:0008150, GO:0005575]
    is_direct: false
    include_obsoletes: true
    relationship_types: [rdfs:subClassOf]
Implementation in _expand_reachable_from:
# Current (excludes obsoletes, via descendants())
descendants_result = adapter.descendants(source_node, predicates=predicates, reflexive=True)
values.update(descendants_result)
# New: if include_obsoletes is set, also add obsolete terms with a matching prefix
if getattr(query, "include_obsoletes", False):
    prefix = self._get_prefix(source_node)
    # entities() yields plain CURIEs without an obsolete flag, so iterate obsoletes() instead
    for curie in adapter.obsoletes():
        if curie.startswith(f"{prefix}:"):
            values.add(curie)
Or via a direct SQL query if the adapter is a SqlImplementation:
SELECT DISTINCT subject FROM statements
WHERE predicate = 'owl:deprecated'
  AND value = 'true'
  AND subject LIKE 'GO:%'
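To make the SQL route concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database. The table layout (a statements table with subject, predicate, object and value columns), the value = 'true' filter, the helper name obsolete_curies and the CURIEs are all assumptions for illustration. A real implementation would query the adapter's own database rather than a toy table.

```python
import sqlite3


def obsolete_curies(conn, prefix):
    """Return sorted CURIEs with the given prefix that are marked owl:deprecated."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM statements "
        "WHERE predicate = 'owl:deprecated' AND value = 'true' "
        "AND subject LIKE ? ORDER BY subject",
        (f"{prefix}:%",),
    )
    return [row[0] for row in rows]


# Toy data: the IDs below are made up and do not refer to real terms.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("GO:9999901", "rdfs:subClassOf", "GO:0008150", None),  # current term
        ("GO:9999902", "owl:deprecated", None, "true"),         # obsolete GO term
        ("GO:9999903", "owl:deprecated", None, "true"),         # obsolete GO term
        ("CL:9999901", "owl:deprecated", None, "true"),         # obsolete, other prefix
    ],
)

print(obsolete_curies(conn, "GO"))  # ['GO:9999902', 'GO:9999903']
```

Passing the prefix as a bound parameter rather than interpolating it into the SQL string keeps the query safe if the prefix ever comes from schema input.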
Alternative: query replaced_by
Some projects may want "obsolete terms whose replacement is under this branch" rather than "all obsolete terms with this prefix". A more sophisticated form would be:
include_obsoletes: replaced_by_descendant # only obsoletes whose term-replaced-by chain ends at a descendant of source_nodes
But a simple boolean covers the common case.
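For comparison, the replaced_by_descendant variant could be sketched roughly as follows, using plain dicts and sets in place of real ontology lookups. The function names, the replaced_by mapping and the EX: identifiers are hypothetical. In practice the mapping would come from term-replaced-by metadata and branch_terms from the descendants() expansion.

```python
def resolve_replacement(curie, replaced_by):
    """Follow a term-replaced-by chain to its end.

    Returns None if the term has no replacement or the chain is cyclic.
    """
    seen = {curie}
    current = curie
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:  # guard against cyclic replacement chains
            return None
        seen.add(current)
    return current if current != curie else None


def obsoletes_replaced_under(obsoletes, replaced_by, branch_terms):
    """Keep only obsolete terms whose final replacement lies in branch_terms."""
    return {
        curie
        for curie in obsoletes
        if resolve_replacement(curie, replaced_by) in branch_terms
    }


# Toy data with made-up identifiers.
branch_terms = {"EX:1", "EX:2"}
obsoletes = {"EX:10", "EX:11", "EX:12", "EX:13"}
replaced_by = {"EX:10": "EX:11", "EX:11": "EX:1", "EX:12": "EX:99"}

print(sorted(obsoletes_replaced_under(obsoletes, replaced_by, branch_terms)))
# ['EX:10', 'EX:11']
```

EX:12 is dropped because its replacement falls outside the branch, and EX:13 is dropped because it has no replacement at all. This is the main behavioral difference from the simple boolean, which would accept all four.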
Why not use concepts?
We could technically list all ~13K obsolete GO terms as concepts, but:
- That's fragile — needs re-generation when new terms are obsoleted
- It requires serializing a huge enumeration in the schema
- It defeats the purpose of dynamic enums (query at validation time)
Why not use matches with a regex?
matches checks term syntax, not ontology membership. It would accept any GO:\d+-like string, not just legitimate obsolete terms under the CC/BP/MF roots.
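The gap between a syntax check and a membership check can be illustrated with a small sketch. The known_terms set stands in for whatever the dynamic enum would expand to, and GO:9999999 is a made-up identifier assumed not to exist. The helper names are hypothetical.

```python
import re

GO_SYNTAX = re.compile(r"GO:\d+")

# Stand-in for the expanded enum; here just the three root terms named above.
known_terms = {"GO:0003674", "GO:0008150", "GO:0005575"}


def syntax_ok(curie):
    """True if the string merely looks like a GO CURIE."""
    return GO_SYNTAX.fullmatch(curie) is not None


def is_member(curie, terms):
    """True if the CURIE is actually in the expanded term set."""
    return curie in terms


for candidate in ["GO:0003674", "GO:9999999", "go-term"]:
    print(candidate, syntax_ok(candidate), is_member(candidate, known_terms))
# GO:0003674 True True
# GO:9999999 True False
# go-term False False
```

The middle case is the problematic one: a well-formed but nonexistent ID passes the regex and would only be caught by a membership check.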
Spec considerations
This would require a corresponding addition to the LinkML metamodel for ReachabilityQuery:
# In linkml meta schema
ReachabilityQuery:
  attributes:
    ...
    include_obsoletes:
      range: boolean
      description: >-
        If True, also include obsolete terms when expanding. Default False.
        Useful when validating historical annotation data that may reference
        deprecated terms.
Happy to open a corresponding linkml/linkml issue if this is better coordinated upstream first. But LTV is the consumer of the reachable_from semantics, so the behavior decision sits here.
Related