
Add option to include obsolete terms in reachable_from expansions #27

@cmungall


Problem

When using a dynamic enum to validate term IDs in historical annotation data (e.g. reviewing existing GO annotations), the enum should accept both current and obsolete terms from the target ontology branch. Currently, reachable_from uses OAK's descendants(), which excludes obsolete terms because obsoletion removes the rdfs:subClassOf edge.

Concrete use case: ai4curation/ai-gene-review

Our schema has an ExistingAnnotation class representing GO annotations from the GOA database that a curator is reviewing. These may use terms that are now obsolete — the whole point of the review workflow is to mark such annotations for REMOVE or MODIFY action.

We'd like to bind a dynamic enum to ExistingAnnotation.term.id that:

  • Accepts all current descendants of GO:0003674, GO:0008150, GO:0005575
  • Also accepts obsolete GO terms (since historical annotations may reference them)

With the current schema, this isn't possible:

GOTermEnum:
  reachable_from:
    source_nodes: [GO:0003674, GO:0008150, GO:0005575]
    is_direct: false
    relationship_types: [rdfs:subClassOf]

This returns ~40K current terms, but the ~13K obsolete GO terms are silently excluded, so historical annotations that use them are flagged as invalid.
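To make the failure mode concrete, here is a toy illustration in plain Python (no OAK involved). The `EX:` IDs and the hand-rolled `descendants` helper are made up for this sketch; the point is only that a term with no remaining rdfs:subClassOf edge can never be reached by walking down from a root.

```python
# Toy ontology: child -> set of parents via rdfs:subClassOf.
# All IDs here are hypothetical and exist only for illustration.
SUBCLASS_OF = {
    "EX:0000002": {"EX:0000001"},
    "EX:0000003": {"EX:0000002"},
}
# EX:0000004 is "obsolete": it keeps its ID but has no subClassOf edge.
ALL_TERMS = {"EX:0000001", "EX:0000002", "EX:0000003", "EX:0000004"}


def descendants(root):
    """Reflexive descendants of root over the SUBCLASS_OF edges."""
    found = {root}
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, parents in SUBCLASS_OF.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


reachable = descendants("EX:0000001")
missing = ALL_TERMS - reachable  # the obsolete term falls out here
```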

For now we've removed the binding entirely (see ai4curation/ai-gene-review#249), but this loses a useful check — we can no longer verify that ExistingAnnotation.term.id is a valid GO CURIE at all.

Proposal

Add an optional include_obsoletes field (or similar name) to the reachable_from construct:

GOTermEnum:
  reachable_from:
    source_nodes: [GO:0003674, GO:0008150, GO:0005575]
    is_direct: false
    include_obsoletes: true
    relationship_types: [rdfs:subClassOf]

Implementation in _expand_reachable_from:

# Current (exclude obsoletes, via descendants())
descendants_result = adapter.descendants(source_node, predicates=predicates, reflexive=True)
values.update(descendants_result)

# New: also add obsolete terms with a matching prefix if include_obsoletes is set.
# adapter.obsoletes() yields the CURIEs of all obsoleted entities.
if getattr(query, "include_obsoletes", False):
    prefix = self._get_prefix(source_node)
    for curie in adapter.obsoletes():
        if curie.startswith(f"{prefix}:"):
            values.add(curie)
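A self-contained sketch of the same expansion logic, runnable without OAK. `StubAdapter` and `expand` are hypothetical names for this example; the stub only imitates the two adapter methods the proposal relies on (`descendants` and `obsoletes`), and a real adapter's behavior should be checked against the OAK documentation.

```python
from typing import Dict, Iterable, List, Set


class StubAdapter:
    """Hypothetical stand-in for an ontology adapter (not the OAK class)."""

    def __init__(self, children: Dict[str, List[str]], obsolete: List[str]):
        self._children = children
        self._obsolete = obsolete

    def descendants(self, node, predicates=None, reflexive=True) -> Iterable[str]:
        # Depth-first walk over the toy parent -> children map.
        seen = {node} if reflexive else set()
        stack = [node]
        while stack:
            for child in self._children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def obsoletes(self) -> Iterable[str]:
        return iter(self._obsolete)


def expand(adapter, source_nodes, include_obsoletes=False) -> Set[str]:
    """Expand source_nodes to descendants, optionally adding same-prefix obsoletes."""
    values: Set[str] = set()
    for node in source_nodes:
        values.update(
            adapter.descendants(node, predicates=["rdfs:subClassOf"], reflexive=True)
        )
    if include_obsoletes:
        # Scan the obsolete terms once, not once per source node.
        prefixes = {node.split(":", 1)[0] for node in source_nodes}
        values.update(
            curie for curie in adapter.obsoletes()
            if curie.split(":", 1)[0] in prefixes
        )
    return values


adapter = StubAdapter({"EX:1": ["EX:2"]}, obsolete=["EX:9", "OTHER:5"])
current_only = expand(adapter, ["EX:1"])
with_obsoletes = expand(adapter, ["EX:1"], include_obsoletes=True)
```

One design choice worth noting: collecting the prefixes up front means the obsolete-term scan runs once per query rather than once per source node, which matters when all three GO roots share a prefix.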

Or via a direct sql query if the adapter is a SqlImplementation:

SELECT DISTINCT subject FROM statements
WHERE predicate = 'owl:deprecated'
AND value = 'true'
AND subject LIKE 'GO:%';
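The SQL route can be tried out against an in-memory SQLite table. This is a sketch only: the four-column `statements` layout and the toy rows are assumptions for the example, and it additionally checks `value = 'true'` on the assumption that the deprecation flag is stored as a literal. The `obsolete_curies` helper is a hypothetical name.

```python
import sqlite3

# Minimal stand-in for a statements table; the column layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("EX:0000001", "rdfs:subClassOf", "EX:0000000", None),
        ("EX:0000002", "owl:deprecated", None, "true"),
        ("OTHER:0000003", "owl:deprecated", None, "true"),
    ],
)


def obsolete_curies(conn, prefix):
    """Return deprecated subjects whose CURIE starts with '<prefix>:'."""
    rows = conn.execute(
        "SELECT DISTINCT subject FROM statements "
        "WHERE predicate = 'owl:deprecated' AND value = 'true' "
        "AND subject LIKE ? ORDER BY subject",
        (f"{prefix}:%",),  # bound parameter instead of a hard-coded prefix
    )
    return [row[0] for row in rows]


ex_obsoletes = obsolete_curies(conn, "EX")
```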

Alternative: query replaced_by

Some projects may want "obsolete terms whose replacement is under this branch" rather than "all obsolete terms with this prefix". A more sophisticated form would be:

include_obsoletes: replaced_by_descendant  # only obsoletes whose term-replaced-by chain ends at a descendant of source_nodes

But a simple boolean covers the common case.
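For comparison, the replaced_by variant could look roughly like the following. This is a sketch over plain dicts and sets, not an implementation against any real adapter; `resolve_replacement`, `obsoletes_replaced_under`, and the toy IDs are all hypothetical.

```python
def resolve_replacement(term, replaced_by):
    """Follow replaced-by links from term to the end of the chain.

    Returns None if the term has no replacement or the chain loops.
    """
    seen = {term}
    current = term
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:  # guard against cyclic replacement chains
            return None
        seen.add(current)
    return current if current != term else None


def obsoletes_replaced_under(obsolete_terms, replaced_by, branch):
    """Keep only obsolete terms whose final replacement lies in branch."""
    return {
        term for term in obsolete_terms
        if resolve_replacement(term, replaced_by) in branch
    }


# Toy data: EX:10 -> EX:11 -> EX:2 ends inside the branch; EX:12 ends outside;
# EX:13 has no replacement at all.
replaced_by = {"EX:10": "EX:11", "EX:11": "EX:2", "EX:12": "EX:99"}
branch = {"EX:1", "EX:2"}
obsolete = {"EX:10", "EX:11", "EX:12", "EX:13"}
accepted = obsoletes_replaced_under(obsolete, replaced_by, branch)
```

Note that under this mode an obsolete term with no replacement (EX:13 in the toy data) is rejected, which is one reason the plain boolean may be the better default for historical data.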

Why not use concepts?

We could technically list all ~13K obsolete GO terms as concepts, but:

  1. That's fragile — needs re-generation when new terms are obsoleted
  2. It requires serializing a huge enumeration in the schema
  3. It defeats the purpose of dynamic enums (query at validation time)

Why not use matches with a regex?

matches checks term syntax, not ontology membership. It would accept any string of the form GO:\d+, not just legitimate obsolete terms that once sat under the CC/BP/MF roots.
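A quick illustration of the gap, using Python's `re` module. The `known_terms` set is a toy stand-in for real ontology membership (it holds only the three root IDs from above), and the candidate ID is chosen simply because it is not in that set.

```python
import re

# Pattern equivalent to the syntax-only check described above.
GO_SYNTAX = re.compile(r"GO:\d+")

# Toy stand-in for "terms the ontology actually knows about".
known_terms = {"GO:0003674", "GO:0008150", "GO:0005575"}


def syntactically_valid(curie):
    """True if the whole string looks like a GO CURIE."""
    return GO_SYNTAX.fullmatch(curie) is not None


candidate = "GO:9999999"
passes_regex = syntactically_valid(candidate)  # syntax check alone accepts it
is_member = candidate in known_terms           # membership check rejects it
```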

Spec considerations

This would require a corresponding addition to the LinkML metamodel for ReachabilityQuery:

# In linkml meta schema
ReachabilityQuery:
  attributes:
    ...
    include_obsoletes:
      range: boolean
      description: >-
        If True, also include obsolete terms when expanding. Default False.
        Useful when validating historical annotation data that may reference
        deprecated terms.

Happy to open a corresponding linkml/linkml issue if this is better coordinated upstream first. But LTV is the consumer of the reachable_from semantics, so the behavior decision sits here.

