Summary
Biological process (GO/BP) terms used in disorder YAML files are not being validated against the Gene Ontology because BiologicalProcessDescriptor is missing the bindings block that connects its term.id slot to the BiologicalProcessTerm dynamic enum.
Root Cause
In src/dismech/schema/dismech.yaml, the linkml-term-validator relies on bindings in each descriptor class to know which fields to validate against which ontology-constrained term enum.
CellTypeDescriptor and GeneDescriptor are correctly configured (lines ~1929 and ~1961):
```yaml
CellTypeDescriptor:
  is_a: Descriptor
  slot_usage:
    term:
      bindings:
        - binds_value_of: id
          range: CellTypeTerm
          obligation_level: REQUIRED
```

BiologicalProcessDescriptor is missing the bindings block (lines ~1940–1945):
```yaml
BiologicalProcessDescriptor:
  is_a: Descriptor
  description: A descriptor for biological processes, bindable to Gene Ontology (GO)
  slot_usage:
    term:
      description: Optional GO biological process term reference
      # ← No bindings here!
```

The BiologicalProcessTerm enum is defined with the correct reachable_from constraint (GO:0008150), but it is never referenced by BiologicalProcessDescriptor, so the validator never applies it.
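For comparison, a LinkML dynamic enum with this kind of constraint is typically declared roughly as below. This is a sketch, not the actual definition from dismech.yaml: the description text and the source_ontology, include_self and relationship_types values are assumptions. A binding whose range is this enum is what tells the validator to check that term.id is a subclass descendant of GO:0008150; with no binding pointing at it, the enum is simply never consulted.

```yaml
enums:
  BiologicalProcessTerm:
    # Assumed description; the real text in dismech.yaml may differ.
    description: GO terms in the biological_process branch
    reachable_from:
      source_ontology: obo:go      # assumed; resolved via the OAK config
      source_nodes:
        - GO:0008150               # biological_process (root of the BP branch)
      include_self: false          # assumed setting
      relationship_types:
        - rdfs:subClassOf          # only follow is_a links
```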
Affected Descriptor Classes
The following descriptor classes are missing bindings and therefore do not have their ontology terms validated:
| Class | Missing binding to | Ontology |
|---|---|---|
| BiologicalProcessDescriptor | BiologicalProcessTerm | GO (biological_process namespace) |
| AnatomicalEntityDescriptor | AnatomicalEntityTerm | UBERON |
| ChemicalEntityDescriptor | ChemicalEntityTerm | CHEBI |
| CellularComponentDescriptor | (no CellularComponentTerm defined) | GO (cellular_component) |
| ProteinComplexDescriptor | (no ProteinComplexTerm defined) | GO |
OAK Config Status
The conf/oak_config.yaml file is correctly configured: GO is mapped to sqlite:obo:go. The problem lies purely in the schema's missing binding constraints, not in the OAK config.
Proposed Fix
Add bindings to BiologicalProcessDescriptor (and the other affected descriptors):
```yaml
BiologicalProcessDescriptor:
  is_a: Descriptor
  description: A descriptor for biological processes, bindable to Gene Ontology (GO)
  slot_usage:
    term:
      description: Optional GO biological process term reference
      bindings:
        - binds_value_of: id
          range: BiologicalProcessTerm
          obligation_level: REQUIRED
```

Similarly, AnatomicalEntityDescriptor and ChemicalEntityDescriptor need bindings added for AnatomicalEntityTerm and ChemicalEntityTerm respectively.
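Following the same pattern as CellTypeDescriptor, the two remaining bindings might look like this. It is a sketch that assumes both classes inherit from Descriptor and that REQUIRED is the desired obligation level; any existing description or other slot_usage entries on these classes are omitted here and should be kept as they are.

```yaml
AnatomicalEntityDescriptor:
  is_a: Descriptor            # assumed, mirroring the other descriptors
  slot_usage:
    term:
      bindings:
        - binds_value_of: id
          range: AnatomicalEntityTerm   # UBERON-constrained enum
          obligation_level: REQUIRED

ChemicalEntityDescriptor:
  is_a: Descriptor            # assumed, mirroring the other descriptors
  slot_usage:
    term:
      bindings:
        - binds_value_of: id
          range: ChemicalEntityTerm     # CHEBI-constrained enum
          obligation_level: REQUIRED
```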
For CellularComponentDescriptor and ProteinComplexDescriptor, a corresponding CellularComponentTerm (reachable from GO:0005575) and ProteinComplexTerm (reachable from GO:0032991 or similar) would need to be defined first.
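One possible shape for the two missing enums and their bindings is sketched below. The enum names come from this issue, but their definitions are hypothetical: the source_ontology value and relationship_types are assumptions, and the root for ProteinComplexTerm (GO:0032991, protein-containing complex) is still an open choice, as noted above.

```yaml
enums:
  CellularComponentTerm:                # hypothetical definition
    reachable_from:
      source_ontology: obo:go
      source_nodes:
        - GO:0005575                    # cellular_component
      relationship_types:
        - rdfs:subClassOf
  ProteinComplexTerm:                   # hypothetical definition
    reachable_from:
      source_ontology: obo:go
      source_nodes:
        - GO:0032991                    # protein-containing complex; root still to be decided
      relationship_types:
        - rdfs:subClassOf

classes:
  CellularComponentDescriptor:
    is_a: Descriptor                    # assumed
    slot_usage:
      term:
        bindings:
          - binds_value_of: id
            range: CellularComponentTerm
            obligation_level: REQUIRED
  ProteinComplexDescriptor:
    is_a: Descriptor                    # assumed
    slot_usage:
      term:
        bindings:
          - binds_value_of: id
            range: ProteinComplexTerm
            obligation_level: REQUIRED
```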
Impact
- All 55+ disorder YAML files with biological_processes fields could contain invalid or hallucinated GO term IDs/labels, and no validation error would be raised.
- Terms outside the biological_process namespace (e.g., molecular function or cellular component GO terms accidentally used in biological_processes) would not be caught.
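To make the second point concrete, here is a hypothetical disorder-file fragment. The field layout (preferred_term plus a term object with id and label) is assumed from the Descriptor/term structure above, not copied from an actual dismech file. With the binding in place, the validator should accept the first entry and flag the second, because GO:0005739 (mitochondrion) is a cellular component term and is not reachable from GO:0008150. Without the binding, both currently pass silently.

```yaml
biological_processes:
  - preferred_term: Inflammatory response
    term:
      id: GO:0006954          # inflammatory response, a valid BP term
      label: inflammatory response
  - preferred_term: Mitochondrion
    term:
      id: GO:0005739          # mitochondrion, a cellular component term, wrong branch
      label: mitochondrion
```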
Linked Issue
Raised on #454.