
Missing binding constraints in BiologicalProcessDescriptor (and others) prevent GO term validation #463

@github-actions

Description

Summary

Biological process (GO/BP) terms used in disorder YAML files are not being validated against the Gene Ontology because BiologicalProcessDescriptor is missing the bindings block that connects its term.id slot to the BiologicalProcessTerm dynamic enum.

Root Cause

The linkml-term-validator relies on the bindings declared on each descriptor class in src/dismech/schema/dismech.yaml to know which fields to validate against which ontology-constrained term enum.

CellTypeDescriptor and GeneDescriptor are correctly configured (around lines 1929 and 1961). CellTypeDescriptor, for example:

CellTypeDescriptor:
  is_a: Descriptor
  slot_usage:
    term:
      bindings:
        - binds_value_of: id
          range: CellTypeTerm
          obligation_level: REQUIRED

BiologicalProcessDescriptor is missing the bindings block (lines ~1940–1945):

BiologicalProcessDescriptor:
  is_a: Descriptor
  description: A descriptor for biological processes, bindable to Gene Ontology (GO)
  slot_usage:
    term:
      description: Optional GO biological process term reference
      # ← No bindings here!

The BiologicalProcessTerm enum is defined with the correct reachable_from constraint (GO:0008150), but it is never referenced by BiologicalProcessDescriptor, so the validator never applies it.
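The following minimal Python sketch illustrates the mechanism described above. It is not the linkml-term-validator's implementation: `check_bindings`, the dict shapes, and the hard-coded `ALLOWED` set (standing in for the GO-expanded BiologicalProcessTerm enum) are hypothetical. It shows why a descriptor with no bindings lets even a made-up ID through.

```python
from typing import Any

# Hypothetical stand-in for the expanded BiologicalProcessTerm enum. A real
# validator would derive its members from GO (everything reachable from
# GO:0008150); this two-element set is for illustration only.
ALLOWED: dict[str, set[str]] = {"BiologicalProcessTerm": {"GO:0008150", "GO:0006915"}}


def check_bindings(term: dict[str, Any], bindings: list[dict[str, str]]) -> list[str]:
    """Return error messages for the value of a descriptor's `term` slot.

    Each binding names a field inside `term` (binds_value_of) and the enum
    whose members are permitted for it (range). With no bindings, the loop
    never runs and nothing is checked.
    """
    errors = []
    for binding in bindings:
        value = term.get(binding["binds_value_of"])
        enum_name = binding["range"]
        if value is not None and value not in ALLOWED[enum_name]:
            errors.append(f"{value} is not a member of {enum_name}")
    return errors


bogus = {"id": "GO:9999999", "label": "made-up process"}  # deliberately invalid
bp_bindings = [{"binds_value_of": "id", "range": "BiologicalProcessTerm",
                "obligation_level": "REQUIRED"}]

print(check_bindings(bogus, []))           # current schema: no bindings -> []
print(check_bindings(bogus, bp_bindings))  # proposed fix -> ['GO:9999999 is not a member of BiologicalProcessTerm']
```

The `obligation_level` key is carried along but ignored here; the sketch only models which enum a field is checked against.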

Affected Descriptor Classes

The following descriptor classes are missing bindings and therefore do not have their ontology terms validated:

| Class | Missing binding to | Ontology |
| --- | --- | --- |
| BiologicalProcessDescriptor | BiologicalProcessTerm | GO (biological_process namespace) |
| AnatomicalEntityDescriptor | AnatomicalEntityTerm | UBERON |
| ChemicalEntityDescriptor | ChemicalEntityTerm | CHEBI |
| CellularComponentDescriptor | (no CellularComponentTerm defined) | GO (cellular_component namespace) |
| ProteinComplexDescriptor | (no ProteinComplexTerm defined) | GO |
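A check like the one below could catch this class of bug automatically, for instance as a CI step. It is a minimal sketch under stated assumptions: `CLASSES` is a hypothetical, heavily trimmed Python copy of the schema's `classes:` section (a real check would load dismech.yaml, e.g. with PyYAML), and `unbound_descriptors` is an illustrative helper, not part of any existing tool. It only inspects direct `is_a: Descriptor` subclasses.

```python
from typing import Any

# Hypothetical, trimmed copy of the `classes:` section of
# src/dismech/schema/dismech.yaml, written as Python dicts.
CLASSES: dict[str, dict[str, Any]] = {
    "Descriptor": {},
    "CellTypeDescriptor": {
        "is_a": "Descriptor",
        "slot_usage": {"term": {"bindings": [
            {"binds_value_of": "id", "range": "CellTypeTerm", "obligation_level": "REQUIRED"}]}},
    },
    "BiologicalProcessDescriptor": {
        "is_a": "Descriptor",
        "slot_usage": {"term": {"description": "Optional GO biological process term reference"}},
    },
    "AnatomicalEntityDescriptor": {"is_a": "Descriptor"},
}


def unbound_descriptors(classes: dict[str, dict[str, Any]], base: str = "Descriptor") -> list[str]:
    """List direct subclasses of `base` whose `term` slot_usage has no bindings."""
    missing = []
    for name, cls in classes.items():
        if cls.get("is_a") != base:
            continue
        term_usage = cls.get("slot_usage", {}).get("term", {})
        if not term_usage.get("bindings"):
            missing.append(name)
    return sorted(missing)


print(unbound_descriptors(CLASSES))
# -> ['AnatomicalEntityDescriptor', 'BiologicalProcessDescriptor']
```

A CI job could fail whenever this list is non-empty, so that a newly added descriptor cannot silently skip ontology validation.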

OAK Config Status

conf/oak_config.yaml is correctly configured: GO is mapped to sqlite:obo:go. The problem lies entirely in the schema's missing binding constraints, not in the OAK configuration.

Proposed Fix

Add bindings to BiologicalProcessDescriptor (and the other affected descriptors):

BiologicalProcessDescriptor:
  is_a: Descriptor
  description: A descriptor for biological processes, bindable to Gene Ontology (GO)
  slot_usage:
    term:
      description: Optional GO biological process term reference
      bindings:
        - binds_value_of: id
          range: BiologicalProcessTerm
          obligation_level: REQUIRED

Similarly, AnatomicalEntityDescriptor and ChemicalEntityDescriptor need bindings added for AnatomicalEntityTerm and ChemicalEntityTerm respectively.

For CellularComponentDescriptor and ProteinComplexDescriptor, a corresponding CellularComponentTerm (reachable from GO:0005575) and ProteinComplexTerm (reachable from GO:0032991 or similar) would need to be defined first.
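Because the same binding pattern repeats for every descriptor, the missing pieces could be generated rather than hand-written. The sketch below builds the two new dynamic enums and their `term` bindings as Python dicts and prints them as JSON. It assumes the new enums should use the same `reachable_from` style as the existing BiologicalProcessTerm enum (whose exact body is not shown in this issue); the `include_self` and `relationship_types` values, and the helper names `dynamic_enum` and `term_binding`, are illustrative choices, not taken from the dismech schema.

```python
import json
from typing import Any


def dynamic_enum(root: str) -> dict[str, Any]:
    """LinkML-style dynamic enum body constrained to descendants of `root`.

    Assumption: mirrors the reachable_from style of the existing
    BiologicalProcessTerm enum; adjust keys to match the real schema.
    """
    return {
        "reachable_from": {
            "source_nodes": [root],
            "include_self": True,
            "relationship_types": ["rdfs:subClassOf"],
        }
    }


def term_binding(enum_name: str) -> dict[str, Any]:
    """slot_usage entry binding term.id to `enum_name` (same shape as CellTypeDescriptor)."""
    return {"term": {"bindings": [
        {"binds_value_of": "id", "range": enum_name, "obligation_level": "REQUIRED"}]}}


# Descriptor -> (new enum name, GO root), as proposed in this issue.
NEW_ENUMS = {
    "CellularComponentDescriptor": ("CellularComponentTerm", "GO:0005575"),
    "ProteinComplexDescriptor": ("ProteinComplexTerm", "GO:0032991"),
}

enums = {name: dynamic_enum(root) for name, root in NEW_ENUMS.values()}
slot_usage = {desc: term_binding(name) for desc, (name, _) in NEW_ENUMS.items()}
print(json.dumps({"enums": enums, "slot_usage": slot_usage}, indent=2))
```

JSON output is valid YAML 1.2 syntax, so the printed structure can be converted or merged into dismech.yaml by hand or with a YAML library.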

Impact

  • All 55+ disorder YAML files with biological_processes fields could contain invalid or hallucinated GO term IDs/labels with no validation error raised
  • Terms outside the biological_process namespace (e.g., molecular function or cellular component GO terms accidentally used in biological_processes) would not be caught
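The namespace check in the second point comes from the `reachable_from` constraint: a term is accepted only if GO:0008150 is among its ancestors. The sketch below shows that idea as a breadth-first search over a toy is_a graph. The root IDs are real GO roots (GO:0008150 biological_process, GO:0005575 cellular_component, GO:0003674 molecular_function), but the `EX:` child IDs are hypothetical placeholders, not GO terms. In practice the ancestor lookup would be answered by OAK against the GO database configured in conf/oak_config.yaml rather than by a hand-built dict.

```python
from collections import deque

# Toy is_a graph (child -> parents). Roots are real GO roots; the EX: children
# are hypothetical placeholders used only for illustration.
IS_A: dict[str, list[str]] = {
    "EX:bp_child": ["GO:0008150"],
    "EX:bp_grandchild": ["EX:bp_child"],
    "EX:cc_child": ["GO:0005575"],
    "EX:mf_child": ["GO:0003674"],
}


def reachable_from(term: str, root: str, include_self: bool = True) -> bool:
    """True if `root` is an is_a ancestor of `term`, found by BFS over parents."""
    if term == root:
        return include_self
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in IS_A.get(queue.popleft(), []):
            if parent == root:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


for candidate in ["EX:bp_grandchild", "EX:cc_child", "EX:mf_child"]:
    print(candidate, reachable_from(candidate, "GO:0008150"))
# -> EX:bp_grandchild True / EX:cc_child False / EX:mf_child False
```

With the binding in place, a cellular component or molecular function term placed in biological_processes fails this check even though it is a valid GO ID.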

Linked Issue

Raised in #454.

Metadata

Labels: bug (Something isn't working), schema
