Support JSON Schema validation workflows (biovalidator-style custom keywords) #11

@cmungall

linkml-term-validator currently supports two usage patterns:

  • Standalone CLI
  • LinkML validator plugin

However, some users need to work within JSON Schema validation workflows, either because they author JSON Schema natively or because downstream tools require JSON Schema validators. These users still need ontology term validation (dynamic enums, reachable_from constraints, etc.).

Current Options

Option 1: Materialize to static JSON Schema

Already well-supported via linkml-term-validator or vskit. See [Dynamic Enums documentation](https://linkml.io/linkml/schemas/enums.html#dynamic-enums).

Pros:

  • Works with any JSON Schema validator
  • No custom tooling required at validation time

Cons:

  • Produces large, unwieldy schemas (especially for broad constraints like "any descendant of GO:0008150")
  • Requires regeneration when ontologies update
  • Loses the semantic intent of the constraint
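
To make the trade-off concrete, here is a minimal sketch of what materialization amounts to: expand a constraint into an explicit `enum` list. It assumes a toy in-memory hierarchy; the `EX:` identifiers and the helper names `descendants` and `materialize_enum` are hypothetical and are not part of linkml-term-validator or vskit.

```python
import json
from collections import deque

# Toy subclass edges (child -> parents). Only GO:0007049 comes from the issue;
# every EX: identifier is a made-up placeholder.
TOY_PARENTS = {
    "EX:0000002": ["GO:0007049"],
    "EX:0000003": ["EX:0000002"],
    "EX:0000004": ["GO:0007049"],
    "EX:0000099": [],  # unrelated term
}


def descendants(root, parents, include_self=False):
    """Return the sorted terms reachable downward from root."""
    # Invert the child -> parents map so we can walk downward.
    children = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)

    if include_self:
        seen.add(root)
    else:
        seen.discard(root)  # guards against cycles leading back to root
    return sorted(seen)


def materialize_enum(root, parents, include_self=False):
    """Build a static JSON Schema fragment listing every permitted term."""
    return {"type": "string", "enum": descendants(root, parents, include_self)}


schema = materialize_enum("GO:0007049", TOY_PARENTS)
print(json.dumps(schema, indent=2))
```

The resulting fragment no longer records which root or relation produced the list, and it grows with the size of the subtree, which is the source of the cons listed above.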

Option 2: JSON Schema Vocabularies (Draft 2019-09+)

The JSON Schema vocabularies mechanism (the `$vocabulary` keyword, introduced in Draft 2019-09) allows formal extension of JSON Schema with custom keywords.

Pros:

  • Standards-compliant approach
  • Clean separation of concerns

Cons:

  • Limited validator support in practice
  • Most validators only partially implement vocabulary features
  • Adds complexity for end users
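
As a rough illustration of this option, the sketch below builds a meta-schema that declares a custom vocabulary alongside some of the standard Draft 2019-09 ones. The `example.org` URIs are hypothetical placeholders; no such vocabulary is published for `graph_restriction` as far as this issue states.

```python
import json

# Hypothetical vocabulary URI, used for illustration only.
CUSTOM_VOCAB = "https://example.org/vocab/graph-restriction"

meta_schema = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://example.org/meta/ontology-aware",  # hypothetical
    "$vocabulary": {
        # Standard vocabularies that a validator must understand.
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://json-schema.org/draft/2019-09/vocab/applicator": True,
        "https://json-schema.org/draft/2019-09/vocab/validation": True,
        # False marks the vocabulary as optional: a validator that does not
        # recognize it may still process schemas that use this meta-schema.
        CUSTOM_VOCAB: False,
    },
    # Reuse the standard meta-schema for everything else.
    "allOf": [{"$ref": "https://json-schema.org/draft/2019-09/schema"}],
}

print(json.dumps(meta_schema, indent=2))
```

Setting the custom entry to `True` instead would require validators to refuse schemas whose vocabulary they cannot handle, which is stricter but interacts badly with the limited validator support noted above.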

Option 3: Custom keywords with specialized validator (HCA/biovalidator pattern)

The Human Cell Atlas project defined a graph_restriction keyword (see linkml/linkml#274), and ELIXIR maintains [biovalidator](https://github.com/elixir-europe/biovalidator) to interpret it:

"ontology": {
  "type": "string",
  "graph_restriction": {
    "ontologies": ["obo:go"],
    "classes": ["GO:0007049"],
    "relations": ["rdfs:subClassOf"],
    "direct": false,
    "include_self": false
  }
}

This is semantically equivalent to LinkML's reachable_from:

reachable_from:
  source_ontology: obo:go
  source_nodes: [GO:0007049]
  relationship_types: [rdfs:subClassOf]
  include_self: false
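
The correspondence between the two forms can be sketched as a field-by-field mapping. This is an illustration under assumptions, not an existing linkml-term-validator API: the function name is hypothetical, `is_direct` is assumed to be the LinkML counterpart of `direct`, and the fallback values for absent fields are guesses rather than documented defaults.

```python
def reachable_from_to_graph_restriction(reachable_from):
    """Translate a reachable_from mapping into a graph_restriction mapping."""
    return {
        # graph_restriction takes a list of ontologies; reachable_from names one.
        "ontologies": [reachable_from["source_ontology"]],
        "classes": list(reachable_from["source_nodes"]),
        # Assumed fallback when no relationship types are given.
        "relations": list(
            reachable_from.get("relationship_types", ["rdfs:subClassOf"])
        ),
        # Assumption: is_direct corresponds to "direct".
        "direct": bool(reachable_from.get("is_direct", False)),
        "include_self": bool(reachable_from.get("include_self", False)),
    }


# The reachable_from example from this issue, written as a Python dict.
example = {
    "source_ontology": "obo:go",
    "source_nodes": ["GO:0007049"],
    "relationship_types": ["rdfs:subClassOf"],
    "include_self": False,
}

restriction = reachable_from_to_graph_restriction(example)
property_schema = {"type": "string", "graph_restriction": restriction}
print(property_schema)
```

Under these assumptions, the YAML example maps onto the same keyword values as the JSON example above.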

Pros:

  • Pragmatic, proven in production (HCA, ELIXIR)
  • Schema remains human-readable with semantic intent preserved
  • Standard validators ignore custom keywords (graceful degradation)

Cons:

  • Requires biovalidator or compatible tooling
  • Not a formal standard (though widely adopted in life sciences)
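
To show why extra tooling is needed, here is a minimal sketch of a checker that interprets `graph_restriction` on top of ordinary validation. It is not biovalidator's implementation: it uses a toy in-memory hierarchy with made-up `EX:` identifiers, ignores the `ontologies` and `relations` fields, and the function names are hypothetical. A real validator would resolve terms against an actual ontology source.

```python
# Toy subclass edges (child -> parents); EX: identifiers are placeholders.
TOY_PARENTS = {
    "EX:0000002": ["GO:0007049"],
    "EX:0000003": ["EX:0000002"],
    "EX:0000099": [],  # unrelated term
}


def satisfies(term, restriction, parents):
    """Return True if term is permitted by the restriction (toy semantics)."""
    roots = set(restriction["classes"])
    if term in roots and restriction.get("include_self", False):
        return True

    frontier = list(parents.get(term, []))
    if restriction.get("direct", False):
        # Only immediate parents count.
        return any(parent in roots for parent in frontier)

    # Otherwise walk upward through all ancestors.
    seen = set()
    while frontier:
        node = frontier.pop()
        if node in roots:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(parents.get(node, []))
    return False


def check_instance(instance, schema, parents):
    """Collect violations for top-level properties carrying graph_restriction."""
    errors = []
    for name, subschema in schema.get("properties", {}).items():
        restriction = subschema.get("graph_restriction")
        if restriction is None or name not in instance:
            continue  # a standard validator would skip the keyword entirely
        if not satisfies(instance[name], restriction, parents):
            errors.append(f"{name}: {instance[name]!r} violates graph_restriction")
    return errors


schema = {
    "properties": {
        "ontology": {
            "type": "string",
            "graph_restriction": {
                "ontologies": ["obo:go"],
                "classes": ["GO:0007049"],
                "relations": ["rdfs:subClassOf"],
                "direct": False,
                "include_self": False,
            },
        }
    }
}

print(check_instance({"ontology": "EX:0000003"}, schema, TOY_PARENTS))
print(check_instance({"ontology": "EX:0000099"}, schema, TOY_PARENTS))
```

A validator without this extra pass would accept both instances, since each value is a string; that is the graceful degradation mentioned in the pros, and also the reason compatible tooling is required for the constraint to be enforced.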

Proposal / Discussion

Should linkml-term-validator support the biovalidator ecosystem? Possible directions:

  1. Generate graph_restriction keywords when converting LinkML → JSON Schema (alternative to materialization)
  2. Provide a biovalidator-compatible mode or adapter
  3. Document interoperability with biovalidator for users who need JSON Schema workflows
  4. Contribute upstream to biovalidator to align keyword semantics with LinkML
