Add canonical $id fields to every Beacon v2 JSON/YAML schema #235

@M-casado

Context
• Follow-up to the discussion in PR #232 (“consistent relative framework $ref in endpoints”).
• Root-cause analysis in Issue #206 shows that missing $id values break Biovalidator/AJV resolution.
• A working prototype exists in M-casado#1 (where I injected $ids automatically).


Problem statement

Beacon v2 schemas currently mix absolute and relative $refs without defining a canonical $id for each schema document.
JSON Schema Draft 2020-12 (§8.2.1.1) recommends that every root schema provide an absolute $id, and validators such as AJV treat this $id as the base URI for resolving further $refs.
The lack of $id values leads to:

  • Broken resolution chains when model ↔ framework schemas reference each other.
  • Inability to rely on Biovalidator, AJV and other off-the-shelf tooling.
  • Hidden coupling to the main branch through hard-coded raw-GitHub URLs.
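To illustrate the resolution behaviour described above, here is a minimal sketch in Python (standard library only). It mimics how a validator picks a base URI: the schema's $id when present, otherwise the location the document was retrieved from. The file paths, the fork name and the `resolve_ref` helper are hypothetical and only serve the example; real validators such as AJV implement this internally.

```python
from urllib.parse import urljoin


def resolve_ref(ref, schema_id=None, retrieval_uri=None):
    """Resolve a $ref against the schema's base URI (RFC 3986 rules).

    The base URI is the schema's $id if it has one, otherwise the URI
    the document was loaded from. With neither, a relative $ref cannot
    be resolved at all.
    """
    base = schema_id or retrieval_uri
    if base is None:
        raise ValueError("no base URI: schema has no $id and no retrieval URI")
    return urljoin(base, ref)


# Hypothetical paths, for illustration only.
REF = "../../../framework/json/common/example.json"
CANONICAL_ID = (
    "https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/"
    "models/json/example-model/endpoint.json"
)
FORK_LOCATION = (
    "https://raw.githubusercontent.com/some-fork/beacon-v2/feature-x/"
    "models/json/example-model/endpoint.json"
)

# With a canonical $id, the target is the same wherever the file is loaded from.
print(resolve_ref(REF, schema_id=CANONICAL_ID, retrieval_uri=FORK_LOCATION))
# https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/common/example.json

# Without $id, the target depends on where the file happened to be loaded from.
print(resolve_ref(REF, retrieval_uri=FORK_LOCATION))
# https://raw.githubusercontent.com/some-fork/beacon-v2/feature-x/framework/json/common/example.json
```

When a schema is loaded from memory or from a local checkout with no $id and no known retrieval URI, the helper raises, which corresponds to the "broken resolution chain" case.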

Proposal

  1. Inject a canonical $id into every JSON and YAML schema under

    • framework/json/**, framework/src/**
    • models/json/**, models/src/**
  2. Canonical URI format

    • Use https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/<branch>/<path-to-file>
    • <branch> MUST be main on release and the current branch name for PR artefacts.
    • Fragments (#...) are not included in the $id itself.
  3. Tooling

    • Adapt the script in idAddition.zip (see PR M-casado/beacon-v2#1, "Addition of "$id" to model and framework schemas").
    • (Optional, but highly recommended) Add a GitHub Actions workflow that:
      • Fails if any schema lacks $id
      • Fails on duplicate or non-absolute $ids
      • Checks that the repo owner (e.g., ga4gh-beacon), the branch (e.g., main), and the file path in each $id match the expected values. This catches, for example, a fork or another branch whose $ids point to a different branch.
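The proposal above could be sketched roughly as follows in Python. This is not the idAddition.zip script, only an illustration of the URI format and the three CI checks. The function names (`expected_id`, `inject_id`, `check_ids`) are hypothetical, and the sketch works on already-parsed schemas keyed by repo-relative path, so reading JSON/YAML files from disk is left out.

```python
from urllib.parse import urlsplit

RAW_BASE = "https://raw.githubusercontent.com"


def expected_id(owner, repo, branch, rel_path):
    """Build the canonical $id for a schema file (no fragment)."""
    return f"{RAW_BASE}/{owner}/{repo}/{branch}/{rel_path.lstrip('/')}"


def inject_id(schema, owner, repo, branch, rel_path):
    """Return a copy of the schema with the canonical $id as its first key."""
    rest = {key: value for key, value in schema.items() if key != "$id"}
    return {"$id": expected_id(owner, repo, branch, rel_path), **rest}


def check_ids(schemas, owner="ga4gh-beacon", repo="beacon-v2", branch="main"):
    """Return a list of error messages; an empty list means all checks pass.

    `schemas` maps a repo-relative path to the parsed schema (a dict).
    """
    errors = []
    seen = {}  # $id -> first path that used it
    for rel_path in sorted(schemas):
        schema_id = schemas[rel_path].get("$id")
        if schema_id is None:
            errors.append(f"{rel_path}: missing $id")
            continue
        parts = urlsplit(schema_id)
        if not (parts.scheme and parts.netloc):
            errors.append(f"{rel_path}: $id is not absolute: {schema_id}")
            continue
        if "#" in schema_id:
            errors.append(f"{rel_path}: $id must not contain a fragment")
        if schema_id in seen:
            errors.append(f"{rel_path}: duplicate $id, also used by {seen[schema_id]}")
        else:
            seen[schema_id] = rel_path
        # Owner, branch and path must all match the file's real location.
        if schema_id != expected_id(owner, repo, branch, rel_path):
            errors.append(f"{rel_path}: $id does not match owner/branch/path")
    return errors
```

In a workflow, a wrapper would load every file under the four schema directories, call `check_ids` with the owner and branch of the current run, and exit non-zero if the returned list is not empty.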

Expected benefits

  • Full compatibility with Biovalidator, AJV ≥ 8.12, Swagger-UI and other tooling.
  • Alignment with GA4GH standards such as VRS that already publish schemas with $ids (see example: https://w3id.org/ga4gh/schema/vrs/2.0.0/json/Allele).
  • Avoidance of branch-mixing bugs and 404s caused by raw-GitHub absolute $refs.
  • Clearer provenance and easier schema versioning.
