
Add download_jsonld_context field for JSON-LD context downloads #1942

Open
jdsika wants to merge 6 commits into biopragmatics:main from jdsika:feat/download-jsonld

@jdsika
Contributor

@jdsika jdsika commented May 4, 2026

Summary

Adds a new download_jsonld_context field to the Resource model for registering direct download links to JSON-LD context files published by ontologies. This follows the exact pattern established by download_jskos.

Motivation

This was discussed in #1939 — the Gaia-X ontology publishes a JSON-LD context at https://registry.lab.gaia-x.eu/development/context/development, but there was no appropriate field to capture it. download_json is specifically for OBO Graph JSON. @cthoyt suggested a follow-up (comment):

if you want to make a follow-up to add more unstructured information (e.g., link to the JSON-LD context) into the description, please feel free. For now, there is no field to capture that kind of thing

Rather than burying it in the description, this PR adds first-class support for it.

Changes

| File | Change |
| --- | --- |
| schema/struct.py | New download_jsonld field + get_download_jsonld() getter with doctest |
| resolve.py | New get_jsonld_download() public resolver + __all__ entry |
| __init__.py | Re-export get_jsonld_download |
| app/ui.py | Pass jsonld_download to template |
| app/templates/resource.html | JSON-LD download badge + condition guard |
| schema/schema.json | Auto-regenerated |
| data/bioregistry.json | Example: gx (Gaia-X) entry with JSON-LD context URL |

Design Decisions

  • Plain str | None type (like download_jskos), not str | AnnotatedURL | None — JSON-LD contexts don't have format variants
  • Not included in _downloads() / has_download() — consistent with download_skos and download_jskos which are also excluded from OLS config eligibility
  • Doctest uses gx as the example prefix
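
The design decisions above can be illustrated with a minimal sketch. It does not reproduce the Bioregistry code: `ResourceSketch` is a hypothetical stand-in for the `Resource` model, the getter name `get_download_jsonld_context` is assumed, and `has_download()` is simplified to check only `download_owl`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResourceSketch:
    """Hypothetical stand-in for the Resource model (not the real class)."""

    prefix: str
    download_owl: str | None = None
    download_jskos: str | None = None
    # Plain optional string, mirroring the download_jskos pattern
    download_jsonld_context: str | None = None

    def get_download_jsonld_context(self) -> str | None:
        """Return the JSON-LD context URL, if one is registered (assumed getter name)."""
        return self.download_jsonld_context

    def has_download(self) -> bool:
        """Simplified: only a full ontology serialization counts as a download."""
        return self.download_owl is not None


gx = ResourceSketch(
    prefix="gx",
    download_jsonld_context="https://registry.lab.gaia-x.eu/development/context/development",
)
```

With this sketch, `gx.get_download_jsonld_context()` returns the context URL while `gx.has_download()` stays `False`, which is the behavior the second bullet describes for fields excluded from OLS config eligibility.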

References

@jdsika jdsika force-pushed the feat/download-jsonld branch from b285b66 to 1f12035 Compare May 4, 2026 09:14
@cthoyt
Member

cthoyt commented May 4, 2026

Is this typical usage of JSON-LD, or an idiosyncrasy of LinkML?

If it's just a LinkML-specific abuse of JSON-LD (which can store anything), then I would not be so supportive

Can you give an example on how to parse a JSON-LD file to retrieve a vocabulary back out of it? Do you have some examples of this being used outside of LinkML-driven repositories?

@jdsika jdsika force-pushed the feat/download-jsonld branch from 1f12035 to 5e8824b Compare May 4, 2026 09:45
@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.92%. Comparing base (8950e70) to head (36ff79e).
⚠️ Report is 1091 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/bioregistry/resolve.py | 20.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1942      +/-   ##
==========================================
+ Coverage   42.51%   44.92%   +2.41%     
==========================================
  Files         117      142      +25     
  Lines        8327    10639    +2312     
  Branches     1963     1856     -107     
==========================================
+ Hits         3540     4780    +1240     
- Misses       4582     5496     +914     
- Partials      205      363     +158     


Add a new download_jsonld_context field to the Resource model, following
the existing download_jskos pattern. This enables registering direct
download links for JSON-LD context files published by ontologies.

The field is named download_jsonld_context (not download_jsonld) to
clearly distinguish it from a full vocabulary serialization: a JSON-LD
context provides term-to-IRI mappings and type coercion rules for
compact instance data, not the ontology itself.

Includes:
- Field definition and getter with doctest on struct.py
- Public resolver function get_jsonld_context_download in resolve.py
- Re-export in __init__.py
- Web UI badge rendering in resource.html
- Template variable pass-through in ui.py
- Regenerated schema.json
- Example data entry for the gx (Gaia-X) prefix

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
@jdsika jdsika force-pushed the feat/download-jsonld branch from 5e8824b to b19d0d0 Compare May 4, 2026 12:32
@jdsika
Contributor Author

jdsika commented May 4, 2026

Hi @cthoyt, I had a lengthy back and forth discussion with myself summarized below. I did rename to make the purpose of the specific JSON-LD file clearer. What do you think about it?


Summary

Publishing standalone JSON-LD context files at stable URLs is standard W3C practice, not a LinkML idiosyncrasy. The most prominent examples — none of which use LinkML:

| Vocabulary | Context URL | Maintained by |
| --- | --- | --- |
| schema.org | https://schema.org/docs/jsonldcontext.json | schema.org community |
| W3C Verifiable Credentials | https://www.w3.org/2018/credentials/v1 | W3C VCWG |
| W3C ActivityStreams 2.0 | https://www.w3.org/ns/activitystreams | W3C Social Web WG |
| W3C DID Core | https://www.w3.org/ns/did/v1 | W3C DID WG |
| W3C Security Vocabulary | https://w3id.org/security/v2 | W3C Credentials CG |

Verifiable Credentials, ActivityStreams 2.0, and DID Core are W3C Recommendations, and JSON-LD contexts themselves are defined in the JSON-LD 1.1 specification (§3.1).

What a JSON-LD context does

A JSON-LD context provides two things:

  1. Term mapping: short names → full IRIs (e.g. "issuer" → "https://www.w3.org/2018/credentials#issuer")
  2. Type coercion: declares the expected datatype (e.g. "issuanceDate": {"@id": "cred:issuanceDate", "@type": "xsd:dateTime"})

This allows instance data to be written as plain, readable JSON while remaining fully interpretable as RDF. For example, the W3C Verifiable Credentials context declares:

"issuanceDate": {"@id": "cred:issuanceDate", "@type": "xsd:dateTime"},
"issuer": {"@id": "cred:issuer", "@type": "@id"},
"holder": {"@id": "cred:holder", "@type": "@id"}

So a credential can be written as "issuanceDate": "2024-01-01T00:00:00" instead of {"@value": "2024-01-01T00:00:00", "@type": "xsd:dateTime"}.
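
To make the two roles concrete, here is a toy sketch of how a processor applies term mapping and type coercion to a single key/value pair. It is a deliberately simplified illustration, not the JSON-LD expansion algorithm: `expand_iri` and `expand_term` are hypothetical helpers, and the context is a hand-copied excerpt rather than the full Verifiable Credentials context.

```python
# Hand-copied excerpt of a context; a real context has many more entries.
CONTEXT = {
    "cred": "https://www.w3.org/2018/credentials#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "issuanceDate": {"@id": "cred:issuanceDate", "@type": "xsd:dateTime"},
    "issuer": {"@id": "cred:issuer", "@type": "@id"},
}


def expand_iri(value, context):
    """Expand a compact IRI such as 'cred:issuer' using prefixes in the context."""
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return value  # already absolute, or no known prefix


def expand_term(term, value, context):
    """Return (property IRI, expanded value object) for one key/value pair."""
    definition = context[term]
    if isinstance(definition, str):
        definition = {"@id": definition}
    iri = expand_iri(definition["@id"], context)
    coerced_type = definition.get("@type")
    if coerced_type == "@id":
        # The value is itself an IRI reference, not a literal
        return iri, {"@id": value}
    if coerced_type:
        # Typed literal: attach the declared datatype
        return iri, {"@value": value, "@type": expand_iri(coerced_type, context)}
    return iri, {"@value": value}


result = expand_term("issuanceDate", "2024-01-01T00:00:00", CONTEXT)
```

Here `result` pairs the full property IRI with a typed literal, which is the verbose form the compact key/value stands in for. A real processor such as pyld handles many more cases (nested contexts, `@vocab`, containers, language tags).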

The Gaia-X case

The Gaia-X context happens to be generated by LinkML, but the output is a standard JSON-LD @context document using only standard JSON-LD 1.1 features (@id, @type, @vocab, @context nesting). The "comments" field with "Auto generated by LinkML..." sits outside the @context key and doesn't affect JSON-LD processing.

How to parse it

Any standard JSON-LD processor can use the context:

from pyld import jsonld

doc = {
    "@context": "https://registry.lab.gaia-x.eu/development/context/development",
    "@type": "gx:LegalPerson",
    "gx:legalName": "Example Corp",
    "gx:headquarterAddress": {
        "@type": "gx:Address",
        "gx:countrySubdivisionCode": "DE-BY"
    }
}

# Expand: resolves all terms to full IRIs
expanded = jsonld.expand(doc)

# Convert to RDF:
rdf = jsonld.to_rdf(doc, {'format': 'application/n-quads'})

With rdflib:

import json

from rdflib import Graph

# `doc` is the JSON-LD document defined in the pyld example above
g = Graph()
g.parse(data=json.dumps(doc), format="json-ld")
# Query with SPARQL, serialize to Turtle, etc.
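
On the question of what can be read back out of a context file itself: a rough sketch using only the standard library is shown below. The embedded context is a hypothetical, trimmed-down example (only the `gx` namespace IRI and the two term names are taken from this thread), `list_terms` is an illustrative helper, and the prefix detection is a heuristic rather than the full JSON-LD 1.1 rules.

```python
import json

# Hypothetical, trimmed-down context document; real contexts are much larger.
CONTEXT_JSON = """
{
  "comments": "free text outside @context is ignored by JSON-LD processors",
  "@context": {
    "gx": "https://w3id.org/gaia-x/development#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "https://w3id.org/gaia-x/development#",
    "legalName": {"@id": "gx:legalName", "@type": "xsd:string"},
    "LegalPerson": "gx:LegalPerson"
  }
}
"""


def list_terms(context_doc):
    """Map each term in a context to the IRI it abbreviates."""
    ctx = context_doc["@context"]
    # Heuristic: string values ending in '#' or '/' are namespace prefixes
    prefixes = {
        key: value
        for key, value in ctx.items()
        if not key.startswith("@")
        and isinstance(value, str)
        and value.endswith(("#", "/"))
    }
    terms = {}
    for term, definition in ctx.items():
        if term.startswith("@") or term in prefixes:
            continue  # skip keywords such as @vocab and the prefixes themselves
        target = definition if isinstance(definition, str) else definition.get("@id", term)
        prefix, sep, local = target.partition(":")
        if sep and prefix in prefixes:
            terms[term] = prefixes[prefix] + local
        elif sep:
            terms[term] = target  # assume it is already an absolute IRI
        else:
            terms[term] = ctx.get("@vocab", "") + target
    return terms


terms = list_terms(json.loads(CONTEXT_JSON))
```

What comes back is a flat term-to-IRI mapping. Labels, definitions, and hierarchy are not part of a context, so they would still have to come from the OWL or SHACL artifacts.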

Personal motivation: compact JSON-LD in practice

Beyond the Gaia-X core context, I also maintain ASCS-eV/ontology-management-base — 21 domain ontologies for the ENVITED-X automotive simulation ecosystem built on top of Gaia-X. Each domain publishes an OWL + SHACL + JSON-LD context triad. The contexts for 20 of these domains are generated by a custom SHACL-based generator, not LinkML — they extract sh:datatype constraints from SHACL shapes and emit standard @type coercion rules.

The practical benefit: instance data for automotive simulation assets can be written as compact JSON:

{
   "@context": [
      "https://w3id.org/gaia-x/development#",
      "https://w3id.org/ascs-ev/envited-x/hdmap/v6/"
   ],
   "@type": "hdmap:HdMap",
   "hdmap:hasQuantity": {
      "@type": "hdmap:Quantity",
      "length": 1.46,
      "numberIntersections": 5
   }
}

Without the context, each float would require {"@value": 1.46, "@type": "xsd:float"}. The context declares "length": {"@id": "hdmap:length", "@type": "xsd:float"}, so plain JSON numbers just work. This is exactly the same pattern W3C uses for xsd:dateTime in Verifiable Credentials.

Rename: download_jsonld → download_jsonld_context

I renamed the field to download_jsonld_context to make the distinction clear: this is not a full vocabulary serialization (like download_owl), but a companion artifact that enables compact JSON-LD serialization of instance data. The badge now reads "JSON-LD Context".

@jdsika jdsika changed the title from "Add download_jsonld field for JSON-LD context downloads" to "Add download_jsonld_context field for JSON-LD context downloads" May 6, 2026
@jdsika
Contributor Author

jdsika commented May 7, 2026

What do you think, @cthoyt?

@cthoyt
Member

cthoyt commented May 7, 2026

@jdsika I'm at about 80% reject, 20% accept. Still thinking through if this is in scope and if I would want to take on the burden of maintaining code around this

I'm still not convinced that using the contexts themselves is a typical way of communicating the contents of a controlled vocabulary. Why not just use RDF encoded in JSON-LD directly, instead of getting even more meta? I've never seen this pattern before.

@jdsika
Contributor Author

jdsika commented May 7, 2026

Hey @cthoyt — thank you for taking the time to think this through. I genuinely appreciate it.

You're right that a JSON-LD context is a "helper artifact" — it's derived from the normative vocabulary, not the vocabulary itself. I fully acknowledge that distinction. My argument is purely practical: many projects publish these helpers at stable URLs because they're what developers actually dereference at runtime to write compact JSON-LD instance data. Schema.org, W3C Verifiable Credentials, ActivityStreams, and DID Core all do this — not because the context is the vocabulary, but because it's the artifact users need to use the vocabulary in JSON-LD.

For Gaia-X specifically, the context URL is the most practical stable entry point for developers working with the ecosystem.

That said, I understand if you feel this is out of scope for the Bioregistry's mission of cataloging vocabularies themselves. No hard feelings either way.


Below is a summary of the conversation I had with my agent about this topic, for transparency:

  • What cthoyt suggested: Link to the vocabulary itself serialized as JSON-LD (actual RDF triples — classes, properties, axioms), not to the abbreviation-mapping helper. The Bioregistry already has download_owl for this purpose.
  • Why LinkML generates contexts: It's a developer ergonomics tool — you define your model once, and LinkML outputs multiple artifacts (JSON Schema, OWL, SHACL, JSON-LD context). The context enables instance data to look like plain JSON while remaining RDF-interpretable.
  • Where contexts are standardized: W3C JSON-LD 1.1 §3.1 defines the format and consumption of contexts. However, there's no normative requirement that every ontology must publish one — it's a common publishing convention, not a spec mandate.
  • The core tension: A context is a serialization convenience, not the vocabulary. The Bioregistry catalogs vocabularies. Both positions are defensible.
