W3C Validation
==============

.. automodapi:: curies.w3c
    :no-inheritance-diagram:
    :no-heading:
    :include-all-objects:

Opting in to W3C Validation with a :class:`curies.Converter`
------------------------------------------------------------

In practice, some usages do not conform to these standards, often because they encode things that aren't really supposed to be CURIEs, such as SMILES strings for molecules, UCUM codes for units, or other language-like "identifiers".

Therefore, the roadmap for the ``curies`` package includes operations for validating against the W3C standards and for mapping between "loose" (i.e., not URL-encoded) and "strict" (i.e., URL-encoded) CURIEs and IRIs. In practice, this will often solve issues with special characters such as square brackets (``[`` and ``]``).

.. code-block:: text

    looseCURIE <-> strictCURIE
        ^      \ /      ^
        |       X       |
        v      / \      v
     looseURI  <->  strictURI
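
The loose-to-strict direction of this mapping can be illustrated with the standard library alone. The following is a minimal sketch, not the ``curies`` implementation: the helper names ``loose_to_strict`` and ``strict_to_loose`` are hypothetical, and the assumption that only the local identifier (the part after the first colon) gets percent-encoded is ours.

```python
from urllib.parse import quote, unquote


def loose_to_strict(curie: str) -> str:
    """Percent-encode the local identifier of a loose CURIE (hypothetical helper)."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    # safe="" leaves only letters, digits, and "_.-~" unencoded, so square
    # brackets, parentheses, and "=" are all percent-encoded.
    return f"{prefix}:{quote(identifier, safe='')}"


def strict_to_loose(curie: str) -> str:
    """Decode a percent-encoded local identifier back to its loose form."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{prefix}:{unquote(identifier)}"


loose = "smiles:CC(=O)NC([H])(C)C(=O)O"
strict = loose_to_strict(loose)
print(strict)
print(strict_to_loose(strict) == loose)
```

Whether the package's eventual mapping encodes exactly the same set of characters is an open question. The sketch only shows that the round trip between the two forms loses no information.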

A first step towards accomplishing this was implemented in #104 by adding a ``w3c_validate`` flag to both the initialization of a :class:`curies.Converter` and the :meth:`curies.Converter.expand` function.

Here's an example of using W3C validation during expansion:

.. code-block:: python

    import curies

    converter = curies.Converter.from_prefix_map({
        "smiles": "https://bioregistry.io/smiles:",
    })

    # Without validation, the CURIE is expanded as-is
    >>> converter.expand("smiles:CC(=O)NC([H])(C)C(=O)O")
    https://bioregistry.io/smiles:CC(=O)NC([H])(C)C(=O)O

    # With validation, the non-conforming CURIE raises an error
    >>> converter.expand("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_validate=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/cthoyt/dev/curies/src/curies/api.py", line 1362, in expand
        raise W3CValidationError(f"CURIE is not valid under W3C spec: {curie}")
    W3CValidationError: CURIE is not valid under W3C spec: smiles:CC(=O)NC([H])(C)C(=O)O

This can also be used to extend :meth:`curies.Converter.is_curie`:

.. code-block:: python

    import curies

    converter = curies.Converter.from_prefix_map({
        "smiles": "https://bioregistry.io/smiles:",
    })

    >>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O")
    True
    >>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_validate=True)
    False

Finally, this can be used during instantiation of a converter:

.. code-block:: python

    import curies

    >>> curies.Converter.from_prefix_map(
    ...     {"4dn.biosource": "https://data.4dnucleome.org/biosources/"},
    ...     w3c_validate=True,
    ... )
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/cthoyt/dev/curies/src/curies/api.py", line 816, in from_prefix_map
        return cls(
               ^^^^
      File "/Users/cthoyt/dev/curies/src/curies/api.py", line 527, in __init__
        raise W3CValidationError(f"Records not conforming to W3C:\n\n{msg}")
    curies.api.W3CValidationError: Records not conforming to W3C:

      - Record(prefix='4dn.biosource', uri_prefix='https://data.4dnucleome.org/biosources/', prefix_synonyms=[], uri_prefix_synonyms=[], pattern=None)
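
A likely reason this record is rejected is that the W3C CURIE syntax requires the prefix to be an XML ``NCName``, which cannot begin with a digit. The sketch below pre-screens a prefix map on that basis before a converter is built. It is a simplified illustration under stated assumptions: the pattern is an ASCII-only approximation of ``NCName``, the helper ``find_nonconforming_prefixes`` is hypothetical, and the exact check performed by ``curies`` may differ.

```python
import re

# ASCII-only approximation of the XML NCName production (an assumption made
# for illustration): a letter or underscore, then letters, digits, ".", "_", "-".
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def find_nonconforming_prefixes(prefix_map: dict[str, str]) -> list[str]:
    """Return the prefixes that do not match the simplified NCName pattern."""
    return [prefix for prefix in prefix_map if not NCNAME_ASCII.fullmatch(prefix)]


prefix_map = {
    "smiles": "https://bioregistry.io/smiles:",
    "4dn.biosource": "https://data.4dnucleome.org/biosources/",
}
print(find_nonconforming_prefixes(prefix_map))  # ['4dn.biosource']
```

Screening a prefix map this way makes it possible to report or rename offending prefixes up front instead of handling the exception at construction time.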

.. seealso::

    1. Discussion on the ``curies`` issue tracker about handling CURIEs that include
       e.g. square brackets and therefore don't conform to the W3C specification:
       https://github.com/biopragmatics/curies/issues/103
    2. Discussion on languages that shouldn't really get encoded in CURIEs, but still
       do: https://github.com/biopragmatics/bioregistry/issues/460
    3. Related to (2) - discussion on how to properly encode UCUM in CURIEs:
       https://github.com/biopragmatics/bioregistry/issues/648
