.. automodapi:: curies.w3c
:no-inheritance-diagram:
:no-heading:
:include-all-objects:
Opting in to W3C Validation with a :class:`curies.Converter`
In practice, some usages do not conform to these standards, often due to encoding things that aren't really supposed to be CURIEs, such as like SMILES strings for molecules, UCUM codes for units, or other language-like "identifiers".
Therefore, it's on the roadmap for the curies package to support operations for
validating against the W3C standards and mapping between "loose" (i.e., un-URL-encoded)
and strict (i.e., URL-encoded) CURIEs and IRIs. In practice, this will often solve
issues with special characters like square brackets ([ and ]).
looseCURIE <-> strictCURIE
^. \./. ^
| X |
v / \. v
looseURI <-> strictURI
A first step towards accomplishing this was implemented in
#104 by adding a w3c_validate flag to
both the initialization of a :mod:`curies.Converter` as well as in the
:meth:`curies.Converter.expand` function.
Here's an example of using W3C validation during expansion:
import curies
converter = curies.Converter.from_prefix_map({
"smiles": "https://bioregistry.io/smiles:",
})
>>> converter.expand("smiles:CC(=O)NC([H])(C)C(=O)O")
https://bioregistry.io/smiles:CC(=O)NC([H])(C)C(=O)O
>>> converter.expand("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cthoyt/dev/curies/src/curies/api.py", line 1362, in expand
raise W3CValidationError(f"CURIE is not valid under W3C spec: {curie}")
W3CValidationError: CURIE is not valid under W3C spec: smiles:CC(=O)NC([H])(C)C(=O)O
This can also be used to extend
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cthoyt/dev/curies/src/curies/api.py", line 1362, in expand
raise W3CValidationError(f"CURIE is not valid under W3C spec: {curie}")
W3CValidationError: CURIE is not valid under W3C spec: smiles:CC(=O)NC([H])(C)C(=O)O
This can also be used to extend :meth:`curies.Converter.is_curie`
import curies
converter = curies.Converter.from_prefix_map({
"smiles": "https://bioregistry.io/smiles:",
})
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O")
True
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_mode=True)
False
Finally, this can be used during instantiation of a converter:
converter = curies.Converter.from_prefix_map({
"smiles": "https://bioregistry.io/smiles:",
})
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O")
True
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_mode=True)
False
Finally, this can be used during instantiation of a converter:
converter = curies.Converter.from_prefix_map({
"smiles": "https://bioregistry.io/smiles:",
})
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O")
True
>>> converter.is_curie("smiles:CC(=O)NC([H])(C)C(=O)O", w3c_validate=True)
False
Finally, this can be used during instantiation of a converter:
import curies
>>> curies.Converter.from_prefix_map(
... {"4dn.biosource": "https://data.4dnucleome.org/biosources/"},
... w3c_validate=True,
... )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cthoyt/dev/curies/src/curies/api.py", line 816, in from_prefix_map
return cls(
^^^^
File "/Users/cthoyt/dev/curies/src/curies/api.py", line 527, in __init__
raise W3CValidationError(f"Records not conforming to W3C:\n\n{msg}")
curies.api.W3CValidationError: Records not conforming to W3C:
- Record(prefix='4dn.biosource', uri_prefix='https://data.4dnucleome.org/biosources/', prefix_synonyms=[], uri_prefix_synonyms=[], pattern=None)
.. seealso::
1. Discussion on the ``curies`` issue tracker about handling CURIEs that include
e.g. square brackets and therefore don't conform to the W3C specification:
https://github.com/biopragmatics/curies/issues/103
2. Discussion on languages that shouldn't really get encoded in CURIEs, but still
do: https://github.com/biopragmatics/bioregistry/issues/460
3. Related to (2) - discussion on how to properly encode UCUM in CURIEs:
https://github.com/biopragmatics/bioregistry/issues/648