|
| 1 | +Converter with Preprocessing |
| 2 | +============================ |
| 3 | + |
| 4 | +When simple expansion and contraction aren't enough, and you want to inject global or |
| 5 | +context-specific rewrite rules, you can wrap a :class:`curies.Converter` and |
| 6 | +preprocessing rules encoded in an instance of :class:`curies.PreprocessingRules` inside |
| 7 | +a :class:`curies.PreprocessingConverter`. |
| 8 | + |
| 9 | +Rewrites |
| 10 | +-------- |
| 11 | + |
| 12 | +For example, you might always want to rewrite legacy references to the ``OBO_REL`` namespace: |
| 13 | + |
| 14 | +.. code-block:: python |
| 15 | +
|
| 16 | + import curies |
| 17 | + from curies import PreprocessingRules, PreprocessingConverter, PreprocessingRewrites |
| 18 | +
|
| 19 | + rules = PreprocessingRules( |
| 20 | + rewrites=PreprocessingRewrites( |
| 21 | + full={"OBO_REL:is_a": "rdfs:subClassOf"}, |
| 22 | + ), |
| 23 | + ) |
| 24 | +
|
| 25 | + converter = curies.get_obo_converter() |
| 26 | + converter = PreprocessingConverter.from_converter( |
| 27 | + converter, rules=rules, |
| 28 | + ) |
| 29 | +
|
| 30 | + >>> converter.parse_curie("OBO_REL:is_a") |
| 31 | + ReferenceTuple('rdfs', 'subClassOf') |
| 32 | +
|
| 33 | +Similarly, there may be a whole class of references that need to be fixed based on their |
| 34 | +prefix, such as the ``APOLLO:SV_`` references that are mangled by the OWLAPI due to the |
| 35 | +OBO Foundry's PURL rules: |
| 36 | + |
| 37 | +.. code-block:: python |
| 38 | +
|
| 39 | + import curies |
| 40 | + from curies import PreprocessingRules, PreprocessingConverter, PreprocessingRewrites |
| 41 | +
|
| 42 | + rules = PreprocessingRules( |
| 43 | + rewrites=PreprocessingRewrites( |
| 44 | + prefix={"APOLLO:SV_": "APOLLO_SV:"}, |
| 45 | + ) |
| 46 | + ) |
| 47 | +
|
| 48 | + converter = curies.get_obo_converter() |
| 49 | + converter = PreprocessingConverter.from_converter( |
| 50 | + converter, rules=rules, |
| 51 | + ) |
| 52 | +
|
| 53 | + >>> converter.parse_curie("APOLLO:SV_1234567") |
| 54 | + ReferenceTuple('APOLLO_SV', '1234567') |
| 55 | +
|
| 56 | +The CURIE and URI rewrites are unified. Therefore, you can also use a URI as a rewrite, |
| 57 | +such as handling Creative Commons license URLs, which unfortunately aren't themselves |
| 58 | +part of a semantic space for licenses. Luckily, SPDX is, and we can remap to that. |
| 59 | + |
| 60 | +.. code-block:: python |
| 61 | +
|
| 62 | + import curies |
| 63 | + from curies import PreprocessingRules, PreprocessingConverter, PreprocessingRewrites |
| 64 | +
|
| 65 | + rules = PreprocessingRules( |
| 66 | + rewrites=PreprocessingRewrites( |
| 67 | + full={"http://creativecommons.org/licenses/by/3.0/": "spdx:CC-BY-3.0"}, |
| 68 | + ) |
| 69 | + ) |
| 70 | +
|
| 71 | + converter = curies.get_obo_converter() |
| 72 | + converter.add_prefix("spdx", "https://spdx.org/licenses/") |
| 73 | + converter = PreprocessingConverter.from_converter( |
| 74 | + converter, rules=rules, |
| 75 | + ) |
| 76 | +
|
| 77 | + >>> converter.parse_uri("http://creativecommons.org/licenses/by/3.0/") |
| 78 | + ReferenceTuple('spdx', 'CC-BY-3.0') |
| 79 | +
|
| 80 | +Some rewrite rules only apply to a specific resource, because of its own quirks in |
| 81 | +curation or encoding. For example, CHMO encodes OrangeBook entries with ``orange`` as a |
| 82 | +prefix, which is not typically specific enough to warrant curating ``orange`` as a |
| 83 | +prefix, e.g., in the Bioregistry: |
| 84 | + |
| 85 | +.. code-block:: python |
| 86 | +
|
| 87 | + import curies |
| 88 | + from curies import PreprocessingRules, PreprocessingConverter, PreprocessingRewrites |
| 89 | +
|
| 90 | + rules = PreprocessingRules( |
| 91 | + rewrites=PreprocessingRewrites( |
| 92 | + resource_prefix={ |
| 93 | + "CHMO": {"orange:": "orangebook:"}, |
| 94 | + }, |
| 95 | + ), |
| 96 | + ) |
| 97 | +
|
| 98 | + converter = curies.get_obo_converter() |
| 99 | + converter.add_prefix("orangebook", "https://bioregistry.io/orangebook:") |
| 100 | + converter = PreprocessingConverter.from_converter( |
| 101 | + converter, rules=rules, |
| 102 | + ) |
| 103 | +
|
| 104 | + >>> converter.parse_curie("orange:10.2.1.1.3", context="CHMO") |
| 105 | + ReferenceTuple('orangebook', '10.2.1.1.3') |
| 106 | +
|
| 107 | +Similarly, this can be used to inject knowledge about resources that improperly import |
| 108 | +EDAM sub-trees such as MCRO, which uses ``format`` as a prefix where it means |
| 109 | +``edam.format``. |
| 110 | + |
| 111 | +Blocks |
| 112 | +------ |
| 113 | + |
| 114 | +Some references are *never* informative, and can be configured to be thrown away, such |
| 115 | +as ``Bgee:curators``, ``BioGRID:curators``, ``GROUP:OBI``, and similar group curation |
| 116 | +flags. |
| 117 | + |
| 118 | +.. code-block:: python |
| 119 | +
|
| 120 | + import curies |
| 121 | + from curies import PreprocessingRules, PreprocessingConverter, PreprocessingBlocklists |
| 122 | +
|
| 123 | + rules = PreprocessingRules( |
| 124 | + blocklists=PreprocessingBlocklists( |
| 125 | + full=["Bgee:curators", "BioGRID:curators", "GROUP:OBI"], |
| 126 | + ), |
| 127 | + ) |
| 128 | +
|
| 129 | + converter = curies.get_obo_converter() |
| 130 | + converter = PreprocessingConverter.from_converter( |
| 131 | + converter, rules=rules, |
| 132 | + ) |
| 133 | +
|
| 134 | + # raises a BlocklistError |
| 135 | + >>> converter.parse_curie("GROUP:OBI") |
| 136 | +
|
| 137 | +A blocked reference raises an exception that downstream code can handle, for example by |
| 138 | +returning ``None``. An exception is used because it is sometimes useful to distinguish |
| 139 | +a ``None`` returned when parsing fails from a reference that was actively blocked. This |
| 140 | +behavior can be toggled with the ``block_action`` argument. |
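The distinction between a blocked reference and an ordinary parse failure can be illustrated with a minimal, standard-library-only sketch. It does not use ``curies`` and is not the library's implementation: ``BlockedReference``, ``PREFIXES``, ``BLOCKED``, ``parse_curie``, and ``classify`` are hypothetical stand-ins, and the ``"raise"`` and ``"pass"`` values for ``block_action`` are assumptions made for illustration.

```python
from __future__ import annotations


class BlockedReference(Exception):
    """Hypothetical stand-in for the error raised on a blocklisted reference."""


# Hypothetical, simplified stand-ins for a prefix map and a full-match blocklist
PREFIXES = {"GO", "CHEBI"}
BLOCKED = {"Bgee:curators", "BioGRID:curators", "GROUP:OBI"}


def parse_curie(curie: str, *, block_action: str = "raise") -> tuple[str, str] | None:
    """Parse a CURIE, checking the blocklist before the prefix map."""
    if curie in BLOCKED:
        if block_action == "raise":
            raise BlockedReference(curie)
        # Assumed "pass" behavior: a blocked reference looks like a parse failure
        return None
    prefix, sep, identifier = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        return None  # ordinary parse failure: unknown prefix or no delimiter
    return prefix, identifier


def classify(curie: str) -> str:
    """Show how downstream code can tell the three outcomes apart."""
    try:
        parsed = parse_curie(curie)
    except BlockedReference:
        return "blocked"
    return "unparsable" if parsed is None else "ok"
```

With raising enabled, ``classify`` can report ``"blocked"`` separately from ``"unparsable"``. With the assumed ``"pass"`` setting, both cases collapse into ``None``, which is simpler for callers that do not care why a reference was dropped.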