Commit 6e8c9bc

Update tutorial
1 parent d770635 commit 6e8c9bc

1 file changed

Lines changed: 71 additions & 11 deletions

docs/source/preprocessing.rst

@@ -6,6 +6,9 @@ context-specific rewrite rules, you can wrap a :class:`curies.Converter` and
 preprocessing rules encoded in an instance of :class:`curies.PreprocessingRules` inside
 a :class:`curies.PreprocessingConverter`.

+Rewrites
+--------
+
 For example, you always want to fix legacy references to the ``OBO_REL`` namespace:

 .. code-block:: python
@@ -15,21 +18,21 @@ For example, you always want to fix legacy references to the ``OBO_REL`` namespa

     rules = PreprocessingRules(
         rewrites=PreprocessingRewrites(
-            full={"OBO_REL:is_a": "rdfs:subClassOf"}
-        )
+            full={"OBO_REL:is_a": "rdfs:subClassOf"},
+        ),
     )

     converter = curies.get_obo_converter()
     converter = PreprocessingConverter.from_converter(
-        converter, rules=rules
+        converter, rules=rules,
     )

     >>> converter.parse_curie("OBO_REL:is_a")
     ReferenceTuple('rdfs', 'subClassOf')

-Similarly, there may be a whole class of references that need to be fixed
-based on their prefix, such as the ``APOLLO:SV_`` references that are mangled
-by the OWLAPI due to the OBO Foundry's PURL rules
+Similarly, there may be a whole class of references that need to be fixed based on their
+prefix, such as the ``APOLLO:SV_`` references that are mangled by the OWLAPI due to the
+OBO Foundry's PURL rules

 .. code-block:: python

@@ -38,18 +41,75 @@ by the OWLAPI due to the OBO Foundry's PURL rules

     rules = PreprocessingRules(
         rewrites=PreprocessingRewrites(
-            prefix={"APOLLO:SV_": "APOLLO_SV:"}
+            prefix={"APOLLO:SV_": "APOLLO_SV:"},
         )
     )

     converter = curies.get_obo_converter()
     converter = PreprocessingConverter.from_converter(
-        converter, rules=rules
+        converter, rules=rules,
     )

     >>> converter.parse_curie("APOLLO:SV_1234567")
     ReferenceTuple('APOLLO_SV', '1234567')

-Some rewrite rules only apply to a specific resource, because of its own quirks
-in curation or encoding. For example, CHMO encodes OrangeBook entries with ``orange``
-as a prefix, which is not typically specific enough to
+Some rewrite rules only apply to a specific resource, because of its own quirks in
+curation or encoding. For example, CHMO encodes OrangeBook entries with ``orange`` as a
+prefix, which is not typically specific enough to warrant curating ``orange`` as a
+prefix, e.g., in the Bioregistry
+
+.. code-block:: python
+
+    import curies
+    from curies import PreprocessingRules, PreprocessingConverter, PreprocessingRewrites
+
+    rules = PreprocessingRules(
+        rewrites=PreprocessingRewrites(
+            resource_prefix={
+                "CHMO": {"orange:": "orangebook:"},
+            },
+        ),
+    )
+
+    converter = curies.get_obo_converter()
+    converter.add_prefix("orangebook", "https://bioregistry.io/orangebook:")
+    converter = PreprocessingConverter.from_converter(
+        converter, rules=rules,
+    )
+
+    >>> converter.parse_curie("orange:10.2.1.1.3")
+    ReferenceTuple('orangebook', '10.2.1.1.3')
+
+Similarly, this can be used to inject knowledge about resources that improperly import
+EDAM sub-trees such as MCRO, which uses ``format`` as a prefix where it means
+``edam.format``
+
+Blocks
+------
+
+Some references are _never_ informative, and can be configured to be thrown away, such
+as ``Bgee:curators``, ``BioGRID:curators``, ``GROUP:OBI``, and similar group curation
+flags.
+
+.. code-block:: python
+
+    import curies
+    from curies import PreprocessingRules, PreprocessingConverter, PreprocessingBlocklists
+
+    rules = PreprocessingRules(
+        blocklists=PreprocessingBlocklists(
+            full=["Bgee:curators", "BioGRID:curators", "GROUP:OBI"],
+        ),
+    )
+
+    converter = curies.get_obo_converter()
+    converter = PreprocessingConverter.from_converter(
+        converter, rules=rules,
+    )
+
+    # raises a BlocklistError
+    >>> converter.parse_curie("GROUP:OBI")
+
+Blocklists cause throwing an exception that can be handled by downstream code, such as
+returning a None. This is done because in some places, it's nice to have the distinction
+between ``None`` being returned by parsing failing, versus actively being blocked.
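
The last paragraph of the new tutorial text distinguishes a reference that fails to parse (``None``) from one that is actively blocked (an exception). The following is a minimal, library-free sketch of that pattern and does not reproduce the ``curies`` implementation. The ``BlocklistError`` class, the toy ``parse_curie`` function, the ``KNOWN_PREFIXES`` set, and the ``parse_or_none`` wrapper are all hypothetical stand-ins written for illustration; only the blocked CURIEs are taken from the tutorial.

```python
from typing import Optional, Tuple


class BlocklistError(ValueError):
    """Hypothetical stand-in for the exception named in the tutorial."""


# Toy data for illustration only (assumption: not the real curies internals).
BLOCKED = {"Bgee:curators", "BioGRID:curators", "GROUP:OBI"}
KNOWN_PREFIXES = {"rdfs", "GO"}


def parse_curie(curie: str) -> Optional[Tuple[str, str]]:
    """Toy parser: raise if the CURIE is blocked, return None if it cannot be parsed."""
    if curie in BLOCKED:
        raise BlocklistError(curie)
    prefix, sep, identifier = curie.partition(":")
    if not sep or prefix not in KNOWN_PREFIXES:
        return None
    return prefix, identifier


def parse_or_none(curie: str) -> Optional[Tuple[str, str]]:
    """Downstream wrapper that deliberately collapses 'blocked' into None."""
    try:
        return parse_curie(curie)
    except BlocklistError:
        return None
```

Code that needs to tell the two cases apart would call the parser directly and catch the exception itself; code that only wants usable references can go through a wrapper like ``parse_or_none``.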
