Commit e81d986

Map Iconclass to ChEBI, OBI, and CHMO (#205)
This PR adds three curation scenarios:

1. Embedding-based matches between Iconclass and CHMO
2. Embedding-based matches between Iconclass and OBI
3. NER-based matches between Iconclass and ChEBI (e.g., finding ChEBI entities that appear as part of the names in Iconclass). NER is used here because the embeddings are very far off, and most of the terms don't have exact matches.

In all three scenarios, I used skos:relatedMatch because Iconclass is about the depiction of things, while CHMO, OBI, and ChEBI are about the things themselves.

In general, the predictions are of quite poor quality, so I only included the positive and negative curations, not the potentially thousands of low-quality predictions. I focused the ChEBI curation on entities that had metals or atoms in them. For CHMO and OBI, I was a bit more exhaustive.
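To illustrate the third scenario, here is a minimal sketch of dictionary-based entity recognition: it looks for ChEBI surface forms occurring as whole words inside Iconclass labels. This is not the matcher used by biomappings; the helper `find_entities` and the tiny lexicon are assumptions made for illustration, with CURIEs and labels taken from the curations in this commit.

```python
from __future__ import annotations

import re

# Tiny illustrative lexicon (an assumption): surface form -> ChEBI CURIE
LEXICON = {
    "mercury": "chebi:25195",  # mercury atom
    "atom": "chebi:33250",  # atom
}


def find_entities(label: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Return (surface form, CURIE) pairs whose surface form is a whole word in the label."""
    hits = []
    for surface, curie in lexicon.items():
        # Word boundaries avoid matching inside longer words; matching ignores case
        if re.search(rf"\b{re.escape(surface)}\b", label, flags=re.IGNORECASE):
            hits.append((surface, curie))
    return hits


labels = {
    "iconclass:24C14": "Mercury (planet)",
    "iconclass:25H2313": "phosphorescence",
}
predictions = {curie: find_entities(label, LEXICON) for curie, label in labels.items()}
```

The hit for "Mercury (planet)" is the kind of lexically plausible but wrong match that this commit records as a negative curation.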
1 parent 237b792 commit e81d986

4 files changed

Lines changed: 184 additions & 1 deletion

scripts/culture_to_chemistry.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+"""Generate a bridge between cultural heritage and chemical vocabularies."""
+
+import click
+from more_click import verbose_option
+
+__all__ = ["main"]
+
+
+@click.group()
+def main():
+    """Run culture to chemistry workflows."""
+
+
+@main.command(name="chmo")
+def match_chmo():
+    """Get embedding matches to CHMO."""
+    from pyobo.struct.vocabulary import related_match
+
+    from biomappings.lexical import lexical_prediction_cli
+
+    lexical_prediction_cli(
+        __file__,
+        "iconclass",
+        "chmo",
+        predicate=related_match,
+        method="embedding",
+    )
+
+
+@main.command(name="obi")  # was "chmo", which would replace the CHMO command
+def match_obi():
+    """Get embedding matches to OBI."""
+    from pyobo.struct.vocabulary import related_match
+
+    from biomappings.lexical import lexical_prediction_cli
+
+    lexical_prediction_cli(
+        __file__,
+        "iconclass",
+        "obi",
+        predicate=related_match,
+        method="embedding",
+    )
+
+
+@main.command(name="chebi")
+@verbose_option
+def match_chebi() -> None:
+    """Get NER matches to ChEBI."""
+    from pyobo.struct.vocabulary import related_match
+
+    from biomappings import SemanticMapping
+    from biomappings.lexical import lexical_prediction_cli
+
+    def _custom_filter(m: SemanticMapping) -> bool:
+        return len(m.subject.name) < 60  # keep only short Iconclass labels
+
+    lexical_prediction_cli(
+        __file__,
+        "iconclass",
+        "chebi",
+        predicate=related_match,
+        method="ner",
+        custom_filter_function=_custom_filter,
+    )
+
+
+if __name__ == "__main__":
+    main()
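The `_custom_filter` in `match_chebi` keeps only mappings whose Iconclass label is shorter than 60 characters. A minimal standalone sketch of that predicate follows. `Reference` and `Mapping` are hypothetical stand-ins for the biomappings types rather than the real classes, and the `iconclass:EXAMPLE` entry is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for a CURIE with its label."""

    curie: str
    name: str


@dataclass(frozen=True)
class Mapping:
    """Hypothetical stand-in for biomappings' SemanticMapping."""

    subject: Reference
    object: Reference


def keep_short_subject(m: Mapping, max_length: int = 60) -> bool:
    """Mirror the script's filter: keep mappings whose subject label is short."""
    return len(m.subject.name) < max_length


mappings = [
    Mapping(
        Reference("iconclass:24C14", "Mercury (planet)"),
        Reference("chebi:25195", "mercury atom"),
    ),
    # Made-up entry with an 80-character label, which the filter drops
    Mapping(
        Reference("iconclass:EXAMPLE", "x" * 80),
        Reference("chebi:33250", "atom"),
    ),
]
kept = [m for m in mappings if keep_short_subject(m)]
```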

src/biomappings/resources/negative.sssom.tsv

Lines changed: 50 additions & 0 deletions
@@ -548,6 +548,56 @@ envo:03501173 post-anesthesia care unit facility skos:exactMatch ncbitaxon:44741
 envo:03501219 window skos:exactMatch ncbitaxon:183654 Scophthalmus aquosus semapv:ManualMappingCuration orcid:0000-0003-4423-4370 kestrel-mappings Not
 hp:0000001 All skos:exactMatch ncbitaxon:1 root semapv:ManualMappingCuration orcid:0000-0003-4423-4370 kestrel-mappings Not
 hp:0000039 Epispadias skos:exactMatch ncbitaxon:2865123 Epispadias semapv:ManualMappingCuration orcid:0000-0003-4423-4370 kestrel-mappings Not
+iconclass:11H%28ANTONY%20ABBOT%29411 St. Antony Abbot finds a nugget of gold in the desert skos:relatedMatch chebi:141393 Ser-Thr semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:11H%28ANTONY%20ABBOT%29411 St. Antony Abbot finds a nugget of gold in the desert skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:11H%28ELOI%2911 St. Eloi as patron of blacksmiths and goldsmiths skos:relatedMatch chebi:141393 Ser-Thr semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:12A43311 ten golden lampstands ~ Jewish Temple skos:relatedMatch chebi:25879 pentaerythritol tetranitrate semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:12A43311 ten golden lampstands ~ Jewish Temple skos:relatedMatch chebi:35026 triethylamine semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:21A air (one of the four elements) skos:relatedMatch chebi:33250 atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:21B earth (one of the four elements) skos:relatedMatch chebi:33250 atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:21C fire (one of the four elements) skos:relatedMatch chebi:33250 atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:21D water (one of the four elements) skos:relatedMatch chebi:33250 atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:22C4%28GOLD%29 colours, pigments, and paints: gold skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:24C14 Mercury (planet) skos:relatedMatch chebi:25195 mercury atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:25D13%28GOLD%29 minerals and metals: gold skos:relatedMatch chebi:46662 mineral semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:25FF32%28GOLDFINCH%29 song-birds: goldfinch - FF - fabulous animals skos:relatedMatch chebi:72723 Phe-Phe semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:25H2313 phosphorescence skos:relatedMatch chmo:0000841 phosphorescence spectrum semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:25H2313 phosphorescence skos:relatedMatch chmo:0002299 phosphorescence detection semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:25H2313 phosphorescence skos:relatedMatch chmo:0002416 phosphorescence spectroscopy semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:31A5453 shower-bath skos:relatedMatch chmo:0010006 emergency shower semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41A2412 shower skos:relatedMatch chmo:0010006 emergency shower semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41B2122 extinguisher for coal skos:relatedMatch chmo:0010008 fire extinguisher semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41B523 fire-hose skos:relatedMatch chmo:0010008 fire extinguisher semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:41B525 fire-engine skos:relatedMatch chmo:0010008 fire extinguisher semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:41B53 rescue from fire skos:relatedMatch chmo:0010008 fire extinguisher semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:41D21 clothes covering the entire body skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41D24 clothes for special purposes skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41D25 underclothes and nightwear skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:41D251 underclothes skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:41D2511 underclothes for the whole body skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:41D26 accessories (~ clothing) skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:42DD2543 golden anniversary - DD - out of doors skos:relatedMatch chebi:73446 Asp-Asp semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:45C19 protective weapons skos:relatedMatch chmo:0010003 protective clothing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:46B332 weighing gold or money skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:46C131617 horse-blanket skos:relatedMatch chmo:0010007 fire blanket semapv:ManualMappingCuration orcid:0000-0003-4423-4370 Not
+iconclass:49E3911 alchemist trying to make gold skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:49E81 laboratory apparatus and equipment skos:relatedMatch chmo:0010001 protective equipment semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:49E81 laboratory apparatus and equipment skos:relatedMatch chmo:0010004 emergency response equipment semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:49E9 experiment, test ~ science and technology skos:relatedMatch chmo:0002746 experimental sample semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:51AA61 Synthesis skos:relatedMatch chmo:0001301 synthesis method semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/710f39/scripts/culture_to_chemistry.py Not
+iconclass:53B3 Self-control; 'Dominio di se stesso' (Ripa) skos:relatedMatch chebi:27568 selenium atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:56F24 Self-love; 'Amor di se stesso' (Ripa) skos:relatedMatch chebi:27568 selenium atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:71E3265 battle of the Israelites against King Og skos:relatedMatch chebi:194541 oganesson atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:71I423 Hiram sends gold to Solomon skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:83%28OVID%2C%20Metamorphoses%20I%3A668-688%29 Jupiter sends Mercury to kill Argus skos:relatedMatch chebi:25195 mercury atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:83%28OVID%2C%20Metamorphoses%20I%3A689-721%29 Mercury tells the story of Syrinx skos:relatedMatch chebi:25195 mercury atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:83%28OVID%2C%20Metamorphoses%20II%3A737-751%29 Mercury elicits the help of Aglauros skos:relatedMatch chebi:25195 mercury atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:83%28OVID%2C%20Metamorphoses%20XI%3A85-145%29 Midas and the golden touch skos:relatedMatch chebi:145865 bencarbazone semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:83%28OVID%2C%20Metamorphoses%20XV%3A237-258%29 Pythagoras’s Teachings: The Elements skos:relatedMatch chebi:33250 atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:94C112 Mercury brings Paris the golden apple skos:relatedMatch chebi:16170 mercury(0) semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:94C112 Mercury brings Paris the golden apple skos:relatedMatch chebi:25195 mercury atom semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:94E231 gold is found in Palamedes' tent skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
+iconclass:94G42 Hector's body is weighed in gold (Aeschylus) skos:relatedMatch chebi:59965 glyoxal-lysine dimer semapv:ManualMappingCuration orcid:0000-0003-4423-4370 https://github.com/biomappings/biomappings/blob/0cd002/scripts/culture_to_chemistry.py Not
 ido:0000636 sepsis skos:exactMatch ncbitaxon:137507 Sepsis semapv:ManualMappingCuration orcid:0000-0003-4423-4370 kestrel-mappings Not
 idomal:0000222 enzyme-linked immunosorbent assay skos:exactMatch maxo:0000610 clinical ELISA testing semapv:ManualMappingCuration orcid:0000-0003-4423-4370 mira Not
 idomal:0001019 mouse skos:exactMatch ncbitaxon:10090 Mus musculus semapv:ManualMappingCuration orcid:0000-0003-4423-4370 mira Not
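Many of the Iconclass identifiers in these rows are percent-encoded (for example, `%28` and `%29` stand for parentheses). The following is a small sketch, using only the Python standard library, for splitting such a CURIE and decoding its local identifier. `decode_curie` is a hypothetical helper and not part of biomappings.

```python
from urllib.parse import unquote


def decode_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE at the first colon and percent-decode the local identifier."""
    prefix, _, identifier = curie.partition(":")
    return prefix, unquote(identifier)


decoded = decode_curie("iconclass:11H%28ANTONY%20ABBOT%29411")
ovid = decode_curie("iconclass:83%28OVID%2C%20Metamorphoses%20I%3A668-688%29")
```

Splitting on the first colon only is deliberate: a colon inside the local identifier is encoded as `%3A`, so it is restored by the decoding step rather than being mistaken for the prefix separator.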
