
Commit 0f5a35f

More docs
1 parent ce88a8b commit 0f5a35f

12 files changed

Lines changed: 433 additions & 32 deletions


docs/source/conf.py

Lines changed: 7 additions & 0 deletions
@@ -246,6 +246,13 @@
     "sklearn": ("https://scikit-learn.org/stable", None),
     "numpy": ("https://numpy.org/doc/stable", None),
     "scipy": ("https://docs.scipy.org/doc/scipy", None),
+    "sssom": ("https://mapping-commons.github.io/sssom-py/", None),
+    "bioregistry": ("https://bioregistry.readthedocs.io/en/stable/", None),
+    "biomappings": ("https://biomappings.readthedocs.io/en/stable/", None),
+    "curies": ("https://curies.readthedocs.io/en/stable/", None),
+    "flask": ("https://flask.palletsprojects.com/", None),
+    "pydantic": ("https://docs.pydantic.dev/latest/", None),
+    "rdflib": ("https://rdflib.readthedocs.io/en/stable/", None),
 }
 
 autoclass_content = "both"
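
When the second element of an `intersphinx_mapping` entry is `None`, Sphinx looks for the inventory file `objects.inv` under the given base URL. As a pre-build sanity check, the hypothetical helper below (not part of SeMRA or Sphinx) lists the inventory URLs implied by the seven entries added in this commit and can optionally probe them. The URL joining only approximates what Sphinx does internally.

```python
import posixpath
import urllib.request

# The seven entries added to ``intersphinx_mapping`` in docs/source/conf.py
NEW_ENTRIES: dict[str, tuple[str, None]] = {
    "sssom": ("https://mapping-commons.github.io/sssom-py/", None),
    "bioregistry": ("https://bioregistry.readthedocs.io/en/stable/", None),
    "biomappings": ("https://biomappings.readthedocs.io/en/stable/", None),
    "curies": ("https://curies.readthedocs.io/en/stable/", None),
    "flask": ("https://flask.palletsprojects.com/", None),
    "pydantic": ("https://docs.pydantic.dev/latest/", None),
    "rdflib": ("https://rdflib.readthedocs.io/en/stable/", None),
}


def inventory_urls(mapping: dict[str, tuple[str, None]]) -> dict[str, str]:
    """Map each intersphinx key to the objects.inv URL implied by a ``None`` inventory."""
    return {key: posixpath.join(base, "objects.inv") for key, (base, _) in mapping.items()}


def check_inventories(mapping: dict[str, tuple[str, None]], timeout: float = 10.0) -> dict[str, bool]:
    """Report whether each implied inventory URL answers with HTTP 200 (needs network access)."""
    results = {}
    for key, url in inventory_urls(mapping).items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results[key] = response.status == 200
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses
            results[key] = False
    return results
```

Calling `check_inventories(NEW_ENTRIES)` requires network access; an unreachable entry suggests checking whether its base URL needs a different path.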

docs/source/index.rst

Lines changed: 6 additions & 1 deletion
@@ -26,10 +26,15 @@ by a variety of people:
 want to make their toolchain more generic for loading, assembling, processing, and
 outputting semantic mappings.
 
+SeMRA is generally applicable in **any domain**, from biomedicine to particle physics to
+the digital humanities.
+
 Features
 --------
 
-1. An object model for semantic mappings (based on SSSOM)
+1. An object model for semantic mappings (based on the `Simple Standard for Sharing
+   Ontological Mappings (SSSOM) <https://mapping-commons.github.io/sssom/>`_ and
+   :mod:`sssom`)
 2. Functionality for assembling and reasoning over semantic mappings at scale
 3. A provenance model for automatically generated mappings
 4. A confidence model granular at the curator-level, mapping set-level, and community
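
Item 2 in the feature list mentions reasoning over semantic mappings. A typical inference step treats `skos:exactMatch` as symmetric and transitive and infers every pair implied by a chain of exact matches. The following is a hypothetical, stdlib-only sketch of that closure using a union-find; it is not SeMRA's API, and it ignores the provenance and confidence that items 3 and 4 describe.

```python
from collections import defaultdict
from itertools import combinations


def infer_exact_matches(pairs: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Close exact-match pairs under symmetry and transitivity.

    Returns every unordered pair (as a sorted tuple) of CURIEs that end up in the
    same equivalence class, including the pairs given as input.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for subject, obj in pairs:
        union(subject, obj)

    # Group every CURIE by the root of its equivalence class
    classes: dict[str, list[str]] = defaultdict(list)
    for curie in list(parent):
        classes[find(curie)].append(curie)

    # Every pair of members within a class is an (inferred or asserted) exact match
    inferred = set()
    for members in classes.values():
        inferred.update(combinations(sorted(members), 2))
    return inferred
```

Materializing every pair grows quadratically with the size of an equivalence class, so at scale it can be cheaper to keep the classes themselves and expand pairs only when writing output.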

src/semra/database.py

Lines changed: 27 additions & 0 deletions
@@ -4,7 +4,34 @@
 
 .. code-block:: console
 
+    $ uv pip install semra
     $ semra build
+
+The ``semra build`` command downloads and processes all resources and constructs
+a database of unprocessed mappings.
+
+.. note::
+
+    Downloading raw data resources can take on the order of hours to tens
+    of hours depending on your internet connection and the reliability of
+    the resources' respective servers.
+
+    Processing and analysis can be run overnight on commodity hardware
+    (e.g., a 2023 MacBook Pro with 36GB RAM).
+
+The SeMRA Raw Mappings Database can be downloaded from Zenodo at |raw|.
+After downloading all files and unzipping them, a web application wrapping
+the SeMRA Raw Mappings Database can be run locally on Docker with:
+
+.. code-block:: console
+
+    $ sh run_on_docker.sh
+
+Navigate to http://localhost:8773 to see the web application.
+
+.. |raw| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.11082038.svg
+   :target: https://doi.org/10.5281/zenodo.11082038
+
 """
 
 import subprocess
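
The `semra build` docstring asks the reader to download and unzip all files from the Zenodo record before running `run_on_docker.sh`. Assuming the record's files are ordinary `.zip` archives saved into one folder (check the actual file list on Zenodo), a stdlib sketch like this could unpack them in place. `unzip_all` is a hypothetical helper, not part of SeMRA.

```python
import zipfile
from pathlib import Path


def unzip_all(folder: Path) -> list[Path]:
    """Extract every ``*.zip`` archive in ``folder`` into ``folder`` itself.

    Returns the archives that were extracted. An archive containing a member that
    would land outside ``folder`` is rejected before anything is written.
    """
    folder = folder.resolve()
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Validate all member paths first so a bad archive writes nothing
            for member in zf.namelist():
                target = (folder / member).resolve()
                if target != folder and folder not in target.parents:
                    raise ValueError(f"unsafe path in {archive.name}: {member}")
            zf.extractall(folder)
        extracted.append(archive)
    return extracted
```

Recent versions of `zipfile` already strip `..` components when extracting; the explicit check simply refuses such archives instead of silently rewriting their paths.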

src/semra/landscape/__init__.py

Lines changed: 13 additions & 3 deletions
@@ -4,9 +4,7 @@
 
 .. code-block:: console
 
-    $ git clone https://github.com/biopragmatics/semra.git
-    $ cd semra
-    $ uv pip install .[landscape]
+    $ uv pip install semra[landscape]
     $ semra landscape
 
 The ``semra landscape`` command runs all pre-configured domain-specific mapping
@@ -56,11 +54,23 @@
 from .disease import DISEASE_CONFIGURATION
 from .gene import GENE_CONFIGURATION
 from .taxrank import TAXRANK_CONFIGURATION
+from ..pipeline import Configuration
+
+#: A list of domain-specific configurations
+CONFIGURATIONS: list[Configuration] = [
+    ANATOMY_CONFIGURATION,
+    CELL_CONFIGURATION,
+    COMPLEX_CONFIGURATION,
+    DISEASE_CONFIGURATION,
+    GENE_CONFIGURATION,
+    TAXRANK_CONFIGURATION,
+]
 
 __all__ = [
     "ANATOMY_CONFIGURATION",
     "CELL_CONFIGURATION",
     "COMPLEX_CONFIGURATION",
+    "CONFIGURATIONS",
     "DISEASE_CONFIGURATION",
     "GENE_CONFIGURATION",
     "TAXRANK_CONFIGURATION",
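
Collecting the six domain configurations into `CONFIGURATIONS` lets a command like `semra landscape` loop over them instead of naming each one. The sketch below shows that registry pattern with a hypothetical `ToyConfiguration` holding only a `key` and a `build` callable. SeMRA's own `semra.pipeline.Configuration` is a different, richer class, so none of these attribute names describe its API.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfiguration:
    """Hypothetical stand-in for a domain-specific mapping configuration."""

    key: str
    build: Callable[[], int]  # e.g., returns the number of mappings produced


def run_all(configurations: list[ToyConfiguration]) -> dict[str, int | Exception]:
    """Run every configuration, recording failures instead of aborting the batch."""
    results: dict[str, int | Exception] = {}
    for configuration in configurations:
        try:
            results[configuration.key] = configuration.build()
        except Exception as exc:  # keep going so one broken resource doesn't stop the rest
            results[configuration.key] = exc
    return results
```

Recording an exception per key, rather than stopping at the first failure, is one possible design for long batch runs; whether `semra landscape` behaves this way is not stated in this commit.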

src/semra/landscape/anatomy.py

Lines changed: 37 additions & 7 deletions
@@ -1,4 +1,20 @@
-"""A configuration for assembling mappings for anatomical terms.
+"""
+The SeMRA Anatomy Mappings Database assembles semantic mappings to the following
+resources:
+
+========================================= =========================================================
+Prefix                                    Name
+========================================= =========================================================
+`uberon <https://bioregistry.io/uberon>`_ Uber Anatomy Ontology
+`mesh <https://bioregistry.io/mesh>`_     Medical Subject Headings
+`bto <https://bioregistry.io/bto>`_       BRENDA Tissue Ontology
+`caro <https://bioregistry.io/caro>`_     Common Anatomy Reference Ontology
+`ncit <https://bioregistry.io/ncit>`_     NCI Thesaurus
+`umls <https://bioregistry.io/umls>`_     Unified Medical Language System Concept Unique Identifier
+========================================= =========================================================
+
+Reproduction
+************
 
 The SeMRA Anatomy Mappings Database can be rebuilt with the following commands:
 
@@ -9,16 +25,31 @@
     $ uv pip install .[landscape]
     $ python -m semra.landscape.anatomy
 
-The artifacts can be downloaded from `Zenodo
-<https://doi.org/10.5281/zenodo.11091802>`_. After running Docker locally, downloading
-all files, and unzipping then, the SeMRA web application can be run with:
+Web Application
+***************
+The pre-built artifacts for this mapping database can be downloaded from Zenodo
+at |anatomy| and unzipped. The web application can be run
+locally on Docker from inside the folder where the data was unzipped with:
 
 .. code-block:: console
 
     $ sh run_on_docker.sh
 
-Navigate to http://localhost:8773 to see the web application.
-"""
+If you reproduced the database yourself, you can ``cd``
+to the right folder and run with:
+
+.. code-block:: console
+
+    $ cd ~/.data/semra/case-studies/anatomy
+    $ sh run_on_docker.sh
+
+Finally, navigate in your web browser to http://localhost:8773 to see the web
+application.
+
+.. |anatomy| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.11091803.svg
+   :target: https://doi.org/10.5281/zenodo.11091803
+
+"""  # noqa:D205,D400
 
 import pystow
 
@@ -28,7 +59,6 @@
 
 __all__ = [
     "ANATOMY_CONFIGURATION",
-    "MODULE",
 ]
 
 MODULE = pystow.module("semra", "case-studies", "anatomy")
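
Each landscape docstring in this commit embeds the same kind of reStructuredText simple table, linking every prefix to its Bioregistry page. Because a simple table's column boundaries are defined by its `=` borders, hand-aligning these tables is error-prone. A hypothetical helper along these lines could regenerate one from (prefix, name) pairs; the function name and input format are assumptions, and in practice the names would come from the Bioregistry rather than being typed in.

```python
def prefix_table(rows: list[tuple[str, str]]) -> str:
    """Render (prefix, name) pairs as an RST simple table linking to bioregistry.io."""
    links = [f"`{prefix} <https://bioregistry.io/{prefix}>`_" for prefix, _ in rows]
    names = [name for _, name in rows]
    # Each column is as wide as its longest cell (including the header)
    left = max([len("Prefix"), *map(len, links)])
    right = max([len("Name"), *map(len, names)])
    border = "=" * left + " " + "=" * right
    lines = [border, "Prefix".ljust(left) + " " + "Name", border]
    lines += [link.ljust(left) + " " + name for link, name in zip(links, names)]
    lines.append(border)
    return "\n".join(lines)
```

For instance, `` `uberon <https://bioregistry.io/uberon>`_ `` is 41 characters long, which fixes the width of the first column in the anatomy table.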

src/semra/landscape/cell.py

Lines changed: 55 additions & 13 deletions
@@ -1,16 +1,59 @@
-"""A configuration for assembling mappings for cell and cell line terms.
-
-This configuration can be used to reproduce the results from the Biomappings paper by
-doing the following:
-
-1. Load positive mappings - PyOBO: EFO, DepMap, CCLE - Custom: Cellosaurus - Biomappings
-2. Upgrade mappings from dbxrefs to skos:exactMatch
-3. Use transitive closure to infer new mappings
-4. Load negative mappings from Biomappings
-5. Filter out negative mappings
-6. Subset a CCLE->EFO consolidation set
-7. Output SSSOM
 """
+The SeMRA Cell and Cell Line Mappings Database assembles semantic mappings to the following
+resources:
+
+=================================================== =========================================================
+Prefix                                              Name
+=================================================== =========================================================
+`mesh <https://bioregistry.io/mesh>`_               Medical Subject Headings
+`efo <https://bioregistry.io/efo>`_                 Experimental Factor Ontology
+`cellosaurus <https://bioregistry.io/cellosaurus>`_ Cellosaurus
+`ccle <https://bioregistry.io/ccle>`_               Cancer Cell Line Encyclopedia Cells
+`depmap <https://bioregistry.io/depmap>`_           DepMap Cell Lines
+`bto <https://bioregistry.io/bto>`_                 BRENDA Tissue Ontology
+`cl <https://bioregistry.io/cl>`_                   Cell Ontology
+`clo <https://bioregistry.io/clo>`_                 Cell Line Ontology
+`ncit <https://bioregistry.io/ncit>`_               NCI Thesaurus
+`umls <https://bioregistry.io/umls>`_               Unified Medical Language System Concept Unique Identifier
+=================================================== =========================================================
+
+Reproduction
+************
+
+The SeMRA Cell and Cell Line Mappings Database can be rebuilt with the following commands:
+
+.. code-block:: console
+
+    $ git clone https://github.com/biopragmatics/semra.git
+    $ cd semra
+    $ uv pip install .[landscape]
+    $ python -m semra.landscape.cell
+
+Web Application
+***************
+The pre-built artifacts for this mapping database can be downloaded from Zenodo
+at |cell| and unzipped. The web application can be run
+locally on Docker from inside the folder where the data was unzipped with:
+
+.. code-block:: console
+
+    $ sh run_on_docker.sh
+
+If you reproduced the database yourself, you can ``cd``
+to the right folder and run with:
+
+.. code-block:: console
+
+    $ cd ~/.data/semra/case-studies/cells
+    $ sh run_on_docker.sh
+
+Finally, navigate in your web browser to http://localhost:8773 to see the web
+application.
+
+.. |cell| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.11091581.svg
+   :target: https://doi.org/10.5281/zenodo.11091581
+
+"""  # noqa:D205,D400
 
 import click
 import pystow
@@ -23,7 +66,6 @@
 
 __all__ = [
     "CELL_CONFIGURATION",
-    "MODULE",
 ]
 
 MODULE = pystow.module("semra", "case-studies", "cells")
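
The ``cd`` commands in these docstrings assume pystow's default layout: `pystow.module("semra", "case-studies", key)` stores files under `~/.data/semra/case-studies/<key>` unless the `PYSTOW_HOME` environment variable points elsewhere. Note that the cell configuration's `MODULE` uses the key `cells` (plural). With pystow installed, `pystow.module(...).base` returns this directory directly. The stdlib approximation below is a hypothetical helper that ignores any other configuration pystow may support; it only makes the expected path explicit.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def case_study_dir(key: str, environ: Mapping[str, str] | None = None) -> Path:
    """Approximate the folder used by ``pystow.module("semra", "case-studies", key)``.

    Assumes pystow's default layout: ``$PYSTOW_HOME`` if set, otherwise ``~/.data``.
    """
    env = os.environ if environ is None else environ
    if "PYSTOW_HOME" in env:
        home = Path(env["PYSTOW_HOME"])
    else:
        home = Path.home() / ".data"
    return home / "semra" / "case-studies" / key
```

For the cell database, `case_study_dir("cells")` gives the folder to ``cd`` into, since `cells` is the key passed to `pystow.module` in the diff above.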

src/semra/landscape/complex.py

Lines changed: 54 additions & 2 deletions
@@ -1,4 +1,57 @@
-"""A configuration for assembling mappings for protein complex terms."""
+"""
+The SeMRA Protein Complex Mappings Database assembles semantic mappings to the following
+resources:
+
+======================================================= ===================================
+Prefix                                                  Name
+======================================================= ===================================
+`complexportal <https://bioregistry.io/complexportal>`_ Complex Portal
+`fplx <https://bioregistry.io/fplx>`_                   FamPlex
+`go <https://bioregistry.io/go>`_                       Gene Ontology
+`chembl.target <https://bioregistry.io/chembl.target>`_ ChEMBL target
+`wikidata <https://bioregistry.io/wikidata>`_           Wikidata
+`scomp <https://bioregistry.io/scomp>`_                 Selventa Complexes
+`signor <https://bioregistry.io/signor>`_               Signaling Network Open Resource
+`intact <https://bioregistry.io/intact>`_               IntAct protein interaction database
+======================================================= ===================================
+
+Reproduction
+************
+
+The SeMRA Protein Complex Mappings Database can be rebuilt with the following commands:
+
+.. code-block:: console
+
+    $ git clone https://github.com/biopragmatics/semra.git
+    $ cd semra
+    $ uv pip install .[landscape]
+    $ python -m semra.landscape.complex
+
+Web Application
+***************
+The pre-built artifacts for this mapping database can be downloaded from Zenodo
+at |complex| and unzipped. The web application can be run
+locally on Docker from inside the folder where the data was unzipped with:
+
+.. code-block:: console
+
+    $ sh run_on_docker.sh
+
+If you reproduced the database yourself, you can ``cd``
+to the right folder and run with:
+
+.. code-block:: console
+
+    $ cd ~/.data/semra/case-studies/complex
+    $ sh run_on_docker.sh
+
+Finally, navigate in your web browser to http://localhost:8773 to see the web
+application.
+
+.. |complex| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.11091422.svg
+   :target: https://doi.org/10.5281/zenodo.11091422
+
+"""  # noqa:D205,D400
 
 import pystow
 
@@ -8,7 +61,6 @@
 
 __all__ = [
     "COMPLEX_CONFIGURATION",
-    "MODULE",
 ]
 
 MODULE = pystow.module("semra", "case-studies", "complex")
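
After `sh run_on_docker.sh`, the container may need a moment before http://localhost:8773 answers. When scripting a demo (for example, in CI), a polling helper like this hypothetical one can wait for the web application instead of guessing a fixed sleep. The port comes from the docstrings above; the timeout and interval defaults are arbitrary assumptions.

```python
import http.client
import time
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local container
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_app(url: str = "http://localhost:8773", timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll ``url`` until it returns a successful response or ``timeout`` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The default opener raises HTTPError (an OSError subclass) for 4xx/5xx
            with _OPENER.open(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            pass  # connection refused, reset, or server still starting up
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical sequence would be to start `sh run_on_docker.sh` in the background, then call `wait_for_app()` before opening a browser or running smoke tests.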

src/semra/landscape/disease.py

Lines changed: 66 additions & 2 deletions
@@ -1,4 +1,69 @@
-"""A configuration for assembling mappings for disease terms."""
+"""
+The SeMRA Disease Mappings Database assembles semantic mappings to the following
+resources:
+
+======================================================= ================================================================================
+Prefix                                                  Name
+======================================================= ================================================================================
+`doid <https://bioregistry.io/doid>`_                   Human Disease Ontology
+`mondo <https://bioregistry.io/mondo>`_                 Mondo Disease Ontology
+`efo <https://bioregistry.io/efo>`_                     Experimental Factor Ontology
+`mesh <https://bioregistry.io/mesh>`_                   Medical Subject Headings
+`ncit <https://bioregistry.io/ncit>`_                   NCI Thesaurus
+`orphanet <https://bioregistry.io/orphanet>`_           Orphanet
+`orphanet.ordo <https://bioregistry.io/orphanet.ordo>`_ Orphanet Rare Disease Ontology
+`umls <https://bioregistry.io/umls>`_                   Unified Medical Language System Concept Unique Identifier
+`omim <https://bioregistry.io/omim>`_                   Online Mendelian Inheritance in Man
+`omim.ps <https://bioregistry.io/omim.ps>`_             OMIM Phenotypic Series
+`gard <https://bioregistry.io/gard>`_                   Genetic and Rare Diseases Information Center
+`icd10 <https://bioregistry.io/icd10>`_                 International Classification of Diseases, 10th Revision
+`icd10cm <https://bioregistry.io/icd10cm>`_             International Classification of Diseases, 10th Revision, Clinical Modification
+`icd10pcs <https://bioregistry.io/icd10pcs>`_           International Classification of Diseases, 10th Revision, Procedure Coding System
+`icd11 <https://bioregistry.io/icd11>`_                 International Classification of Diseases, 11th Revision (Foundation Component)
+`icd11.code <https://bioregistry.io/icd11.code>`_       ICD 11 Codes
+`icd9 <https://bioregistry.io/icd9>`_                   International Classification of Diseases, 9th Revision
+`icd9cm <https://bioregistry.io/icd9cm>`_               International Classification of Diseases, 9th Revision, Clinical Modification
+`icdo <https://bioregistry.io/icdo>`_                   International Classification of Diseases for Oncology
+======================================================= ================================================================================
+
+Reproduction
+************
+
+The SeMRA Disease Mappings Database can be rebuilt with the following commands:
+
+.. code-block:: console
+
+    $ git clone https://github.com/biopragmatics/semra.git
+    $ cd semra
+    $ uv pip install .[landscape]
+    $ python -m semra.landscape.disease
+
+Web Application
+***************
+The pre-built artifacts for this mapping database can be downloaded from Zenodo
+at |disease| and unzipped. The web application can be run
+locally on Docker from inside the folder where the data was unzipped with:
+
+.. code-block:: console
+
+    $ sh run_on_docker.sh
+
+If you reproduced the database yourself, you can ``cd``
+to the right folder and run with:
+
+.. code-block:: console
+
+    $ cd ~/.data/semra/case-studies/disease
+    $ sh run_on_docker.sh
+
+Finally, navigate in your web browser to http://localhost:8773 to see the web
+application.
+
+.. |disease| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.11091886.svg
+   :target: https://doi.org/10.5281/zenodo.11091886
+
+"""  # noqa:D205,D400
+
 
 import bioregistry
 import pystow
@@ -9,7 +74,6 @@
 
 __all__ = [
     "DISEASE_CONFIGURATION",
-    "MODULE",
 ]
 
 ICD_PREFIXES = bioregistry.get_collection("0000004").resources  # type:ignore
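
Several prefixes in the disease table contain a dot (`orphanet.ordo`, `omim.ps`, `icd11.code`), and ICD local identifiers often contain dots too, so a compact URI has to be split on its first colon rather than on any dot. The hypothetical helpers below show that split and build a resolver link on bioregistry.io, the same site the table links to. For real work, the `curies` package (newly added to the intersphinx mapping in this commit) provides a maintained converter.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE on its first colon; any dots belong to the prefix or identifier."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, identifier


def bioregistry_link(curie: str) -> str:
    """Build a bioregistry.io resolver URL for a CURIE."""
    prefix, identifier = split_curie(curie)
    return f"https://bioregistry.io/{prefix}:{identifier}"
```

Using `str.partition` keeps everything after the first colon intact, so identifiers that themselves contain dots or colons survive unchanged.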
