Commit e983c1a

Better text
1 parent b75caba commit e983c1a

11 files changed

Lines changed: 198 additions & 99 deletions

README.md

Lines changed: 9 additions & 9 deletions
@@ -71,29 +71,29 @@ mapping = Mapping(
 
 ### Assembly
 
-Mappings can be assembled from many source formats using functions in the
-`semra.io` submodule:
+Mappings can be assembled from many source formats using I/O functions exposed
+through the top-level `semra` submodule:
 
 ```python
-import semra.io
+import semra
 
 # load mappings from any standardized SSSOM file as a file path or URL, via `pandas.read_csv`
 sssom_url = "https://w3id.org/biopragmatics/biomappings/sssom/biomappings.sssom.tsv"
-mappings = semra.io.from_sssom(
+mappings = semra.from_sssom(
     sssom_url, license="spdx:CC0-1.0", mapping_set_title="biomappings",
 )
 
 # alternatively, metadata can be passed via a file/URL
-mappings_alt = semra.io.from_sssom(
+mappings_alt = semra.from_sssom(
     sssom_url,
     metadata="https://w3id.org/biopragmatics/biomappings/sssom/biomappings.sssom.yml"
 )
 
 # load mappings from the Gene Ontology (via OBO format)
-go_mappings = semra.io.from_pyobo("go")
+go_mappings = semra.from_pyobo("go")
 
 # load mappings from the Uber Anatomy Ontology (via OWL format)
-uberon_mappings = semra.io.from_bioontologies("uberon")
+uberon_mappings = semra.from_bioontologies("uberon")
 ```
 
 SeMRA also implements custom importers in the `semra.sources` submodule. It's
@@ -281,7 +281,7 @@ these references can be standardized in a deterministic and principled way.
 
 ```python
 import chembl_downloader
-import semra.io
+import semra
 from semra.api import prioritize_df
 
 # A dataframe of indication-disease pairs, where the
@@ -291,7 +291,7 @@ df = chembl_downloader.query("SELECT DISTINCT drugind_id, efo_id FROM DRUG_INDIC
 # a pre-calculated prioritization of diseases and phenotypes from MONDO, DOID,
 # HPO, ICD, GARD, and more.
 url = "https://zenodo.org/records/15164180/files/priority.sssom.tsv?download=1"
-mappings = semra.io.from_sssom(url)
+mappings = semra.from_sssom(url)
 
 # the dataframe will now have a new column with standardized references
 prioritize_df(mappings, df, column="efo_id", target_column="priority_indication_curie")
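
Note on the last README hunk: the final comment says the dataframe gains a column of standardized references. A plain-Python sketch of what that amounts to is given here. It is not SeMRA's `prioritize_df`; the `prioritize_rows` helper, the `ex:` CURIEs, and the choice to pass unmapped values through unchanged are all assumptions made for illustration.

```python
def prioritize_rows(priority, rows, column, target_column):
    """Add `target_column` to each row by looking up `column` in `priority`.

    `priority` is a hypothetical dict from a source CURIE to its preferred CURIE.
    """
    for row in rows:
        curie = row[column]
        # Assumption: a CURIE without a priority mapping is kept as-is.
        row[target_column] = priority.get(curie, curie)
    return rows


# Hypothetical priority mapping: two equivalent identifiers collapse to one.
priority = {"ex:A1": "ex:P1", "ex:A2": "ex:P1"}

# Hypothetical rows standing in for the indication-disease dataframe.
rows = [
    {"drugind_id": 1, "efo_id": "ex:A1"},
    {"drugind_id": 2, "efo_id": "ex:A2"},
    {"drugind_id": 3, "efo_id": "ex:Z9"},
]

prioritize_rows(priority, rows, column="efo_id", target_column="priority_indication_curie")
```

The first two rows end up with the same value in the new column, which is the point of prioritization: equivalent references are rewritten to one preferred identifier.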

docs/source/index.rst

Lines changed: 11 additions & 10 deletions
@@ -31,12 +31,13 @@ the digital humanities. Get started by loading external mappings:
 
 .. code-block:: python
 
-    import semra.io
+    import semra
 
-    # load mappings from any standardized SSSOM file as a file path or URL, via `pandas.read_csv`
-    sssom_url = "https://w3id.org/biopragmatics/biomappings/sssom/biomappings.sssom.tsv"
-    mappings = semra.io.from_sssom(
-        sssom_url, license="spdx:CC0-1.0", mapping_set_title="biomappings",
+    mappings = semra.from_sssom(
+        # load mappings from any standardized SSSOM file as a file path or URL
+        "https://w3id.org/biopragmatics/biomappings/sssom/biomappings.sssom.tsv",
+        license="spdx:CC0-1.0",
+        mapping_set_title="biomappings",
     )
 
 Or by creating your own mappings:
@@ -97,9 +98,9 @@ web application for your use-case specific mapping database.
 SeMRA isn't itself a curation tool, but it has the option to integrate :mod:`biomappings`
 in deployments of its local web application for curation purposes.
 
-SeMRA isn't an tool for merging ontologies like `CoMerger <https://arxiv.org/abs/2005.02659>`_,
-but it outputs detailed and comprehensive semantic mappings that are critical
-as input for such tools.
+SeMRA isn't an tool for merging ontologies like `CoMerger <https://arxiv.org/abs/2005.02659>`_
+or `OntoMerger <https://arxiv.org/abs/2206.02238>`_, but it outputs detailed
+and comprehensive semantic mappings that are critical as input for such tools.
 
 Artifacts Overview
 ------------------
@@ -149,11 +150,11 @@ Table of Contents
    :name: start
 
    installation
-   tutorial
-   io
    pipeline
    artifacts
+   tutorial
    struct
+   io
    reference
    cli
 

docs/source/pipeline.rst

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ Mapping Assembly Pipeline
 
 .. automodapi:: semra.pipeline
    :no-heading:
+   :no-inheritance-diagram:

docs/source/reference.rst

Lines changed: 4 additions & 0 deletions
@@ -1,7 +1,11 @@
 Reference
 =========
 
+This contains several SeMRA submodules with low-level functionality. You can use these
+to build your own mapping processing workflows and I/O.
+
 .. automodapi:: semra.api
+   :no-inheritance-diagram:
 
 .. automodapi:: semra.inference
 

docs/source/tutorial.rst

Lines changed: 2 additions & 9 deletions
@@ -1,12 +1,5 @@
-Usage
-=====
-
-1. I/O
-2. How to make a configuration and run it
-3. How to apply results
-
-Data Science Tutorial
----------------------
+Prioritizing CURIEs in a Dataframe
+==================================
 
 SeMRA provides tools for data scientists to standardize references using semantic
 mappings.
Lines changed: 11 additions & 9 deletions
@@ -5,18 +5,18 @@
 import pystow
 from gilda import Grounder
 from gilda.grounder import load_entries_from_terms_file
-from gilda.resources import get_grounding_terms, resource_dir
+from gilda.resources import get_grounding_terms
 
 from semra.gilda_utils import (
     GILDA_TO_BIOREGISTRY,
     print_scored_matches,
-    standardize_terms,
-    update_terms,
+    standardize_gilda_terms,
+    update_gilda_terms,
 )
-from semra.pipeline import Configuration, Input, Mutation, get_priority_mappings_from_config
+from semra.pipeline import AssembleReturnType, Configuration, Input, Mutation, assemble
 
 MODULE = pystow.module("semra", "gilda-demo")
-PROCESSED_GILDA_TERMS_PATH = resource_dir.joinpath("grounding_terms_standardized.tsv.gz")
+PROCESSED_GILDA_TERMS_PATH = MODULE.join(name="grounding_terms_standardized.tsv.gz")
 
 PRIORITY = [
     "HP",
@@ -42,6 +42,8 @@
 PRIORITY = [GILDA_TO_BIOREGISTRY[p] for p in PRIORITY]
 
 CONFIGURATION = Configuration(
+    key="gilda",
+    name="Gilda Reprocessing",
     inputs=[
         Input(source="biomappings"),
         Input(source="gilda"),
@@ -72,14 +74,14 @@ def _get_terms() -> list[gilda.Term]:
     from gilda.generate_terms import dump_terms
 
     terms: list[gilda.Term] = list(load_entries_from_terms_file(get_grounding_terms()))
-    terms = standardize_terms(terms)
+    terms = standardize_gilda_terms(terms)
     dump_terms(terms, PROCESSED_GILDA_TERMS_PATH)
     return terms
 
 
-def main():
+def main() -> None:
     """Reprocess the gilda default lexical index."""
-    mappings = get_priority_mappings_from_config(CONFIGURATION)
+    mappings = assemble(CONFIGURATION, return_type=AssembleReturnType.priority)
     if not mappings:
         raise ValueError("Bad mapping priority definition resulted in no mappings")
 
@@ -91,7 +93,7 @@ def main():
     if missing:
         raise ValueError(f"Missing: {sorted(missing)}")
 
-    terms = update_terms(terms, mappings)
+    terms = update_gilda_terms(terms, mappings)
 
     grounder = Grounder(terms)
     s = "Pelvic lipomatosis"

src/semra/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,6 @@
 """Semantic Mapping Reasoner and Assembler."""
 
+from semra.io import from_bioontologies, from_jsonl, from_pyobo, from_sssom
 from semra.pipeline import Configuration, Input, Mutation
 from semra.struct import Evidence, Mapping, MappingSet, ReasonedEvidence, Reference, SimpleEvidence
 from semra.vocabulary import (
@@ -33,4 +34,8 @@
     "ReasonedEvidence",
     "Reference",
     "SimpleEvidence",
+    "from_bioontologies",
+    "from_jsonl",
+    "from_pyobo",
+    "from_sssom",
 ]
]

src/semra/gilda_utils.py

Lines changed: 8 additions & 8 deletions
@@ -19,7 +19,7 @@
 from semra.struct import Mapping
 
 __all__ = [
-    "update_terms",
+    "update_gilda_terms",
 ]
 
 logger = logging.getLogger(__name__)
@@ -49,7 +49,7 @@
 REVERSE_GILDA_MAP = {v: k for k, v in GILDA_TO_BIOREGISTRY.items()}
 
 
-def update_terms(terms: list[gilda.Term], mappings: list[Mapping]) -> list[gilda.Term]:
+def update_gilda_terms(terms: list[gilda.Term], mappings: list[Mapping]) -> list[gilda.Term]:
     """Use a priority mapping to re-write terms with priority groundings.
 
     :param terms: A list of Gilda term objects
@@ -93,7 +93,7 @@ def update_terms(terms: list[gilda.Term], mappings: list[Mapping]) -> list[gilda
         source_terms = terms_index.pop(mapping.subject.pair, None)
         if source_terms:
             terms_index[mapping.object.pair].extend(
-                make_new_term(term, mapping.object.prefix, mapping.object.identifier)
+                make_new_gilda_term(term, mapping.object.prefix, mapping.object.identifier)
                 for term in source_terms
             )
 
@@ -102,16 +102,16 @@ def update_terms(terms: list[gilda.Term], mappings: list[Mapping]) -> list[gilda
     return cast(list[gilda.Term], gilda.term.filter_out_duplicates(new_terms))
 
 
-def standardize_terms(
+def standardize_gilda_terms(
     terms: t.Iterable[gilda.Term], *, multiprocessing: bool = True
 ) -> list[gilda.Term]:
     """Standardize a list of terms."""
     if not multiprocessing:
-        return [standardize_term(t) for t in terms]
+        return [standardize_gilda_term(t) for t in terms]
     return cast(
         list[gilda.Term],
         process_map(
-            standardize_term,
+            standardize_gilda_term,
             terms,
             unit="term",
             unit_scale=True,
@@ -121,7 +121,7 @@ def standardize_terms(
     )
 
 
-def standardize_term(term: gilda.Term) -> gilda.Term:
+def standardize_gilda_term(term: gilda.Term) -> gilda.Term:
     """Standardize a term's prefix and identifier to the Bioregistry standard."""
     prefix = bioregistry.normalize_prefix(term.db)
     if prefix is None:
@@ -137,7 +137,7 @@ def standardize_term(term: gilda.Term) -> gilda.Term:
     return term
 
 
-def make_new_term(
+def make_new_gilda_term(
     term: gilda.Term,
     target_db: str,
     target_id: str,
