Commit 5e67cf8

deeenes and claude committed
Make identifiers optional in translation/orthology functions; fix organisms_df
Rework translation_dict, translation_df, orthology_dict, and orthology_df to download the full table by default (identifiers=None), using the new /mapping/table and /orthology/table endpoints. Fix organisms_df polars schema inference with infer_schema_length=None. Improve docs landing page and utils vignette. Update tests for new signatures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 9c35595 commit 5e67cf8

6 files changed

Lines changed: 251 additions & 98 deletions
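
The signature change in this commit moves `identifiers` from the first positional parameter to an optional fourth one, so call sites written against the old order would bind their arguments to the wrong parameters. The sketch below illustrates this with a stand-in function that only copies the new parameter order. `translation_dict_sketch` and `_FAKE_TABLE` are hypothetical names, the table is invented sample data, and returning an empty set for unknown IDs is an assumption rather than documented client behavior.

```python
from __future__ import annotations

import inspect

# Hypothetical in-memory stand-in for the server-side mapping table.
_FAKE_TABLE = {
    'TP53': ['P04637'],
    'EGFR': ['P00533'],
    'BRCA1': ['P38398'],
}


def translation_dict_sketch(
    id_type: str,
    target_id_type: str,
    ncbi_tax_id: int = 9606,
    identifiers: str | list[str] | None = None,
) -> dict[str, set[str]]:
    """Stand-in that mirrors the new parameter order (not the real client)."""

    if identifiers is None:
        # Default path: return the whole table.
        return {k: set(v) for k, v in _FAKE_TABLE.items()}

    if isinstance(identifiers, str):
        # A single ID is accepted and wrapped in a list.
        identifiers = [identifiers]

    # Assumption: unknown IDs map to an empty set in this sketch.
    return {i: set(_FAKE_TABLE.get(i, [])) for i in identifiers}


# New-style calls: full table, or a subset via the keyword argument.
full = translation_dict_sketch('genesymbol', 'uniprot')
subset = translation_dict_sketch('genesymbol', 'uniprot', identifiers='TP53')

# Old-style positional call: inspect how the arguments would bind now.
bound = inspect.signature(translation_dict_sketch).bind(
    ['TP53', 'EGFR'], 'genesymbol', 'uniprot',
)
misbound = dict(bound.arguments)
# The ID list lands in id_type and 'uniprot' lands in ncbi_tax_id.
```

Passing `identifiers` by keyword, as the updated docs do, avoids this misbinding regardless of parameter order.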

docs/index.md

Lines changed: 57 additions & 29 deletions
@@ -1,55 +1,83 @@
 # omnipath-client
 
-Python client for the OmniPath web services.
+**The easiest way to access molecular biology knowledge in Python.**
 
-## Services
+omnipath-client connects you to the [OmniPath](https://omnipathdb.org)
+ecosystem -- a comprehensive collection of molecular biology databases
+covering signaling, gene regulation, protein interactions, metabolomics,
+and more.
 
-### OmniPath Database
-Query protein interactions, annotations, complexes, and more from the
-[OmniPath database](https://omnipathdb.org).
+## Why OmniPath?
 
-### OmniPath Utils
-ID translation, taxonomy, orthology, and reference lists via the
-[utils service](https://utils.omnipathdb.org).
+OmniPath integrates data from **200+ databases** into a unified resource:
+protein-protein interactions, enzyme-substrate relationships, transcription
+factor targets, protein complexes, functional annotations, intercellular
+communication, and metabolite networks. Instead of querying dozens of
+databases individually, query OmniPath once.
 
-## Quick Start
+## Why this client?
 
-```bash
-pip install omnipath-client
-```
+- **No setup needed** -- queries the web service, no local database required
+- **97 identifier types** -- translate between UniProt, gene symbols, Ensembl,
+  ChEBI, HMDB, and 90+ other ID systems
+- **28,000 organisms** -- taxonomy resolution across NCBI, Ensembl, KEGG, OMA
+- **Orthology** -- cross-species gene translation with 6 backends
+- **DataFrames** -- returns polars, pandas, or pyarrow DataFrames
+- **BSD-3-Clause** -- free to use in any project
+
+## Quick examples
+
+### Translate gene symbols to UniProt
 
-### ID Translation
 ```python
-from omnipath_client.utils import map_name, translate_column
+from omnipath_client.utils import map_name, translation_df
 
 map_name('TP53', 'genesymbol', 'uniprot')
 # {'P04637'}
 
-# Translate DataFrame column (pandas, polars, or pyarrow)
-translate_column(df, 'gene', 'genesymbol', 'uniprot')
+# Full translation table as DataFrame
+df = translation_df('genesymbol', 'uniprot')
 ```
 
-### Taxonomy
-```python
-from omnipath_client.utils import ensure_ncbi_tax_id
-ensure_ncbi_tax_id('human') # 9606
-```
+### Cross-species translation
 
-### Orthology
 ```python
 from omnipath_client.utils import orthology_translate
-orthology_translate(['TP53'], source=9606, target=10090)
-# {'TP53': {'Trp53'}}
+
+orthology_translate(['TP53', 'EGFR'], source=9606, target=10090)
+# {'TP53': {'Trp53'}, 'EGFR': {'Egfr'}}
 ```
 
-### OmniPath Database
+### Query protein interactions
+
 ```python
 import omnipath_client as op
+
 df = op.interactions(entity_ids=['Q9Y6K9'])
 ```
 
-## Getting started
+### Explore the API
+
+```python
+import omnipath_client as op
+
+op.endpoints() # all endpoints
+op.params('exports/interactions') # available filters
+op.values('exports/interactions', 'entity_types') # allowed values
+```
+
+## Learn more
+
+- **[OmniPath Utils vignette](vignettes/utils.md)** -- ID translation,
+  taxonomy, orthology, reference lists
+- **[OmniPath Database vignette](vignettes/database.md)** -- interactions,
+  annotations, complexes
+- **[API Reference](reference/index.md)** -- full function documentation
+- **[Installation](installation.md)** -- setup instructions
+
+## Services
 
-- [Installation](installation.md) -- how to install the package
-- [Quickstart](quickstart.md) -- basic usage examples
-- [API Reference](reference/index.md) -- full API documentation
+| Service | URL | What it provides |
+|---------|-----|-----------------|
+| OmniPath Database | [dev.omnipathdb.org](https://dev.omnipathdb.org) | Interactions, annotations, complexes, ontology |
+| OmniPath Utils | [utils.omnipathdb.org](https://utils.omnipathdb.org) | ID translation, taxonomy, orthology, reference lists |

docs/vignettes/utils.md

Lines changed: 51 additions & 0 deletions
@@ -18,10 +18,15 @@ from omnipath_client.utils import (
     map_names,
     translate,
     translate_column,
+    translation_dict,
+    translation_df,
     id_types,
     ensure_ncbi_tax_id,
     all_organisms,
+    organisms_df,
     orthology_translate,
+    orthology_dict,
+    orthology_df,
     get_reflist,
     is_swissprot,
 )
@@ -57,6 +62,24 @@ translate(['TP53', 'EGFR', 'BRCA1'], 'genesymbol', 'uniprot')
 # {'TP53': {'P04637'}, 'EGFR': {'P00533'}, 'BRCA1': {'P38398'}}
 ```
 
+### Full translation tables
+
+Download a complete ID mapping table as a dict or DataFrame:
+
+```python
+from omnipath_client.utils import translation_dict, translation_df
+
+# Full genesymbol -> uniprot table as dict
+table = translation_dict('genesymbol', 'uniprot')
+table['TP53'] # {'P04637'}
+
+# Or as a DataFrame
+df = translation_df('genesymbol', 'uniprot')
+
+# Translate only specific IDs
+table = translation_dict('genesymbol', 'uniprot', identifiers=['TP53', 'EGFR'])
+```
+
 ### Translate a DataFrame column
 
 Works with pandas, polars, and pyarrow DataFrames:
@@ -133,6 +156,15 @@ organisms = all_organisms()
 # [{'ncbi_tax_id': 9606, 'common_name': 'human', ...}, ...]
 ```
 
+### Organisms as DataFrame
+
+```python
+from omnipath_client.utils import organisms_df
+
+df = organisms_df() # all organisms
+df = organisms_df(has_data=True) # only organisms with mapping data
+```
+
 ## Orthology
 
 ### Cross-species gene translation
@@ -149,6 +181,25 @@ orthology_translate(['TP53'], source=9606, target=10090, min_sources=5)
 orthology_translate(['TP53'], source=9606, target=10090, resource='oma')
 ```
 
+### Full orthology tables
+
+Download a complete orthology table:
+
+```python
+from omnipath_client.utils import orthology_dict, orthology_df
+
+# Full human-to-mouse orthology as dict
+table = orthology_dict(source=9606, target=10090)
+
+# Or as a DataFrame
+df = orthology_df(source=9606, target=10090)
+
+# Translate only specific IDs
+table = orthology_dict(
+    source=9606, target=10090, identifiers=['TP53', 'EGFR'],
+)
+```
+
 ### Translate DataFrame column to orthologs
 
 ```python

omnipath_client/utils/_mapping.py

Lines changed: 47 additions & 25 deletions
@@ -284,20 +284,23 @@ def all_mappings(
 
 
 def translation_dict(
-    identifiers: str | list[str],
     id_type: str,
     target_id_type: str,
     ncbi_tax_id: int = 9606,
+    identifiers: str | list[str] | None = None,
     raw: bool = False,
     backend: str | None = None,
 ) -> dict[str, set[str]]:
     """Get translation data as a dict.
 
+    Downloads the full translation table by default. If identifiers are
+    given, translates only those.
+
     Args:
-        identifiers: Source IDs to translate. String or list.
         id_type: Source ID type.
         target_id_type: Target ID type.
         ncbi_tax_id: Organism (default: 9606).
+        identifiers: Optional list of source IDs. None = full table.
         raw: Skip special-case handling.
         backend: Force specific backend.
 
@@ -306,52 +309,71 @@ def translation_dict(
 
     Example::
 
-        table = translation_dict(['TP53', 'EGFR'], 'genesymbol', 'uniprot')
+        # Full table
+        table = translation_dict('genesymbol', 'uniprot')
         table['TP53'] # {'P04637'}
+
+        # Specific IDs only
+        table = translation_dict(
+            'genesymbol', 'uniprot', identifiers=['TP53', 'EGFR'],
+        )
     """
 
-    if isinstance(identifiers, str):
-        identifiers = [identifiers]
+    if identifiers is not None:
+        if isinstance(identifiers, str):
+            identifiers = [identifiers]
+
+        return translate(
+            identifiers,
+            id_type,
+            target_id_type,
+            ncbi_tax_id,
+            raw=raw,
+            backend=backend,
+        )
 
-    return translate(
-        identifiers,
-        id_type,
-        target_id_type,
-        ncbi_tax_id,
-        raw=raw,
-        backend=backend,
-    )
+    # Full table download
+    data = _get('/mapping/table', {
+        'id_type': id_type,
+        'target_id_type': target_id_type,
+        'ncbi_tax_id': ncbi_tax_id,
+    })
+    table = data.get('table', {})
+
+    return {k: set(v) for k, v in table.items()}
 
 
 def translation_df(
-    identifiers: str | list[str],
     id_type: str,
     target_id_type: str,
     ncbi_tax_id: int = 9606,
+    identifiers: str | list[str] | None = None,
     raw: bool = False,
     backend: str | None = None,
 ) -> Any:
     """Get translation data as a DataFrame.
 
-    Returns a two-column DataFrame with source and target IDs.
-    Prefers polars; falls back to pandas.
+    Downloads the full table by default. Returns a two-column DataFrame.
+
+    Args:
+        id_type: Source ID type.
+        target_id_type: Target ID type.
+        ncbi_tax_id: Organism (default: 9606).
+        identifiers: Optional list of source IDs. None = full table.
+        raw: Skip special-case handling.
+        backend: Force specific backend.
 
     Example::
 
-        df = translation_df(['TP53', 'EGFR'], 'genesymbol', 'uniprot')
-        # genesymbol uniprot
-        # 0 TP53 P04637
-        # 1 EGFR P00533
+        df = translation_df('genesymbol', 'uniprot')
+        # Full genesymbol -> uniprot table as DataFrame
     """
 
-    if isinstance(identifiers, str):
-        identifiers = [identifiers]
-
-    trans = translate(
-        identifiers,
+    trans = translation_dict(
         id_type,
         target_id_type,
         ncbi_tax_id,
+        identifiers=identifiers,
         raw=raw,
         backend=backend,
     )
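
The diff stops right after `translation_df` obtains `trans`, so the step that turns the `dict[str, set[str]]` into a two-column DataFrame is not shown. A minimal sketch of one way that flattening could work is below. `dict_to_columns` is a hypothetical helper, not a function from the package, and sorting the output is an arbitrary choice made only to keep the result deterministic.

```python
from __future__ import annotations


def dict_to_columns(
    trans: dict[str, set[str]],
    id_type: str,
    target_id_type: str,
) -> dict[str, list[str]]:
    """Flatten {source: {targets}} into two equal-length columns.

    One output row is produced per (source, target) pair, so a source ID
    with several targets appears on several rows.
    """

    sources: list[str] = []
    targets: list[str] = []

    # Sets are unordered, so sort both levels for a stable result.
    for source in sorted(trans):
        for target in sorted(trans[source]):
            sources.append(source)
            targets.append(target)

    # Column-oriented mapping, named after the two ID types.
    return {id_type: sources, target_id_type: targets}


columns = dict_to_columns(
    {'TP53': {'P04637'}, 'EGFR': {'P00533'}, 'X': {'B', 'A'}},  # sample data
    'genesymbol',
    'uniprot',
)
```

A column-oriented dict like this can be handed to either the `polars.DataFrame` or the `pandas.DataFrame` constructor, which fits the removed docstring line about preferring polars and falling back to pandas. Note that the sketch assumes `id_type` and `target_id_type` differ, since identical names would collide as dict keys.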
