Describe the bug
Using maps() to map ChEMBL IDs (system code "Cl") from an input data frame like:
| source | identifier |
|---|---|
| Cl | CHEMBL1091 |
| Cl | CHEMBL11 |
| Cl | CHEMBL99 |
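to ChEBI IDs (system code "Ce"), using the metabolites_20240416.bridge file as the loadDatabase() argument, produces duplicate ChEBI IDs that are formatted and flagged inconsistently. The same ID is returned both with and without the "CHEBI:" prefix: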
| source | identifier | target | mapping | isPrimary |
|---|---|---|---|---|
| Cl | CHEMBL1091 | Ce | CHEBI:17609 | T |
| Cl | CHEMBL1091 | Ce | 17609 | F |
In other cases, both duplicate IDs are flagged as primary:
| source | identifier | target | mapping | isPrimary |
|---|---|---|---|---|
| Cl | CHEMBL11 | Ce | CHEBI:47499 | T |
| Cl | CHEMBL11 | Ce | 47499 | T |
and sometimes the same unprefixed ID appears twice, once flagged as primary and once as non-primary:
| source | identifier | target | mapping | isPrimary |
|---|---|---|---|---|
| Cl | CHEMBL1152 | Ce | CHEBI:8380 | T |
| Cl | CHEMBL1152 | Ce | 8380 | F |
| Cl | CHEMBL1152 | Ce | 8380 | T |
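To spot these patterns in a larger result, a small check can strip the optional "CHEBI:" prefix and flag every (identifier, ChEBI ID) pair that occurs more than once. This is only a sketch: the example rows are copied from the CHEMBL1152 table rather than produced by BridgeDbR here, the helper `normalize_chebi()` is hypothetical, and it assumes the `mapping` column holds ChEBI IDs with or without the prefix.

```r
# Example rows copied from the CHEMBL1152 table above (not live BridgeDbR output)
mapped <- data.frame(
  source     = "Cl",
  identifier = "CHEMBL1152",
  target     = "Ce",
  mapping    = c("CHEBI:8380", "8380", "8380"),
  isPrimary  = c("T", "F", "T")
)

# Hypothetical helper: drop a leading "CHEBI:" so "CHEBI:8380" and "8380" compare equal
normalize_chebi <- function(x) sub("^CHEBI:", "", x)

# Flag every row whose (identifier, normalized ChEBI ID) pair occurs more than once
key <- paste(mapped$identifier, normalize_chebi(mapped$mapping))
mapped$duplicated <- key %in% key[duplicated(key)]
print(mapped)
```

All three CHEMBL1152 rows are flagged, since they collapse to the same normalized ID.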
Provide a minimally reproducible example (reprex)
The 'identifiers' argument to maps() is an input data frame such as:
| source | identifier |
|---|---|
| Cl | CHEMBL1091 |
| Cl | CHEMBL11 |
| Cl | CHEMBL99 |
which was generated like this:
```r
# One row per ChEMBL ID; "Cl" is the BridgeDb system code for ChEMBL
metabolite_input <- data.frame(
  source = rep("Cl", length(mapped_chembls[, 1])),
  identifier = mapped_chembls[, 1]
)
```
where mapped_chembls is a data frame with a single column containing one ChEMBL ID per row, in the format 'CHEMBL123'.
The 'mapper' argument is the object returned by loadDatabase() for an absolute file path like:
"C:/Users/user/Documents/GitHub/repo/BridgeDb/metabolites_20240416.bridge"
and the 'target' argument is 'Ce' (ChEBI).
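Put together, the steps above correspond roughly to the following self-contained reprex. It is a sketch under assumptions: the three ChEMBL IDs stand in for `mapped_chembls`, the path must point to a local copy of the .bridge file, and the mapping step is skipped (with a message) when BridgeDbR or the file is unavailable.

```r
# Assumption: adjust this path to your local copy of the metabolite database
bridge_file <- "C:/Users/user/Documents/GitHub/repo/BridgeDb/metabolites_20240416.bridge"

# Example ChEMBL IDs from this issue, standing in for mapped_chembls[, 1]
metabolite_input <- data.frame(
  source     = "Cl",  # recycled to every row
  identifier = c("CHEMBL1091", "CHEMBL11", "CHEMBL99")
)

if (requireNamespace("BridgeDbR", quietly = TRUE) && file.exists(bridge_file)) {
  mapper <- BridgeDbR::loadDatabase(bridge_file)
  # Map ChEMBL ("Cl") to ChEBI ("Ce")
  result <- BridgeDbR::maps(mapper = mapper, identifiers = metabolite_input, target = "Ce")
  # Sort so duplicate mappings for the same input sit next to each other
  print(result[order(result$identifier, result$mapping), ])
} else {
  message("BridgeDbR or the .bridge file is not available; skipping the mapping step.")
}
```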
Expected behavior
I believe each ChEMBL ID typically corresponds to a single ChEBI ID, so the ideal output would contain one row per mapping, for example:
| source | identifier | target | mapping | isPrimary |
|---|---|---|---|---|
| Cl | CHEMBL1152 | Ce | CHEBI:8380 | T |
That is, a single row flagged as primary, with the "CHEBI:" prefix in front of the numeric ID.
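Until this is resolved, a possible client-side workaround (not a fix in BridgeDbR itself) is to add the missing "CHEBI:" prefix and collapse the duplicates into one row per input and ChEBI ID. This sketch assumes the result has the columns shown above; the example rows are copied from this issue, and `as.logical()` is used so that isPrimary works whether it arrives as "T"/"F" strings or as logicals.

```r
# Example rows copied from the CHEMBL1152 and CHEMBL11 tables above
mapping_result <- data.frame(
  source     = "Cl",
  identifier = c("CHEMBL1152", "CHEMBL1152", "CHEMBL1152", "CHEMBL11", "CHEMBL11"),
  target     = "Ce",
  mapping    = c("CHEBI:8380", "8380", "8380", "CHEBI:47499", "47499"),
  isPrimary  = c("T", "F", "T", "T", "T")
)

# Add the "CHEBI:" prefix only to ChEBI IDs that lack it
needs_prefix <- mapping_result$target == "Ce" &
  !startsWith(mapping_result$mapping, "CHEBI:")
mapping_result$mapping[needs_prefix] <- paste0("CHEBI:", mapping_result$mapping[needs_prefix])

# "T"/"F" strings (or existing logicals) become TRUE/FALSE
mapping_result$isPrimary <- as.logical(mapping_result$isPrimary)

# One row per mapping; primary if any of its duplicate rows was flagged primary
dedup <- aggregate(isPrimary ~ source + identifier + target + mapping,
                   data = mapping_result, FUN = any)
print(dedup)
```

Treating a mapping as primary when any of its duplicates is flagged primary is only one possible rule; which flag is correct for the unprefixed rows is exactly the ambiguity reported here.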
R Session Information
Please report the output of either sessionInfo() or sessioninfo::session_info() here.
```
options(width = 120)
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Europe.utf8 LC_CTYPE=English_Europe.utf8 LC_MONETARY=English_Europe.utf8 LC_NUMERIC=C LC_TIME=English_Europe.utf8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] reprex_2.1.0 curl_5.2.1 BridgeDbR_2.10.2 rJava_1.0-11 RCy3_2.20.2 rWikiPathways_1.20.0 tidyr_1.3.1 rvest_1.0.4
 [9] gprofiler2_0.2.3 stringr_1.5.1 httr_1.4.7 dplyr_1.1.4

loaded via a namespace (and not attached):
 [1] gtable_0.3.4 rjson_0.2.21 ggplot2_3.5.0 htmlwidgets_1.6.4 caTools_1.18.2 vctrs_0.6.5 tools_4.3.3 bitops_1.0-7
 [9] generics_0.1.3 stats4_4.3.3 base64url_1.4 tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3 KernSmooth_2.23-22 data.table_1.15.4
[17] RColorBrewer_1.1-3 uuid_1.2-0 graph_1.78.0 lifecycle_1.0.4 compiler_4.3.3 gplots_3.1.3.1 munsell_0.5.1 repr_1.1.7
[25] uchardet_1.1.1 htmltools_0.5.8.1 RCurl_1.98-1.14 lazyeval_0.2.2 plotly_4.10.4 pillar_1.9.0 crayon_1.5.2 gtools_3.9.5
[33] tidyselect_1.2.1 digest_0.6.35 stringi_1.8.3 purrr_1.0.2 RJSONIO_1.3-1.9 fastmap_1.1.1 grid_4.3.3 colorspace_2.1-0
[41] cli_3.6.2 magrittr_2.0.3 base64enc_0.1-3 XML_3.99-0.16.1 utf8_1.2.4 IRdisplay_1.1 withr_3.0.0 scales_1.3.0
[49] backports_1.4.1 IRkernel_1.3.2 pbdZMQ_0.3-10 evaluate_0.23 viridisLite_0.4.2 rlang_1.1.3 glue_1.7.0 selectr_0.4-2
[57] BiocManager_1.30.22 xml2_1.3.6 BiocGenerics_0.46.0 pkgload_1.3.4 rstudioapi_0.16.0 jsonlite_1.8.8 R6_2.5.1 fs_1.6.3
```
Indicate whether BiocManager::valid() returns TRUE.
BiocManager::valid() does not return TRUE; it reports: "4 packages out-of-date; 0 packages too new"
Is the package installed via bioconda?
BridgeDbR is installed via BiocManager.