[BUG] Mapping ChEMBL IDs to ChEBI IDs using the 16/04/2024 metabolite bridge file produces inconsistent duplicate ChEBI IDs #45

@pklemmer

Describe the bug

Using maps() to map ChEMBL IDs from an input data frame like:

source / identifier
Cl / CHEMBL1091
Cl / CHEMBL11
Cl / CHEMBL99

to ChEBI IDs, with the metabolites_20240416.bridge file as the loadDatabase() argument, produces duplicate ChEBI IDs that are mapped inconsistently. In some cases the same ID is returned with and without the "CHEBI:" prefix, and only one of the two is marked as primary:

source / identifier / target / mapping / isPrimary
Cl / CHEMBL1091 / Ce / CHEBI:17609 / T
Cl / CHEMBL1091 / Ce / 17609 / F

In other cases both duplicates are marked as primary:

source / identifier / target / mapping / isPrimary
Cl / CHEMBL11 / Ce / CHEBI:47499 / T
Cl / CHEMBL11 / Ce / 47499 / T

And in some cases the unprefixed duplicate appears twice, marked as both primary and non-primary:

source / identifier / target / mapping / isPrimary
Cl / CHEMBL1152 / Ce / CHEBI:8380 / T
Cl / CHEMBL1152 / Ce / 8380 / F
Cl / CHEMBL1152 / Ce / 8380 / T
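
The three patterns above can be summarised per ChEMBL ID with a small base R check. This is only a sketch: it assumes the maps() output is a data frame with the columns shown above and that isPrimary holds the strings "T" and "F"; the rows are copied from this report, and the column names in `report` are made up for illustration.

```r
# Rows copied from the report; isPrimary is ASSUMED to be character "T"/"F".
mappings <- data.frame(
  source     = "Cl",
  identifier = c("CHEMBL1091", "CHEMBL1091", "CHEMBL11", "CHEMBL11",
                 "CHEMBL1152", "CHEMBL1152", "CHEMBL1152"),
  target     = "Ce",
  mapping    = c("CHEBI:17609", "17609", "CHEBI:47499", "47499",
                 "CHEBI:8380", "8380", "8380"),
  isPrimary  = c("T", "F", "T", "T", "T", "F", "T"),
  stringsAsFactors = FALSE
)

# Compare IDs without the optional "CHEBI:" prefix.
mappings$bare_id <- sub("^CHEBI:", "", mappings$mapping)

# One group per (ChEMBL ID, bare ChEBI ID) pair.
groups <- split(mappings, list(mappings$identifier, mappings$bare_id), drop = TRUE)

# For each group: number of rows, how many carry the prefix,
# and which isPrimary values occur.
report <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    identifier    = g$identifier[1],
    chebi         = g$bare_id[1],
    rows          = nrow(g),
    prefixed      = sum(startsWith(g$mapping, "CHEBI:")),
    primary_flags = paste(sort(unique(g$isPrimary)), collapse = ","),
    stringsAsFactors = FALSE
  )
}))
rownames(report) <- NULL
print(report)
```

Any group with more than one row, or with both "F" and "T" in primary_flags, is one of the inconsistent cases described above.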

Provide a minimally reproducible example (reprex)

The 'identifiers' argument for the maps() function is an input data frame such as:

source / identifier
Cl / CHEMBL1091
Cl / CHEMBL11
Cl / CHEMBL99

which was generated like this:

metabolite_input <- data.frame(
  source = rep("Cl", length(mapped_chembls[, 1])),
  identifier = mapped_chembls[, 1]
)

where mapped_chembls is a data frame with a single column containing one ChEMBL ID per row, in the format 'CHEMBL123'.

The 'mapper' argument is the database loaded with loadDatabase() from an absolute file path like:

"C:/Users/user/Documents/GitHub/repo/BridgeDb/metabolites_20240416.bridge"

and the 'target' argument is 'Ce' to map to ChEBI.
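
Put together, the reprex looks roughly like the sketch below. The stand-in for mapped_chembls and its column name `chembl` are made up for illustration, and the argument names `mapper`, `identifiers` and `target` are the ones used in this report. The mapping call is guarded so the script still runs when BridgeDbR or the bridge file is not available.

```r
# Stand-in for mapped_chembls: one ChEMBL ID per row (IDs taken from this report).
# The column name "chembl" is hypothetical.
mapped_chembls <- data.frame(
  chembl = c("CHEMBL1091", "CHEMBL11", "CHEMBL99"),
  stringsAsFactors = FALSE
)

# Input for maps(): system code "Cl" (ChEMBL) for every identifier.
metabolite_input <- data.frame(
  source     = rep("Cl", nrow(mapped_chembls)),
  identifier = mapped_chembls[, 1],
  stringsAsFactors = FALSE
)

# Path as given in this report; adjust to the local copy of the bridge file.
bridge_path <- "C:/Users/user/Documents/GitHub/repo/BridgeDb/metabolites_20240416.bridge"

if (requireNamespace("BridgeDbR", quietly = TRUE) && file.exists(bridge_path)) {
  mapper <- BridgeDbR::loadDatabase(bridge_path)
  # Map ChEMBL IDs to ChEBI ("Ce").
  mapped <- BridgeDbR::maps(mapper = mapper,
                            identifiers = metabolite_input,
                            target = "Ce")
  print(mapped)
} else {
  message("BridgeDbR or the bridge file is not available; skipping the mapping call.")
}
```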

Expected behavior

I believe each ChEBI ID is typically associated with a single ChEMBL ID, so the ideal output would contain one row per mapping, with the "CHEBI:" prefix in front of the numeric ID:

source / identifier / target / mapping / isPrimary
Cl / CHEMBL1152 / Ce / CHEBI:8380 / T
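
Until the mapping itself is fixed, the expected output can be approximated by post-processing the result. The helper below is a hypothetical workaround, not part of BridgeDbR: it assumes the output columns shown above, assumes isPrimary holds the strings "T" and "F", and treats a mapping as primary if any of its duplicates is marked "T" (a choice made here for illustration, matching the expected row above).

```r
# Hypothetical helper, NOT a BridgeDbR function.
normalize_chebi <- function(mappings) {
  # Add the "CHEBI:" prefix to ChEBI mappings that lack it.
  mapping <- as.character(mappings$mapping)
  bare <- mappings$target == "Ce" & !startsWith(mapping, "CHEBI:")
  mapping[bare] <- paste0("CHEBI:", mapping[bare])
  mappings$mapping <- mapping

  # Rows that agree on everything except isPrimary are duplicates.
  key <- paste(mappings$source, mappings$identifier,
               mappings$target, mappings$mapping, sep = "|")

  # ASSUMPTION: a mapping is primary if any duplicate row says "T".
  any_primary <- ave(mappings$isPrimary == "T", key, FUN = any)
  mappings$isPrimary <- ifelse(any_primary, "T", "F")

  # Keep the first row of each duplicate group.
  out <- mappings[!duplicated(key), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Rows copied from this report.
raw <- data.frame(
  source     = "Cl",
  identifier = c("CHEMBL1091", "CHEMBL1091", "CHEMBL11", "CHEMBL11",
                 "CHEMBL1152", "CHEMBL1152", "CHEMBL1152"),
  target     = "Ce",
  mapping    = c("CHEBI:17609", "17609", "CHEBI:47499", "47499",
                 "CHEBI:8380", "8380", "8380"),
  isPrimary  = c("T", "F", "T", "T", "T", "F", "T"),
  stringsAsFactors = FALSE
)

cleaned <- normalize_chebi(raw)
print(cleaned)
```

This only hides the symptom in the result; the duplicate and unprefixed entries would still be present in the bridge file.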

R Session Information

Please report the output of either sessionInfo() or
sessioninfo::session_info() here.

options(width = 120)
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_Europe.utf8  LC_CTYPE=English_Europe.utf8    LC_MONETARY=English_Europe.utf8 LC_NUMERIC=C                    LC_TIME=English_Europe.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_2.1.0         curl_5.2.1           BridgeDbR_2.10.2     rJava_1.0-11         RCy3_2.20.2          rWikiPathways_1.20.0 tidyr_1.3.1          rvest_1.0.4         
 [9] gprofiler2_0.2.3     stringr_1.5.1        httr_1.4.7           dplyr_1.1.4         

loaded via a namespace (and not attached):
 [1] gtable_0.3.4        rjson_0.2.21        ggplot2_3.5.0       htmlwidgets_1.6.4   caTools_1.18.2      vctrs_0.6.5         tools_4.3.3         bitops_1.0-7       
 [9] generics_0.1.3      stats4_4.3.3        base64url_1.4       tibble_3.2.1        fansi_1.0.6         pkgconfig_2.0.3     KernSmooth_2.23-22  data.table_1.15.4  
[17] RColorBrewer_1.1-3  uuid_1.2-0          graph_1.78.0        lifecycle_1.0.4     compiler_4.3.3      gplots_3.1.3.1      munsell_0.5.1       repr_1.1.7         
[25] uchardet_1.1.1      htmltools_0.5.8.1   RCurl_1.98-1.14     lazyeval_0.2.2      plotly_4.10.4       pillar_1.9.0        crayon_1.5.2        gtools_3.9.5       
[33] tidyselect_1.2.1    digest_0.6.35       stringi_1.8.3       purrr_1.0.2         RJSONIO_1.3-1.9     fastmap_1.1.1       grid_4.3.3          colorspace_2.1-0   
[41] cli_3.6.2           magrittr_2.0.3      base64enc_0.1-3     XML_3.99-0.16.1     utf8_1.2.4          IRdisplay_1.1       withr_3.0.0         scales_1.3.0       
[49] backports_1.4.1     IRkernel_1.3.2      pbdZMQ_0.3-10       evaluate_0.23       viridisLite_0.4.2   rlang_1.1.3         glue_1.7.0          selectr_0.4-2      
[57] BiocManager_1.30.22 xml2_1.3.6          BiocGenerics_0.46.0 pkgload_1.3.4       rstudioapi_0.16.0   jsonlite_1.8.8      R6_2.5.1            fs_1.6.3           

Indicate whether BiocManager::valid() returns TRUE.

BiocManager::valid() returns: "4 packages out-of-date; 0 packages too new"

Is the package installed via bioconda?

BridgeDbR is installed via BiocManager.
