
Add loops to break up search values for large batch searches #31

@madison-feshuk

Description

Trying to map DTXSIDs based on InChIKeys in the provided file (too large to attach to the ticket). My initial thought was an API bug given the size of the request, but it seems more like a ctxR issue, since playing around with the rate_limit parameter sometimes works.

library(readr)
library(ctxR)

data <- read_csv("activities_with_inchikeys_to_filter_in_R_all.csv") # full data frame of 78104 records
length(unique(data$standard_inchi_key)) # 47348 unique InChIKeys
test <- chemical_equal_batch(word_list = unique(data$standard_inchi_key)) # returns an empty data frame

## subset for testing
# with 1000 rows it works and returns the mappings for the 780 unique InChIKeys
data <- data[1:2000,]
# 2000 rows fails and returns an empty data frame, but works if you add rate_limit = 0.1
test <- chemical_equal_batch(word_list = unique(data$standard_inchi_key))

Looping through the full list of search terms resolved this, but it is not expected behavior. We could consider breaking up the search list when it is large, in addition to adjusting rate limits.
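A minimal sketch of the kind of chunking helper ctxR could apply internally before issuing requests. The function name `chunk_list` and the 1000-term batch size are assumptions for illustration, not part of the ctxR API:

```r
# Split a vector of search terms into batches of at most `size` elements.
# `chunk_list` is a hypothetical helper, not an existing ctxR function.
chunk_list <- function(x, size = 1000) {
  split(x, ceiling(seq_along(x) / size))
}

# Example: 47348 terms -> 48 batches, the last one partial (348 terms)
chunks <- chunk_list(seq_len(47348))
length(chunks)        # 48
lengths(chunks)[[48]] # 348
```

The batch function could then `lapply` over `chunks` and `rbind` the results, so callers never have to size their own loops.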

library(ctxR)

inchi_list <- unique(data$standard_inchi_key)
batches <- seq(1000, length(inchi_list), by = 1000)

test <- chemical_equal_batch(word_list = inchi_list[1:999], verbose = TRUE) # first batch of 999 terms

for (i in batches) {
  # cap the upper index so the final batch stops at the end of the list
  # instead of producing NA indices past it
  upper <- min(i + 999, length(inchi_list))
  test <- rbind(test, chemical_equal_batch(word_list = inchi_list[i:upper]))
  print(i)
}
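Note that the original trailing call also hit an R operator-precedence pitfall: `:` binds tighter than `+`, so `inchi_list[max(batches)+1:length(inchi_list)]` adds `max(batches)` to every element of `1:length(inchi_list)` rather than building the intended tail range. A small illustration with toy values:

```r
# Toy values standing in for max(batches) and length(inchi_list)
start <- 8
n <- 10

start + 1:n   # 9 10 11 ... 18 -- `:` is evaluated first, then the addition
(start + 1):n # 9 10           -- the intended tail indices
```

Wrapping the addition in parentheses, `(max(batches) + 1):length(inchi_list)`, gives the intended slice.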
