Skip to content

Proposed bugfix for batchtools reveals bug in future.batchtools? #74

Open
@stuvet

Description

@stuvet

I have submitted a simple pull request for a bugfix in batchtools which fuelled some of its behaviour mentioned in #73.

I'm writing here because this proposed bugfix reveals an error in future.batchtools, though based on the future.debug output I don't believe the two are related - just that the batchtools bug previously threw the error first.

Describe the bug
When a slurm worker is availble for the job, (& when the batchtools::waitForFile is not required by batchtools::getLog -> batchtools::readLog), everything functions correctly (note the status):

[01:10:21.286] Job Registry
[01:10:21.286]   Backend  : Slurm
[01:10:21.286]   File dir : /home/stuart/pipelines/production_reports/.future/20210605_011015-MndN2h/todo_chains_5cd13a69_900365739
[01:10:21.286]   Work dir : /home/stuart/pipelines/production_reports
[01:10:21.286]   Jobs     : 0
[01:10:21.286]   Seed     : 26422
[01:10:21.286]   Writeable: TRUE
[01:10:21.287] Attaching 1 packages (‘targets’) ...
[01:10:21.294] Attaching 1 packages (‘targets’) ... DONE
[01:10:22.894] batchtools::batchMap()
Adding 1 jobs ...
[01:10:22.919] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:10:24.107] Launched future #1
[01:10:24.109] plan(): Setting new future strategy stack:
[01:10:24.109] List of future strategies:
[01:10:24.109] 1. FutureStrategy:
[01:10:24.109]    - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:10:24.109]    - tweaked: TRUE
[01:10:24.109]    - call: future::plan(slurm)
[01:10:24.111] plan(): nbrOfWorkers() = 8
[01:10:24.112] status() for BatchtoolsFuture ...
[01:10:24.377] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:10:24.377] status() for BatchtoolsFuture ... done

But when a slurm worker needs to be provisioned to run the job (& so the batchtools::waitForFile will also be called), the initial result of the call to future.batchtools::status(future) in future.batchtools::await is incorrect:

[01:43:55.613] Job Registry
[01:43:55.613]   Backend  : Slurm
[01:43:55.613]   File dir : /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666
[01:43:55.613]   Work dir : /home/stuart/pipelines/production_reports
[01:43:55.613]   Jobs     : 0
[01:43:55.613]   Seed     : 23275
[01:43:55.613]   Writeable: TRUE
[01:43:55.614] Attaching 1 packages (‘targets’) ...
[01:43:55.622] Attaching 1 packages (‘targets’) ... DONE
[01:43:57.447] batchtools::batchMap()
Adding 1 jobs ...
[01:43:57.482] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:43:58.682] Launched future #1
[01:43:58.683] plan(): Setting new future strategy stack:
[01:43:58.683] List of future strategies:
[01:43:58.683] 1. FutureStrategy:
[01:43:58.683]    - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:43:58.683]    - tweaked: TRUE
[01:43:58.683]    - call: future::plan(slurm)
[01:43:58.684] plan(): nbrOfWorkers() = 8
[01:43:58.685] status() for BatchtoolsFuture ...
[01:43:58.778] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.779] status() for BatchtoolsFuture ... done
[01:43:58.779] status() for BatchtoolsFuture ...
[01:43:58.870] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.871] status() for BatchtoolsFuture ... done
[01:43:58.872] batchtools::waitForJobs() ...
[01:44:03.310] - batchtools::waitForJobs(): FALSE
[01:44:03.310] status() for BatchtoolsFuture ...
[01:44:03.414] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] status() for BatchtoolsFuture ... done
[01:44:03.414] - status(): ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] batchtools::waitForJobs() ... done
[01:44:03.415] Results:
[01:44:03.415] status() for BatchtoolsFuture ...
[01:44:03.508] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.509] status() for BatchtoolsFuture ... done
[01:44:03.509] status() for BatchtoolsFuture ...
[01:44:03.601] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.601] status() for BatchtoolsFuture ... done

At this point, logging inserted into batchtools:::waitForFile begins to appear. No more future.debug messages appear until the log file has been detected (now, after the proposed bugfix) & batchtools::waitForFile exits.

[01:44:47.684] plan(): Setting new future strategy stack:
[01:44:47.684] List of future strategies:
[01:44:47.684] 1. sequential:
[01:44:47.684]    - args: function (..., envir = parent.frame())
[01:44:47.684]    - tweaked: FALSE
[01:44:47.684]    - call: future::plan(future::sequential)
[01:44:47.685] plan(): nbrOfWorkers() = 1
Error : BatchtoolsExpiration: Future ('todo_chains_dced720f') expired (registry path /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666).. No logged output exist.
In addition: There were 17 warnings (use warnings() to see them)
[01:44:47.724] status() for BatchtoolsFuture ...
[01:44:47.829] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:44:47.829] status() for BatchtoolsFuture ... done

To be clear, the logged output does exist, and continues to be written by the running job after future.batchtools::await flags the job as expired & exits.

Please let me know if there's anything else I can do to help resolve this issue.

Expected behavior
future.batchtools::await waits for the running job to exit, even when workers need to be provisioned or batchtools::waitForFile is triggered.

Session information

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /apps/R/4.1.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/4.1.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] defineDataAccess_0.0.0.9002 forcats_0.5.1               stringr_1.4.0              
 [4] purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                
 [7] tibble_3.1.2                ggplot2_3.3.3               tidyverse_1.3.1            
[10] brms_2.15.0                 Rcpp_1.0.6                  dplyr_1.0.6                
[13] future.batchtools_0.10.0    batchtools_0.9.15           stringi_1.6.2              
[16] future_1.21.0               stantargets_0.0.2.9000      tarchetypes_0.2.0          
[19] targets_0.4.2               aws.s3_0.3.21               googleCloudStorageR_0.6.0  
[22] data.table_1.14.0          

loaded via a namespace (and not attached):
  [1] utf8_1.2.1            R.utils_2.10.1        tidyselect_1.1.1     
  [4] lme4_1.1-27           htmlwidgets_1.5.3     grid_4.1.0           
  [7] aws.signature_0.6.0   munsell_0.5.0         base64url_1.4        
 [10] codetools_0.2-18      nleqslv_3.3.2         DT_0.18              
 [13] miniUI_0.1.1.1        withr_2.4.2           Brobdingnag_1.2-6    
 [16] colorspace_2.0-1      fst_0.9.4             knitr_1.33           
 [19] rstudioapi_0.13       stats4_4.1.0          bayesplot_1.8.0      
 [22] listenv_0.8.0         emmeans_1.6.0         rstan_2.26.1         
 [25] bit64_4.0.5           farver_2.1.0          bridgesampling_1.1-2 
 [28] coda_0.19-4           parallelly_1.25.0     vctrs_0.3.8          
 [31] generics_0.1.0        TH.data_1.0-10        xfun_0.23            
 [34] R6_2.5.0              markdown_1.1          gamm4_0.2-6          
 [37] projpred_2.0.2        arrow_4.0.0.1         cachem_1.0.5         
 [40] assertthat_0.2.1      promises_1.2.0.1      scales_1.1.1         
 [43] multcomp_1.4-17       debugme_1.1.0         gtable_0.3.0         
 [46] globals_0.14.0        processx_3.5.2        sandwich_3.0-1       
 [49] rlang_0.4.11          miniCRAN_0.2.14       splines_4.1.0        
 [52] gargle_1.1.0          broom_0.7.6           brew_1.0-6           
 [55] checkmate_2.0.0       inline_0.3.18         modelr_0.1.8         
 [58] yaml_2.2.1            reshape2_1.4.4        abind_1.4-5          
 [61] threejs_0.3.3         crosstalk_1.1.1       backports_1.2.1      
 [64] httpuv_1.6.1          rsconnect_0.8.17      tensorA_0.36.2       
 [67] tools_4.1.0           feather_0.3.5         ellipsis_0.3.2       
 [70] posterior_0.1.6       ggridges_0.5.3        plyr_1.8.6           
 [73] base64enc_0.1-3       progress_1.2.2        ps_1.6.0             
 [76] prettyunits_1.1.1     openssl_1.4.4         zoo_1.8-9            
 [79] haven_2.4.1           fs_1.5.0              furrr_0.2.2          
 [82] magrittr_2.0.1        reprex_2.0.0          colourpicker_1.1.0   
 [85] mvtnorm_1.1-1         matrixStats_0.58.0    stringfish_0.15.1    
 [88] evaluate_0.14         hms_1.1.0             qs_0.24.1            
 [91] shinyjs_2.0.0         mime_0.10             xtable_1.8-4         
 [94] shinystan_2.5.0       readxl_1.3.1          gridExtra_2.3        
 [97] rstantools_2.1.1      compiler_4.1.0        credentials_1.3.0    
[100] V8_3.4.2              crayon_1.4.1          minqa_1.2.4          
[103] R.oo_1.24.0           StanHeaders_2.26.1    htmltools_0.5.1.1    
[106] mgcv_1.8-35           later_1.2.0           RcppParallel_5.1.4   
[109] lubridate_1.7.10      DBI_1.1.1             RApiSerialize_0.1.0  
[112] dbplyr_2.1.1          MASS_7.3-54           connectapi_0.1.0.9026
[115] rappdirs_0.3.3        boot_1.3-28           sys_3.4              
[118] Matrix_1.3-3          cli_2.5.0             R.methodsS3_1.8.1    
[121] parallel_4.1.0        igraph_1.2.6          pkgconfig_2.0.3      
[124] xml2_1.3.2            dygraphs_1.1.1.6      pins_0.4.5           
[127] estimability_1.3      rvest_1.0.0           distributional_0.2.2 
[130] callr_3.7.0           digest_0.6.27         rmarkdown_2.8        
[133] cellranger_1.1.0      curl_4.3.1            shiny_1.6.0          
[136] gtools_3.8.2          nloptr_1.2.2.2        lifecycle_1.0.0      
[139] nlme_3.1-152          jsonlite_1.7.2        cmdstanr_0.4.0.9000  
[142] askpass_1.1           fansi_0.4.2           pillar_1.6.1         
[145] lattice_0.20-44       loo_2.4.1             fastmap_1.1.0        
[148] httr_1.4.2            pkgbuild_1.2.0        survival_3.2-11      
[151] googleAuthR_1.4.0     glue_1.4.2            xts_0.12.1           
[154] zip_2.1.1             gert_1.3.0            shinythemes_1.2.0    
[157] bit_4.0.4             memoise_2.0.0        

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions