Description
I have submitted a simple pull request for a bugfix in batchtools
which fuelled some of its behaviour mentioned in #73.
I'm writing here because this proposed bugfix reveals an error in future.batchtools, though based on the future.debug output I don't believe the two are related - just that the batchtools bug previously threw the error first.
Describe the bug
When a slurm worker is availble for the job, (& when the batchtools::waitForFile is not required by batchtools::getLog -> batchtools::readLog), everything functions correctly (note the status):
[01:10:21.286] Job Registry
[01:10:21.286] Backend : Slurm
[01:10:21.286] File dir : /home/stuart/pipelines/production_reports/.future/20210605_011015-MndN2h/todo_chains_5cd13a69_900365739
[01:10:21.286] Work dir : /home/stuart/pipelines/production_reports
[01:10:21.286] Jobs : 0
[01:10:21.286] Seed : 26422
[01:10:21.286] Writeable: TRUE
[01:10:21.287] Attaching 1 packages (‘targets’) ...
[01:10:21.294] Attaching 1 packages (‘targets’) ... DONE
[01:10:22.894] batchtools::batchMap()
Adding 1 jobs ...
[01:10:22.919] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:10:24.107] Launched future #1
[01:10:24.109] plan(): Setting new future strategy stack:
[01:10:24.109] List of future strategies:
[01:10:24.109] 1. FutureStrategy:
[01:10:24.109] - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:10:24.109] - tweaked: TRUE
[01:10:24.109] - call: future::plan(slurm)
[01:10:24.111] plan(): nbrOfWorkers() = 8
[01:10:24.112] status() for BatchtoolsFuture ...
[01:10:24.377] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:10:24.377] status() for BatchtoolsFuture ... done
But when a slurm worker needs to be provisioned to run the job (& so the batchtools::waitForFile will also be called), the initial result of the call to future.batchtools::status(future)
in future.batchtools::await
is incorrect:
[01:43:55.613] Job Registry
[01:43:55.613] Backend : Slurm
[01:43:55.613] File dir : /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666
[01:43:55.613] Work dir : /home/stuart/pipelines/production_reports
[01:43:55.613] Jobs : 0
[01:43:55.613] Seed : 23275
[01:43:55.613] Writeable: TRUE
[01:43:55.614] Attaching 1 packages (‘targets’) ...
[01:43:55.622] Attaching 1 packages (‘targets’) ... DONE
[01:43:57.447] batchtools::batchMap()
Adding 1 jobs ...
[01:43:57.482] Created BatchtoolsFuture future #1
Submitting 1 jobs in 1 chunks using cluster functions 'Slurm' ...
[01:43:58.682] Launched future #1
[01:43:58.683] plan(): Setting new future strategy stack:
[01:43:58.683] List of future strategies:
[01:43:58.683] 1. FutureStrategy:
[01:43:58.683] - args: function (..., envir = parent.frame(), template = "batchtools.slurm.tmpl", workers = 8, resources = list(bucket = "targets-production-reports-209813", walltime = 3600, memory = 2048, ncpus = 1, ntasks = 1, partition = "ph4"))
[01:43:58.683] - tweaked: TRUE
[01:43:58.683] - call: future::plan(slurm)
[01:43:58.684] plan(): nbrOfWorkers() = 8
[01:43:58.685] status() for BatchtoolsFuture ...
[01:43:58.778] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.779] status() for BatchtoolsFuture ... done
[01:43:58.779] status() for BatchtoolsFuture ...
[01:43:58.870] - status: ‘defined’, ‘expired’, ‘submitted’
[01:43:58.871] status() for BatchtoolsFuture ... done
[01:43:58.872] batchtools::waitForJobs() ...
[01:44:03.310] - batchtools::waitForJobs(): FALSE
[01:44:03.310] status() for BatchtoolsFuture ...
[01:44:03.414] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] status() for BatchtoolsFuture ... done
[01:44:03.414] - status(): ‘defined’, ‘expired’, ‘submitted’
[01:44:03.414] batchtools::waitForJobs() ... done
[01:44:03.415] Results:
[01:44:03.415] status() for BatchtoolsFuture ...
[01:44:03.508] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.509] status() for BatchtoolsFuture ... done
[01:44:03.509] status() for BatchtoolsFuture ...
[01:44:03.601] - status: ‘defined’, ‘expired’, ‘submitted’
[01:44:03.601] status() for BatchtoolsFuture ... done
At this point, logging inserted into batchtools:::waitForFile
begins to appear. No more future.debug messages appear until the log file has been detected (now, after the proposed bugfix) & batchtools::waitForFile
exits.
[01:44:47.684] plan(): Setting new future strategy stack:
[01:44:47.684] List of future strategies:
[01:44:47.684] 1. sequential:
[01:44:47.684] - args: function (..., envir = parent.frame())
[01:44:47.684] - tweaked: FALSE
[01:44:47.684] - call: future::plan(future::sequential)
[01:44:47.685] plan(): nbrOfWorkers() = 1
Error : BatchtoolsExpiration: Future ('todo_chains_dced720f') expired (registry path /home/stuart/pipelines/production_reports/.future/20210605_014355-hJknNd/todo_chains_dced720f_1866986666).. No logged output exist.
In addition: There were 17 warnings (use warnings() to see them)
[01:44:47.724] status() for BatchtoolsFuture ...
[01:44:47.829] - status: ‘defined’, ‘running’, ‘started’, ‘submitted’, ‘system’
[01:44:47.829] status() for BatchtoolsFuture ... done
To be clear, the logged output does exist, and continues to be written by the running job after future.batchtools::await flags the job as expired & exits.
Please let me know if there's anything else I can do to help resolve this issue.
Expected behavior
future.batchtools::await waits for the running job to exit, even when workers need to be provisioned or batchtools::waitForFile is triggered.
Session information
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /apps/R/4.1.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/4.1.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] defineDataAccess_0.0.0.9002 forcats_0.5.1 stringr_1.4.0
[4] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[7] tibble_3.1.2 ggplot2_3.3.3 tidyverse_1.3.1
[10] brms_2.15.0 Rcpp_1.0.6 dplyr_1.0.6
[13] future.batchtools_0.10.0 batchtools_0.9.15 stringi_1.6.2
[16] future_1.21.0 stantargets_0.0.2.9000 tarchetypes_0.2.0
[19] targets_0.4.2 aws.s3_0.3.21 googleCloudStorageR_0.6.0
[22] data.table_1.14.0
loaded via a namespace (and not attached):
[1] utf8_1.2.1 R.utils_2.10.1 tidyselect_1.1.1
[4] lme4_1.1-27 htmlwidgets_1.5.3 grid_4.1.0
[7] aws.signature_0.6.0 munsell_0.5.0 base64url_1.4
[10] codetools_0.2-18 nleqslv_3.3.2 DT_0.18
[13] miniUI_0.1.1.1 withr_2.4.2 Brobdingnag_1.2-6
[16] colorspace_2.0-1 fst_0.9.4 knitr_1.33
[19] rstudioapi_0.13 stats4_4.1.0 bayesplot_1.8.0
[22] listenv_0.8.0 emmeans_1.6.0 rstan_2.26.1
[25] bit64_4.0.5 farver_2.1.0 bridgesampling_1.1-2
[28] coda_0.19-4 parallelly_1.25.0 vctrs_0.3.8
[31] generics_0.1.0 TH.data_1.0-10 xfun_0.23
[34] R6_2.5.0 markdown_1.1 gamm4_0.2-6
[37] projpred_2.0.2 arrow_4.0.0.1 cachem_1.0.5
[40] assertthat_0.2.1 promises_1.2.0.1 scales_1.1.1
[43] multcomp_1.4-17 debugme_1.1.0 gtable_0.3.0
[46] globals_0.14.0 processx_3.5.2 sandwich_3.0-1
[49] rlang_0.4.11 miniCRAN_0.2.14 splines_4.1.0
[52] gargle_1.1.0 broom_0.7.6 brew_1.0-6
[55] checkmate_2.0.0 inline_0.3.18 modelr_0.1.8
[58] yaml_2.2.1 reshape2_1.4.4 abind_1.4-5
[61] threejs_0.3.3 crosstalk_1.1.1 backports_1.2.1
[64] httpuv_1.6.1 rsconnect_0.8.17 tensorA_0.36.2
[67] tools_4.1.0 feather_0.3.5 ellipsis_0.3.2
[70] posterior_0.1.6 ggridges_0.5.3 plyr_1.8.6
[73] base64enc_0.1-3 progress_1.2.2 ps_1.6.0
[76] prettyunits_1.1.1 openssl_1.4.4 zoo_1.8-9
[79] haven_2.4.1 fs_1.5.0 furrr_0.2.2
[82] magrittr_2.0.1 reprex_2.0.0 colourpicker_1.1.0
[85] mvtnorm_1.1-1 matrixStats_0.58.0 stringfish_0.15.1
[88] evaluate_0.14 hms_1.1.0 qs_0.24.1
[91] shinyjs_2.0.0 mime_0.10 xtable_1.8-4
[94] shinystan_2.5.0 readxl_1.3.1 gridExtra_2.3
[97] rstantools_2.1.1 compiler_4.1.0 credentials_1.3.0
[100] V8_3.4.2 crayon_1.4.1 minqa_1.2.4
[103] R.oo_1.24.0 StanHeaders_2.26.1 htmltools_0.5.1.1
[106] mgcv_1.8-35 later_1.2.0 RcppParallel_5.1.4
[109] lubridate_1.7.10 DBI_1.1.1 RApiSerialize_0.1.0
[112] dbplyr_2.1.1 MASS_7.3-54 connectapi_0.1.0.9026
[115] rappdirs_0.3.3 boot_1.3-28 sys_3.4
[118] Matrix_1.3-3 cli_2.5.0 R.methodsS3_1.8.1
[121] parallel_4.1.0 igraph_1.2.6 pkgconfig_2.0.3
[124] xml2_1.3.2 dygraphs_1.1.1.6 pins_0.4.5
[127] estimability_1.3 rvest_1.0.0 distributional_0.2.2
[130] callr_3.7.0 digest_0.6.27 rmarkdown_2.8
[133] cellranger_1.1.0 curl_4.3.1 shiny_1.6.0
[136] gtools_3.8.2 nloptr_1.2.2.2 lifecycle_1.0.0
[139] nlme_3.1-152 jsonlite_1.7.2 cmdstanr_0.4.0.9000
[142] askpass_1.1 fansi_0.4.2 pillar_1.6.1
[145] lattice_0.20-44 loo_2.4.1 fastmap_1.1.0
[148] httr_1.4.2 pkgbuild_1.2.0 survival_3.2-11
[151] googleAuthR_1.4.0 glue_1.4.2 xts_0.12.1
[154] zip_2.1.1 gert_1.3.0 shinythemes_1.2.0
[157] bit_4.0.4 memoise_2.0.0