I'm working on a pipeline-prototyping package that uses progressr to track progress while the elements of a list are processed. To make re-processing efficient, elements that have already been processed are skipped. As I understand it, this makes the current ETA calculation inaccurate: early elements that were already processed return from the processing function almost immediately, while later elements can be expected to take much longer. I save history data for each element, including its processing duration, and I'm wondering whether that duration could be passed to progressr so that it can produce more accurate ETA estimates. A minimal example of my case is:
options(
  progressr.handlers = progressr::handler_progress(
    format = "[:bar] :spin :current/:total :percent in :elapsed ETA: :eta (:message)",
    clear = FALSE
  )
)
options(progressr.clear = FALSE)

f = function(.x, pb) {
  x_hash = digest::digest(.x)
  if (file.exists(x_hash)) {
    # already processed: skip the work and reuse the saved duration
    t = readRDS(x_hash)
  } else {
    # simulate 10-20 seconds of processing
    t = runif(1, 10, 20)
    Sys.sleep(t)
    if (.x %% 2) { # odd elements get saved, triggering a skip next time they're processed
      saveRDS(t, x_hash)
    }
  }
  pb(duration = t) # imagined API for supplying the duration manually
}

# on first run, ETA is accurate
progressr::with_progress({
  .x = 1:10
  pb <- progressr::progressor(along = .x)
  y = furrr::future_map(.x = .x, .f = f, pb = pb, .progress = FALSE)
})

# on second run, some elements are skipped internally by f()
progressr::with_progress({
  .x = 1:10
  pb <- progressr::progressor(along = .x)
  y = furrr::future_map(.x = .x, .f = f, pb = pb, .progress = FALSE)
})
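
A rough sketch of the arithmetic behind this request, for illustration only. The helpers `eta_naive()` and `eta_from_durations()` are hypothetical names and are not part of progressr. The sketch also assumes elements are processed sequentially and that the remaining elements take about as long as the reported average. It compares an ETA based on wall-clock time per finished element with one based on the durations reported for each element:

```r
# Hypothetical helpers for illustration only; neither is part of progressr.

# Naive ETA: average wall-clock time per finished element, times elements left.
eta_naive <- function(elapsed, n_done, n_total) {
  (elapsed / n_done) * (n_total - n_done)
}

# Duration-aware ETA: average of the durations reported for finished elements
# (the saved history for skipped ones), times elements left.
eta_from_durations <- function(durations, n_total) {
  mean(durations) * (n_total - length(durations))
}

# 5 of 10 elements are finished. The first three were skipped (near-instant),
# but their saved history says they originally took 12, 15 and 18 seconds.
reported <- c(12, 15, 18, 14, 16)
elapsed  <- 14 + 16  # wall-clock seconds actually spent so far

eta_naive(elapsed, n_done = 5, n_total = 10)
#> [1] 30
eta_from_durations(reported, n_total = 10)
#> [1] 75
```

With skipped elements counted at their near-zero wall-clock cost, the naive estimate is less than half of the duration-based one, which is the kind of underestimate described above.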