The `generated quantities` block is embarrassingly parallel over sampling iterations, but the current implementation of parallel compute in the cmdstanr method `generate_quantities()` is limited to the same `chains` and `threads_per_chain` arguments used for sampling (the latter only helps when the model uses `reduce_sum`). Here's how I've been working around this to employ as many cores as are available:
- Repeat `posterior::split_chains()` on the draws from the fitted object until the number of chains equals the number of cores wanted.
- In `generate_quantities()`, set `parallel_chains = nchains(x)`, with `threads_per_chain = 1` (if `reduce_sum` is used). For example:
```r
# pull draws and split chains to match the number of cores
x <- split_chains(f$draws())
x <- split_chains(x)
# ... split until you have enough chains for each core

# uncomment the generated quantities block and recompile;
# cpp_options aren't always needed, but shown here to demonstrate their use
m <- cmdstan_model('fit.stan', cpp_options = list(stan_threads = TRUE))

# use the new draws object
q <- m$generate_quantities(
  fitted_params = x,
  data = dat,
  parallel_chains = nchains(x),
  threads_per_chain = 1
)
```
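The repeated splitting could be wrapped in a small helper. This is only a sketch: `split_to_cores` and its `parallel::detectCores()` default are my own names and choices, not part of the cmdstanr or posterior API.

```r
library(posterior)

# Hypothetical helper: split chains until there are at least `cores` chains.
# split_chains() doubles the chain count each call (halving the iterations
# per chain), so the result lands on the first power-of-two multiple of the
# original chain count that is >= `cores`.
split_to_cores <- function(draws, cores = parallel::detectCores()) {
  while (nchains(draws) < cores) {
    draws <- split_chains(draws)
  }
  draws
}
```

With that, `fitted_params = split_to_cores(f$draws())` and `parallel_chains = nchains(x)` would replace the manual splitting above.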
Since cmdstanr already depends on posterior, it may be possible to bake something like the above into `generate_quantities()` so it makes better use of parallel compute natively, without the manual splitting. Otherwise, maybe this will serve as a tip. :)