Deterministic, zero-copy parallel execution for R.
shard is a parallel runtime for workloads that look like:
- "run the same numeric kernel over many slices of big data"
- "thousands of independent tasks over a shared dataset"
- "parallel GLM / simulation / bootstrap / feature screening"
It focuses on three things that are often painful in R parallelism:
- Shared immutable inputs (avoid duplicating large objects across workers)
- Explicit output buffers (avoid huge result-gather lists)
- Deterministic cleanup (supervise workers and recycle on memory drift)
From CRAN (once released):
install.packages("shard")

Development version:
# install.packages("pak")
pak::pak("bbuchsbaum/shard")

X <- shard::share(X) # matrix/array/vector
Y <- shard::share(Y)

Shared objects are designed for zero-copy parallel reads (where the OS allows) and are treated as immutable by default inside parallel tasks.
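To see why sharing matters, a rough estimate of the duplication cost can be computed in base R. This is only an illustration of the motivation, not a measurement of shard itself; the matrix size and worker count are made-up values.

```r
# Hypothetical sizes chosen for illustration only.
X <- matrix(0, nrow = 1000, ncol = 1000)   # 1e6 doubles
workers <- 8

# Approximate in-memory size of one copy of X, in bytes.
bytes_one_copy <- as.numeric(object.size(X))

# Footprint if every worker held its own private copy of X.
bytes_if_duplicated <- bytes_one_copy * workers

cat(sprintf("one copy: %.1f MB, duplicated across %d workers: %.1f MB\n",
            bytes_one_copy / 1e6, workers, bytes_if_duplicated / 1e6))
```

With a shared, read-only input the per-worker cost should stay close to the single-copy figure, subject to the platform caveat above.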
Instead of returning giant objects from each worker, write to a preallocated buffer:
out <- shard::buffer("double", dim = c(1e6)) # example: 1M outputs

blocks <- shard::shards(1e6, block_size = "auto")
run <- shard::shard_map(
  blocks,
  borrow = list(X = X, Y = Y),
  out = list(out = out),
  workers = 8,
  fun = function(block, X, Y, out) {
    # block contains indices
    idx <- block$idx
    out[idx] <- colMeans(Y[, idx, drop = FALSE])
  }
)
shard::report(run)

By default, trying to mutate borrowed/shared inputs is treated as a bug:
- cow = "deny" (default): mutation triggers an error
- cow = "audit": detect and flag (best-effort; platform dependent)
- cow = "allow": allow copy-on-write, track it, and enforce budgets
Why the default is "deny":
- Prevents silent memory blowups from accidental wide writes
- Prevents subtle correctness bugs (changes are private to a worker)
- Keeps behavior predictable across platforms
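The second point can be seen in plain R, without shard: under R's copy-on-modify semantics, a write to an argument inside a function changes only a local copy. The sketch below uses a hypothetical `task()` function and runs in a single process. It is meant to show why a write to a borrowed input would be invisible to the caller, not how shard detects such writes.

```r
X <- matrix(0, nrow = 2, ncol = 2)

# Hypothetical task that (wrongly) tries to update its input in place.
task <- function(X) {
  X[1, 1] <- 1   # modifies a private copy, not the caller's X
  sum(X)
}

local_sum <- task(X)

# The task saw its own write, but the caller's matrix is unchanged.
stopifnot(local_sum == 1, X[1, 1] == 0)
```

Turning such writes into an error surfaces the mistake at the point where it happens instead of leaving a silently discarded update.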
R's GC and allocator behavior can lead to memory drift in long-running workers.
shard monitors per-worker memory usage and can recycle workers when drift
exceeds thresholds, keeping end-of-run memory close to baseline.
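The kind of threshold check described above might look like the following. This is a sketch only: `should_recycle()` and its `max_drift` parameter are hypothetical names, not part of shard's API, and shard's actual policy may differ.

```r
# Hypothetical helper, not part of shard's API.
# Returns TRUE when current RSS exceeds the baseline by more than
# `max_drift`, expressed as a fraction (0.5 means 50% growth).
should_recycle <- function(rss_now, rss_baseline, max_drift = 0.5) {
  stopifnot(rss_baseline > 0)
  (rss_now - rss_baseline) / rss_baseline > max_drift
}

small_drift <- should_recycle(rss_now = 1.2e9, rss_baseline = 1.0e9)  # 20% growth
large_drift <- should_recycle(rss_now = 1.8e9, rss_baseline = 1.0e9)  # 80% growth
```

A relative threshold is used here so the same setting works for workers with different baselines; an absolute byte limit would be an equally plausible choice.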
After a run, shard can report:
- total and per-worker peak RSS
- end RSS vs baseline
- materialized bytes (hidden copies)
- recycling events, retries, timing
rep <- shard::report(run)
print(rep)
shard::mem_report(run)
shard::copy_report(run)

If your workload is “apply a function over columns” or “lapply over a list”,
shard provides convenience wrappers that handle sharing and buffering
automatically while still running through the supervised runtime.
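For comparison, the sequential base-R forms of these two patterns are sketched here. This does not use shard; it is intended as a baseline for sanity-checking results on a small input, and the sizes are reduced for illustration.

```r
set.seed(1)
X <- matrix(rnorm(1e4), nrow = 100)   # 100 x 100, smaller than a real workload
y <- rnorm(nrow(X))

# Column-wise kernel: correlation of each column of X with y.
scores_seq <- apply(X, 2, function(v) cor(v, y))

# Element-wise kernel over a list.
xs <- lapply(1:10, function(i) rnorm(100))
means_seq <- lapply(xs, mean)

length(scores_seq)  # one score per column of X
length(means_seq)   # one mean per list element
```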
X <- matrix(rnorm(1e6), nrow = 1000)
scores <- shard::shard_apply_matrix(
  X,
  MARGIN = 2,
  FUN = function(v, y) cor(v, y),
  VARS = list(y = rnorm(nrow(X))),
  workers = 8
)

xs <- lapply(1:1000, function(i) rnorm(100))
out <- shard::shard_lapply_shared(
  xs,
  FUN = function(el) mean(el),
  workers = 8
)

For large outputs (big vectors/data.frames per element), prefer buffer(), table_sink(),
or shard_reduce() instead of gathering everything to the master.
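The difference between gathering results and writing into a preallocated buffer can be sketched in single-process base R. The block size and matrix shape below are arbitrary, and the loop only mimics the write pattern; in shard, a `buffer()` object would play the role of `out`.

```r
set.seed(1)
n <- 20
Y <- matrix(rnorm(5 * n), nrow = 5)   # 5 rows, n columns

# Contiguous index blocks with a fixed, arbitrary block size.
block_size <- 6
blocks <- split(seq_len(n), ceiling(seq_len(n) / block_size))

# Gather style: each task returns a piece, then the pieces are combined.
gathered <- unlist(
  lapply(blocks, function(idx) colMeans(Y[, idx, drop = FALSE])),
  use.names = FALSE
)

# Buffer style: each task writes its slice into a preallocated vector.
out <- numeric(n)
for (idx in blocks) {
  out[idx] <- colMeans(Y[, idx, drop = FALSE])
}

# Both styles agree with the direct computation.
stopifnot(isTRUE(all.equal(out, gathered)), isTRUE(all.equal(out, colMeans(Y))))
```

The gather style holds every intermediate piece plus the combined result at once, which is the overhead the buffer style avoids.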
MIT