perf: Deduplicate file hashing and parallelize globwalks (#11902)
## Summary
Optimizes `turbo run --dry` wall-clock time by up to 1.48x on large
monorepos by eliminating redundant file hashing work and removing a
serialization bottleneck in globwalk operations.
### Benchmarks
Tested across three repos of varying size:
| Repo | Packages | Before | After | Speedup |
|------|----------|--------|-------|---------|
| large | ~1000 | 5.903s | 3.999s | **1.48x** |
| medium | ~120 | 1.461s | 1.380s | 1.06x |
| small | ~6 | 0.659s | 0.693s | 0.95x (within noise) |
The improvement scales with repo size — specifically with how many tasks
share the same `(package, inputs)` combination.
### Changes
**File hash deduplication** — Multiple tasks in the same package with
identical `inputs` config (e.g. `build`, `lint`, `typecheck` all in one
package) previously each ran an independent globwalk + file hash
computation. Now tasks are grouped by `(package_path, globs,
include_default)` and each unique combination is computed once, with
results shared across tasks.
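
As a rough illustration of the grouping described above, the following sketch computes one result per unique `(package_path, globs, include_default)` key and shares it across tasks. The `HashKey`, `Task`, and `hash_tasks` names are hypothetical and not taken from the turborepo codebase, and the `u64` result stands in for whatever the real file-hash map looks like.

```rust
use std::collections::HashMap;

/// Hypothetical key: tasks that agree on all three fields need the
/// same globwalk and the same file hashes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct HashKey {
    package_path: String,
    globs: Vec<String>,
    include_default: bool,
}

/// Hypothetical task description, for illustration only.
struct Task {
    id: String,
    key: HashKey,
}

/// Runs `compute` once per unique key and returns a result for every task id.
/// `u64` is a placeholder for the real per-file hash output.
fn hash_tasks<F>(tasks: &[Task], mut compute: F) -> HashMap<String, u64>
where
    F: FnMut(&HashKey) -> u64,
{
    let mut by_key: HashMap<&HashKey, u64> = HashMap::new();
    let mut out = HashMap::new();
    for task in tasks {
        // Only the first task with a given key pays for the computation.
        let hash = *by_key
            .entry(&task.key)
            .or_insert_with(|| compute(&task.key));
        out.insert(task.id.clone(), hash);
    }
    out
}
```

With `build`, `lint`, and `typecheck` sharing one key, `compute` runs once instead of three times, which is why the gain grows with the number of tasks per package.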
**Parallel globwalks via retry-on-EMFILE** — The previous `IoSemaphore`
(max=1) serialized all globwalk operations to prevent fd exhaustion,
making this the dominant bottleneck on large repos. This replaces the
semaphore with retry-with-exponential-backoff on `EMFILE` errors (the
same pattern Node's `graceful-fs` uses), allowing globwalks to run fully
parallel on rayon. If the OS returns "too many open files", the
operation sleeps briefly and retries — up to 10 times with exponential
backoff capped at 1s.
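
A minimal sketch of the retry pattern described above, using only the standard library. The `retry_on_emfile` name, the 1 ms starting delay, and the hard-coded errno are assumptions made for illustration; the PR text only states the 10-retry limit and the 1s cap.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// EMFILE ("too many open files") is errno 24 on Linux and macOS.
/// Assumption: real code would likely take this constant from the `libc` crate.
const EMFILE: i32 = 24;

/// Calls `op`, retrying only when it fails with EMFILE.
/// Any other error, or EMFILE after the retry budget is spent, is returned as-is.
fn retry_on_emfile<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    const MAX_RETRIES: u32 = 10;
    const MAX_DELAY: Duration = Duration::from_secs(1);

    // Assumed starting delay; the source does not state one.
    let mut delay = Duration::from_millis(1);
    let mut retries = 0;
    loop {
        match op() {
            Err(e) if e.raw_os_error() == Some(EMFILE) && retries < MAX_RETRIES => {
                thread::sleep(delay);
                // Double the delay each time, but never exceed the cap.
                delay = (delay * 2).min(MAX_DELAY);
                retries += 1;
            }
            other => return other,
        }
    }
}
```

The trade-off compared with a semaphore is that the common case pays no coordination cost at all; contention is handled only when the OS actually reports fd exhaustion.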
**Zero-copy lockfile dependency lookups** — `Lockfile::all_dependencies`
now returns `Cow<'_, HashMap<String, String>>` instead of cloning the
HashMap on every call. For pnpm (which pre-builds a dependency index),
this eliminates ~329k HashMap clones during transitive closure
resolution.
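
To show why the `Cow` return type helps, here is a simplified sketch with two hypothetical lockfile types. `IndexedLockfile` and `FlatLockfile` are illustrative stand-ins, and the `Option` wrapper and `&str` key are assumptions, not the real `Lockfile` trait signature. A lockfile with a pre-built index can lend out its map, while one without an index can still return an owned map through the same type.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

type Deps = HashMap<String, String>;

/// Hypothetical lockfile that pre-builds a dependency index (as pnpm's does).
struct IndexedLockfile {
    index: HashMap<String, Deps>,
}

impl IndexedLockfile {
    /// Borrows the indexed map; no HashMap is cloned.
    fn all_dependencies(&self, key: &str) -> Option<Cow<'_, Deps>> {
        self.index.get(key).map(Cow::Borrowed)
    }
}

/// Hypothetical lockfile with no index: (package, dependency, version) rows.
struct FlatLockfile {
    entries: Vec<(String, String, String)>,
}

impl FlatLockfile {
    /// Builds the map on demand and hands back an owned value.
    fn all_dependencies(&self, key: &str) -> Option<Cow<'_, Deps>> {
        let deps: Deps = self
            .entries
            .iter()
            .filter(|entry| entry.0.as_str() == key)
            .map(|entry| (entry.1.clone(), entry.2.clone()))
            .collect();
        if deps.is_empty() {
            None
        } else {
            Some(Cow::Owned(deps))
        }
    }
}
```

Callers that only read the result can use it through `Deref` in both cases, and only callers that need ownership pay for a clone via `into_owned()`.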
**Optimized transitive closure cache keys** — The `DashMap` resolve
cache now uses a single null-byte-separated `String` key built into a
reusable buffer, instead of allocating a `(String, String, String)`
tuple per lookup.
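
The key-building idea can be sketched as follows. This uses a plain `HashMap` rather than `DashMap` to stay within the standard library, and the `ResolveCache` type plus the three component names (`workspace`, `name`, `version`) are hypothetical; the PR text does not say what the three strings in the old tuple were.

```rust
use std::collections::HashMap;

/// Writes `workspace\0name\0version` into `buf`, reusing its allocation.
/// A null byte is assumed not to occur in any component, so keys cannot collide.
fn write_key(buf: &mut String, workspace: &str, name: &str, version: &str) {
    buf.clear();
    buf.push_str(workspace);
    buf.push('\0');
    buf.push_str(name);
    buf.push('\0');
    buf.push_str(version);
}

/// Hypothetical single-threaded stand-in for the resolve cache.
struct ResolveCache {
    map: HashMap<String, Option<String>>,
    key_buf: String,
}

impl ResolveCache {
    fn new() -> Self {
        Self { map: HashMap::new(), key_buf: String::new() }
    }

    /// Looks up by `&str`, so a cache hit allocates no key at all.
    /// Only a miss clones the buffer into an owned key.
    fn get_or_resolve<F>(
        &mut self,
        workspace: &str,
        name: &str,
        version: &str,
        resolve: F,
    ) -> Option<String>
    where
        F: FnOnce() -> Option<String>,
    {
        write_key(&mut self.key_buf, workspace, name, version);
        if let Some(hit) = self.map.get(self.key_buf.as_str()) {
            return hit.clone();
        }
        let resolved = resolve();
        self.map.insert(self.key_buf.clone(), resolved.clone());
        resolved
    }
}
```

Compared with a `(String, String, String)` key, this does at most one allocation per miss and none per hit, since `HashMap<String, _>` can be queried with a borrowed `&str`.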
**HashMap importers for pnpm** — Converted pnpm's `importers` field from
`BTreeMap` to `HashMap` (with sorted serialization) for O(1) workspace
lookups during `resolve_package`.
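
One way to keep output deterministic after moving from `BTreeMap` to `HashMap` is to sort keys only at serialization time. The helper below is a standard-library sketch of that idea; `sorted_entries` is a hypothetical name, and the real change presumably hooks into the serializer rather than returning a `Vec`.

```rust
use std::collections::HashMap;

/// Returns the map's entries ordered by key, so that written output is
/// stable even though `HashMap` iteration order is not.
fn sorted_entries<V>(map: &HashMap<String, V>) -> Vec<(&String, &V)> {
    let mut entries: Vec<_> = map.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    entries
}
```

Lookups stay O(1) on average on the hot path, and the O(n log n) sort is paid only when the lockfile is written out.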