fix(pm): prefetch caches for lockfile installs #2845
Code Review
This pull request introduces a background prefetch mechanism for the lockfile cache to optimize the installation process. The feedback identifies opportunities to improve efficiency by deduplicating the package list before prefetching and using for_each_concurrent to manage task concurrency more effectively than spawning individual tasks for every package.
    let mut packages = Vec::new();

    for (path, package) in groups.values().flat_map(|pkgs| pkgs.iter()) {
        if should_omit_package(package, omit) || package.link.is_some() {
            continue;
        }

        if let Some(ref cpu) = package.cpu
            && !is_cpu_compatible(cpu)
        {
            continue;
        }

        if let Some(ref os) = package.os
            && !is_os_compatible(os)
        {
            continue;
        }

        let Some(version) = package.version.clone() else {
            continue;
        };
        let Some(resolved) = package.resolved.clone() else {
            continue;
        };

        let resolved = match resolved.strip_prefix("file:") {
            Some(rel) if !Path::new(rel).is_absolute() => {
                format!("file:{}", cwd.join(rel).display())
            }
            _ => resolved,
        };

        packages.push((package.get_name(path), version, resolved));
    }

    if packages.is_empty() {
        return None;
    }

    Some(tokio::spawn(async move {
        let mut tasks = futures::stream::FuturesUnordered::new();

        for (name, version, resolved) in packages {
            tasks.push(tokio::spawn(async move {
                if resolve_cache_path(&name, &version, &resolved)
                    .await
                    .is_none()
                {
                    tracing::debug!("Prefetch skipped or failed for {name}@{version}");
                }
            }));
        }

        while let Some(result) = tasks.next().await {
            if let Err(e) = result {
                tracing::debug!("Lockfile cache prefetch task failed: {e}");
            }
        }
    }))
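The `file:` handling in the hunk above can be looked at in isolation. The following is a minimal sketch using only the standard library; the helper name `absolutize_file_spec`, the `/work/app` directory, and the `registry.example` URL are hypothetical and do not appear in the PR.

```rust
use std::path::Path;

/// Rewrites a relative `file:` specifier so that it is rooted at `cwd`.
/// Anything else (a registry URL, or a `file:` specifier that is already
/// absolute) is returned unchanged.
fn absolutize_file_spec(resolved: &str, cwd: &Path) -> String {
    match resolved.strip_prefix("file:") {
        Some(rel) if !Path::new(rel).is_absolute() => {
            format!("file:{}", cwd.join(rel).display())
        }
        _ => resolved.to_string(),
    }
}

fn main() {
    // Hypothetical project root, used only for illustration.
    let cwd = Path::new("/work/app");
    for spec in ["file:../shared/pkg.tgz", "https://registry.example/pkg-1.0.0.tgz"] {
        println!("{spec} -> {}", absolutize_file_spec(spec, cwd));
    }
}
```

Note that `Path::join` does not collapse `..` components, so the rewritten specifier is absolute but not normalized.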
The current prefetch implementation can be improved in two ways:
- Deduplication: A package (same name, version, and resolved URL) can appear multiple times in the lockfile at different paths (e.g., in nested `node_modules`). Deduplicating the `packages` list before prefetching avoids redundant cache checks and unnecessary task spawning.
- Concurrency Control: Spawning a separate `tokio::spawn` task for every single package in a large lockfile (which can have thousands of entries) adds significant overhead to the scheduler. Using `for_each_concurrent` on a stream of packages is more idiomatic and allows for better control over the number of concurrent resolution tasks.

Since the entire prefetch process is already wrapped in a single background `tokio::spawn`, we don't need to spawn additional tasks for each individual package.
    // Needs `std::collections::HashSet` and `futures::StreamExt` in scope.
    let mut packages = Vec::new();
    let mut seen = HashSet::new();

    for (path, package) in groups.values().flat_map(|pkgs| pkgs.iter()) {
        if should_omit_package(package, omit) || package.link.is_some() {
            continue;
        }

        if let Some(ref cpu) = package.cpu
            && !is_cpu_compatible(cpu)
        {
            continue;
        }

        if let Some(ref os) = package.os
            && !is_os_compatible(os)
        {
            continue;
        }

        let Some(version) = package.version.clone() else {
            continue;
        };
        let Some(resolved) = package.resolved.clone() else {
            continue;
        };

        let resolved = match resolved.strip_prefix("file:") {
            Some(rel) if !Path::new(rel).is_absolute() => {
                format!("file:{}", cwd.join(rel).display())
            }
            _ => resolved,
        };

        // Queue each (name, version, resolved) triple only once.
        let name = package.get_name(path);
        if seen.insert((name.clone(), version.clone(), resolved.clone())) {
            packages.push((name, version, resolved));
        }
    }

    if packages.is_empty() {
        return None;
    }

    Some(tokio::spawn(async move {
        // At most 50 cache lookups are in flight at a time.
        futures::stream::iter(packages)
            .for_each_concurrent(50, |(name, version, resolved)| async move {
                if resolve_cache_path(&name, &version, &resolved)
                    .await
                    .is_none()
                {
                    tracing::debug!("Prefetch skipped or failed for {name}@{version}");
                }
            })
            .await;
    }))
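The deduplication step can be tried on its own. This is a minimal standard-library sketch, assuming the same `(name, version, resolved)` triples as the suggestion; the helper `dedup_packages`, the `PackageKey` alias, and the `registry.example` URLs are hypothetical and only serve the illustration.

```rust
use std::collections::HashSet;

/// (name, version, resolved) triple, mirroring the tuples in the suggestion.
type PackageKey = (String, String, String);

/// Keeps the first occurrence of each triple and preserves input order.
fn dedup_packages(entries: Vec<PackageKey>) -> Vec<PackageKey> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for entry in entries {
        // `insert` returns false when the triple was already present.
        if seen.insert(entry.clone()) {
            unique.push(entry);
        }
    }
    unique
}

fn main() {
    let key = |n: &str, v: &str, r: &str| (n.to_string(), v.to_string(), r.to_string());
    let entries = vec![
        key("lodash", "4.17.21", "https://registry.example/lodash-4.17.21.tgz"),
        key("react", "18.2.0", "https://registry.example/react-18.2.0.tgz"),
        // The same package again, as it would appear under a nested node_modules path.
        key("lodash", "4.17.21", "https://registry.example/lodash-4.17.21.tgz"),
    ];
    let unique = dedup_packages(entries);
    println!("{} unique packages", unique.len());
}
```

Keying on the full triple rather than the name alone means two versions of the same package are both kept, which matches the "same name, version, and resolved URL" definition in the review comment.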
Closing as stale: this draft is a one-off agent experiment from 2026-04-27 with no follow-up, and overlaps with sibling PRs exploring the same optimization. Reopen if revisited.
Summary
Benchmark
Command: `PATH=$PWD/target/release-local:$PATH BENCH_RUNS=1 PM_LIST=utoo PROJECT=ant-design REGISTRY=https://registry.npmjs.org ./bench/pm-bench-phases.sh`
Before:
After:
Verification
A full-workspace `cargo clippy --all-targets -- -D warnings --no-deps` run was attempted but is blocked in this environment: `pkg-config` is missing, and `openssl-sys` needs it when the broader workspace targets are checked.