Skip to content

fix(pm): prefetch caches for lockfile installs#2845

Closed
killagu wants to merge 1 commit into
nextfrom
agent/egg-dev/6ef7f202
Closed

fix(pm): prefetch caches for lockfile installs#2845
killagu wants to merge 1 commit into
nextfrom
agent/egg-dev/6ef7f202

Conversation

@killagu
Copy link
Copy Markdown
Contributor

@killagu killagu commented Apr 27, 2026

Summary

  • Start background cache prefetching when installing from an already-fresh package-lock.json.
  • Reuse the existing OnceMap cache/download path so clone tasks share in-flight downloads instead of discovering tarballs depth-by-depth.

Benchmark

Command: PATH=$PWD/target/release-local:$PATH BENCH_RUNS=1 PM_LIST=utoo PROJECT=ant-design REGISTRY=https://registry.npmjs.org ./bench/pm-bench-phases.sh

Before:

  • p0_full_cold: 11.78s
  • p1_resolve: 5.87s
  • p3_cold_install: 41.40s
  • p4_warm_link: 1.09s

After:

  • p0_full_cold: 11.59s
  • p1_resolve: 5.38s
  • p3_cold_install: 10.07s
  • p4_warm_link: 1.01s

Verification

  • cargo fmt
  • cargo build -p utoo-pm --profile release-local
  • cargo test -p utoo-pm (257 passed, 3 ignored)
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Full workspace cargo clippy --all-targets -- -D warnings --no-deps was attempted but is blocked in this environment because pkg-config is missing for openssl-sys while checking broader workspace targets.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a background prefetch mechanism for the lockfile cache to optimize the installation process. The feedback identifies opportunities to improve efficiency by deduplicating the package list before prefetching and using for_each_concurrent to manage task concurrency more effectively than spawning individual tasks for every package.

Comment on lines +193 to +252
let mut packages = Vec::new();

for (path, package) in groups.values().flat_map(|pkgs| pkgs.iter()) {
if should_omit_package(package, omit) || package.link.is_some() {
continue;
}

if let Some(ref cpu) = package.cpu
&& !is_cpu_compatible(cpu)
{
continue;
}

if let Some(ref os) = package.os
&& !is_os_compatible(os)
{
continue;
}

let Some(version) = package.version.clone() else {
continue;
};
let Some(resolved) = package.resolved.clone() else {
continue;
};

let resolved = match resolved.strip_prefix("file:") {
Some(rel) if !Path::new(rel).is_absolute() => {
format!("file:{}", cwd.join(rel).display())
}
_ => resolved,
};

packages.push((package.get_name(path), version, resolved));
}

if packages.is_empty() {
return None;
}

Some(tokio::spawn(async move {
let mut tasks = futures::stream::FuturesUnordered::new();

for (name, version, resolved) in packages {
tasks.push(tokio::spawn(async move {
if resolve_cache_path(&name, &version, &resolved)
.await
.is_none()
{
tracing::debug!("Prefetch skipped or failed for {name}@{version}");
}
}));
}

while let Some(result) = tasks.next().await {
if let Err(e) = result {
tracing::debug!("Lockfile cache prefetch task failed: {e}");
}
}
}))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current prefetch implementation can be improved in two ways:

  1. Deduplication: A package (same name, version, and resolved URL) can appear multiple times in the lockfile at different paths (e.g., in nested node_modules). Deduplicating the packages list before prefetching avoids redundant cache checks and unnecessary task spawning.
  2. Concurrency Control: Spawning a separate tokio::spawn for every single package in a large lockfile (which can have thousands of entries) adds significant overhead to the scheduler. Using for_each_concurrent on a stream of packages is more idiomatic and allows for better control over the number of concurrent resolution tasks.

Since the entire prefetch process is already wrapped in a single background tokio::spawn, we don't need to spawn additional tasks for each individual package.

    let mut packages = Vec::new();
    let mut seen = HashSet::new();

    for (path, package) in groups.values().flat_map(|pkgs| pkgs.iter()) {
        if should_omit_package(package, omit) || package.link.is_some() {
            continue;
        }

        if let Some(ref cpu) = package.cpu
            && !is_cpu_compatible(cpu)
        {
            continue;
        }

        if let Some(ref os) = package.os
            && !is_os_compatible(os)
        {
            continue;
        }

        let Some(version) = package.version.clone() else {
            continue;
        };
        let Some(resolved) = package.resolved.clone() else {
            continue;
        };

        let resolved = match resolved.strip_prefix("file:") {
            Some(rel) if !Path::new(rel).is_absolute() => {
                format!("file:{}", cwd.join(rel).display())
            }
            _ => resolved,
        };

        let name = package.get_name(path);
        if seen.insert((name.clone(), version.clone(), resolved.clone())) {
            packages.push((name, version, resolved));
        }
    }

    if packages.is_empty() {
        return None;
    }

    Some(tokio::spawn(async move {
        futures::stream::iter(packages)
            .for_each_concurrent(50, |(name, version, resolved)| async move {
                if resolve_cache_path(&name, &version, &resolved)
                    .await
                    .is_none()
                {
                    tracing::debug!("Prefetch skipped or failed for {name}@{version}");
                }
            })
            .await;
    }))

@elrrrrrrr
Copy link
Copy Markdown
Contributor

Closing as stale: this draft is a one-off agent experiment from 2026-04-27 with no follow-up, and overlaps with sibling PRs exploring the same optimization. Reopen if revisited.

@elrrrrrrr elrrrrrrr closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants