perf(pm): source resolver mainloop architecture #3028

Draft
elrrrrrrr wants to merge 21 commits into next from perf/pm-source-resolver

Contributor

@elrrrrrrr elrrrrrrr commented May 21, 2026

Summary

Draft source PR for the resolver half of #2948. This is not the final review unit; it is the source branch used to verify the full resolver stack in one place.

This source branch now matches the current resolver split-stack top (perf/pm-split-resolver-doc-cleanup), so the split PRs should compose back to this diff exactly.

Covers From #2948

  • ManifestProvider / ManifestJob boundary.
  • Resolver-owned BFS/cache/inflight/waiter state.
  • Provider jobs only perform concrete manifest work.
  • Demand-first prefetch scheduling.
  • Obsolete preload removal.
  • Speculative full-manifest version extraction.
  • Version manifest Vec hot path.
  • Ruborist global MemoryCache / PackageCache / registry OnceMap cleanup.
  • PM resolver wiring needed to preserve p1 behavior.
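The ownership split in the list above can be pictured with a minimal, synchronous sketch. It is not the code from this PR: `Manifest`, `ResolverState`, `resolve`, and `StaticProvider` are hypothetical names, the trait signature is an assumption, and the real resolver is async. The point is only that the resolver keeps the cache, waiter lists, and BFS queue, while the provider does nothing but execute a job.

```rust
use std::cell::Cell;
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical manifest: just a name and its dependency names.
#[derive(Debug)]
pub struct Manifest {
    pub name: String,
    pub deps: Vec<String>,
}

/// One unit of concrete manifest work handed to a provider.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestJob {
    pub name: String,
}

/// Providers only execute jobs; they hold no resolver state (assumed signature).
pub trait ManifestProvider {
    fn run(&self, job: &ManifestJob) -> Option<Manifest>;
}

/// Resolver-owned state: cache, in-flight waiters, and the BFS queue.
#[derive(Default)]
pub struct ResolverState {
    cache: HashMap<String, Arc<Manifest>>,
    waiters: HashMap<String, Vec<String>>, // dep name -> parents waiting on it
    queue: VecDeque<ManifestJob>,
    resolved: Vec<(String, String)>, // (parent, dep) edges in completion order
}

impl ResolverState {
    /// Serve from cache, join an in-flight job, or enqueue a new job.
    fn request(&mut self, parent: &str, dep: &str) {
        if self.cache.contains_key(dep) {
            self.resolved.push((parent.to_string(), dep.to_string()));
        } else if let Some(parents) = self.waiters.get_mut(dep) {
            parents.push(parent.to_string());
        } else {
            self.waiters.insert(dep.to_string(), vec![parent.to_string()]);
            self.queue.push_back(ManifestJob { name: dep.to_string() });
        }
    }
}

/// Breadth-first resolution; each package is fetched at most once.
pub fn resolve(provider: &dyn ManifestProvider, roots: &[&str]) -> Vec<(String, String)> {
    let mut state = ResolverState::default();
    for root in roots {
        state.request("<root>", root);
    }
    while let Some(job) = state.queue.pop_front() {
        let parents = state.waiters.remove(&job.name).unwrap_or_default();
        // A real resolver would surface an error here; the sketch drops the edges.
        let Some(manifest) = provider.run(&job) else { continue };
        let manifest = Arc::new(manifest);
        state.cache.insert(job.name.clone(), Arc::clone(&manifest));
        for parent in parents {
            state.resolved.push((parent, job.name.clone()));
        }
        for dep in &manifest.deps {
            state.request(&job.name, dep);
        }
    }
    state.resolved
}

/// Hypothetical in-memory provider that counts how often it is asked to work.
pub struct StaticProvider {
    pub manifests: HashMap<String, Vec<String>>,
    pub calls: Cell<usize>,
}

impl ManifestProvider for StaticProvider {
    fn run(&self, job: &ManifestJob) -> Option<Manifest> {
        self.calls.set(self.calls.get() + 1);
        let deps = self.manifests.get(&job.name)?.clone();
        Some(Manifest { name: job.name.clone(), deps })
    }
}
```

With roots `a` and `b` that both depend on `c`, the provider in this sketch is called three times rather than four, because the second request for `c` joins the existing waiter list instead of creating another job.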

Actual Split Stack

Notes

#3043 is the remaining core scheduling PR. It is intentionally larger than the target review size because splitting semver and full-manifest demand paths creates a bad intermediate state: cache ownership and project-cache output would diverge by registry capability. Follow-up PRs keep tests, preload deletion, registry cleanup, and docs separate.

Validation

Validated at resolver split-stack top:

  • cargo fmt
  • cargo check -p utoo-ruborist
  • cargo test -p utoo-ruborist --lib (170 passed)
  • cargo clippy --all-targets -- -D warnings --no-deps

pack-napi warns locally because next.js is a symlink in this worktree; clippy exits successfully.

@elrrrrrrr added the "A-Pkg Manager" (Area: Package Manager) and "benchmark" (Run pm-bench on PR) labels May 21, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the dependency resolver from a two-phase approach to a demand-driven BFS resolution. It introduces a ManifestProvider trait and a centralized FetchQueues system to manage concurrent manifest jobs, moving in-memory caching directly into the resolver loop. Performance is further optimized by introducing multiple HTTP client pools and improving JSON parsing efficiency through speculative extraction. Feedback focuses on performance improvements within the new resolver logic, specifically addressing algorithmic complexity in job selection and prefetch tracking, as well as redundant cloning during cache warming. Additionally, it is recommended to replace recovery logic for unreachable states with panics to align with unrecoverable error guidelines.

Comment on lines +854 to +858
let prefetch_concurrency = if self
    .queued
    .values()
    .any(|priority| *priority == FetchPriority::Demand)
{

high

The use of self.queued.values().any(...) here is $O(N)$ where $N$ is the number of queued jobs. Since pop_next is called in a loop during BFS traversal, this can lead to $O(N^2)$ complexity relative to the level size.

Additionally, the variable name prefetch_concurrency shadows the function parameter, which can be confusing.

Consider using !self.demand.is_empty() as a fast $O(1)$ check. While it may occasionally be a false positive due to stale entries in the VecDeque, it is a safe approximation for limiting prefetch concurrency.

        let limit = if !self.demand.is_empty() {
            prefetch_concurrency
        } else {
            usize::MAX
        };
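To make the trade-off in this suggestion concrete, here is a small sketch that compares the exact scan with the `is_empty()` approximation. The field layout, `cancel`, and `drop_stale_front` are assumptions for illustration only (the prefetch deque is omitted), and how stale entries actually arise in the PR is not shown in this hunk; the sketch assumes lazy deletion, where a job leaves `queued` but its key stays in the `VecDeque`.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchPriority {
    Demand,
    Prefetch,
}

/// Hypothetical, simplified queue state; only the demand deque is modeled.
#[derive(Default)]
pub struct FetchQueues {
    queued: HashMap<String, FetchPriority>,
    demand: VecDeque<String>,
}

impl FetchQueues {
    pub fn enqueue(&mut self, key: &str, priority: FetchPriority) {
        if self.queued.contains_key(key) {
            return;
        }
        self.queued.insert(key.to_string(), priority);
        if priority == FetchPriority::Demand {
            self.demand.push_back(key.to_string());
        }
    }

    /// Lazy deletion (assumed): the deque entry is left behind as a stale key.
    pub fn cancel(&mut self, key: &str) {
        self.queued.remove(key);
    }

    /// Exact check: O(N) scan over every queued job.
    pub fn prefetch_limit_exact(&self, prefetch_concurrency: usize) -> usize {
        if self.queued.values().any(|p| *p == FetchPriority::Demand) {
            prefetch_concurrency
        } else {
            usize::MAX
        }
    }

    /// Approximate check: O(1), but stale deque entries count as demand.
    pub fn prefetch_limit_fast(&self, prefetch_concurrency: usize) -> usize {
        if !self.demand.is_empty() {
            prefetch_concurrency
        } else {
            usize::MAX
        }
    }

    /// Optional tightening: discard stale keys at the head of the deque.
    pub fn drop_stale_front(&mut self) {
        while let Some(front) = self.demand.front() {
            if self.queued.get(front) == Some(&FetchPriority::Demand) {
                break;
            }
            self.demand.pop_front();
        }
    }
}
```

In this sketch a stale entry can only make the fast check stricter than the exact one (it caps prefetch when it did not need to), never looser, which is why the approximation is safe for limiting concurrency.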

Comment on lines +1685 to +1699
let Some(done) = fetches.next().await else {
    let mut fallback = Vec::new();
    for (_, waiters) in state.full_waiters.drain() {
        fallback.extend(waiters);
    }
    for (_, waiters) in state.version_waiters.drain() {
        fallback.extend(waiters);
    }
    for (parent, edge) in fallback {
        let processed = process_dependency(graph, registry, parent, &edge, config)
            .await
            .map_err(|inner| chain_err(graph, parent, &edge, inner))?;
        handle_processed(graph, receiver, parent, &edge, &processed, &mut next_level);
    }
    break;

high

This fallback logic implements recovery for a state that should be unreachable. According to our guidelines, we should not implement recovery logic for bugs that represent unrecoverable states. Instead of a sequential fallback, this should trigger a panic to ensure the underlying logic error is identified and fixed.

        if result.is_none() && !waiters.is_empty() {
            panic!("Async fetcher returned None while waiters still exist. This is an unrecoverable logic error.");
        }
References
  1. Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.

Comment on lines +887 to +892
fn active_prefetches(&self) -> usize {
    self.active
        .values()
        .filter(|priority| **priority == FetchPriority::Prefetch)
        .count()
}

medium

The active_prefetches method performs an $O(M)$ scan of the active jobs map (where $M$ is the concurrency limit). Since this is called frequently in the main loop, it's better to maintain a counter in the FetchQueues struct.
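One way to maintain such a counter is sketched below. `ActiveJobs`, `start`, `finish`, and `promote` are hypothetical names, and whether the PR ever promotes an active prefetch to demand is an assumption. The counter has to be adjusted at every point where an entry's priority can enter, leave, or change in the map; `scan_prefetches` is kept only so the two can be compared.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchPriority {
    Demand,
    Prefetch,
}

/// Hypothetical slice of the queue state: the active map plus a running count.
#[derive(Default)]
pub struct ActiveJobs {
    active: HashMap<String, FetchPriority>,
    active_prefetches: usize,
}

impl ActiveJobs {
    /// Mark a job as running, replacing any previous entry for the same key.
    pub fn start(&mut self, key: &str, priority: FetchPriority) {
        if self.active.insert(key.to_string(), priority) == Some(FetchPriority::Prefetch) {
            self.active_prefetches -= 1;
        }
        if priority == FetchPriority::Prefetch {
            self.active_prefetches += 1;
        }
    }

    /// Mark a job as finished.
    pub fn finish(&mut self, key: &str) {
        if self.active.remove(key) == Some(FetchPriority::Prefetch) {
            self.active_prefetches -= 1;
        }
    }

    /// Upgrade a running prefetch to demand (assumed operation).
    pub fn promote(&mut self, key: &str) {
        if let Some(priority) = self.active.get_mut(key) {
            if *priority == FetchPriority::Prefetch {
                *priority = FetchPriority::Demand;
                self.active_prefetches -= 1;
            }
        }
    }

    /// O(1) replacement for the scan.
    pub fn active_prefetches(&self) -> usize {
        self.active_prefetches
    }

    /// The original O(M) scan, kept here only as a cross-check.
    pub fn scan_prefetches(&self) -> usize {
        self.active
            .values()
            .filter(|priority| **priority == FetchPriority::Prefetch)
            .count()
    }
}
```

A `debug_assert_eq!(self.active_prefetches, self.scan_prefetches())` after each mutation would be a cheap way to catch a missed update during development.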

Comment on lines +917 to +931
for (name, pkg_cache) in &warm.cache {
    for (spec, version) in &pkg_cache.specs {
        let Some(manifest) = pkg_cache.manifests.get(version) else {
            continue;
        };
        let manifest = Arc::new(manifest.clone());
        state
            .version_cache
            .insert((name.clone(), spec.clone()), Arc::clone(&manifest));
        state
            .version_cache
            .entry((name.clone(), version.clone()))
            .or_insert(manifest);
    }
}

medium

This implementation clones the manifest for every spec that points to it, which is inefficient for packages with many specs resolving to the same version.

Consider iterating over the manifests first to populate the version cache, then mapping specs to the already created Arcs.

        for (name, pkg_cache) in &warm.cache {
            let mut version_to_arc = HashMap::new();
            for (version, manifest) in &pkg_cache.manifests {
                let arc = Arc::new(manifest.clone());
                version_to_arc.insert(version, Arc::clone(&arc));
                state.version_cache.insert((name.clone(), version.clone()), arc);
            }
            for (spec, version) in &pkg_cache.specs {
                if let Some(arc) = version_to_arc.get(version) {
                    state.version_cache.insert((name.clone(), spec.clone()), Arc::clone(arc));
                }
            }
        }
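Note that the suggestion above also changes behavior slightly: it caches every manifest, not only those referenced by a spec, and it overwrites the `(name, version)` entry instead of using `or_insert`. A variant that keeps the original semantics while still cloning each manifest at most once could look like the sketch below. `Manifest`, `PackageCache`, `VersionCache`, and `warm_version_cache` are simplified stand-ins with assumed shapes, not the PR's types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a parsed version manifest.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub version: String,
}

/// Hypothetical warm-cache entry for one package.
#[derive(Default)]
pub struct PackageCache {
    pub specs: HashMap<String, String>,       // spec -> resolved version
    pub manifests: HashMap<String, Manifest>, // version -> manifest
}

/// Keyed by (package name, spec or version), as in the hunk above.
pub type VersionCache = HashMap<(String, String), Arc<Manifest>>;

pub fn warm_version_cache(warm: &HashMap<String, PackageCache>) -> VersionCache {
    let mut version_cache = VersionCache::new();
    for (name, pkg_cache) in warm {
        // One Arc per referenced version, created lazily on first use.
        let mut by_version: HashMap<&str, Arc<Manifest>> = HashMap::new();
        for (spec, version) in &pkg_cache.specs {
            let Some(manifest) = pkg_cache.manifests.get(version) else {
                continue;
            };
            let arc = Arc::clone(
                by_version
                    .entry(version.as_str())
                    .or_insert_with(|| Arc::new(manifest.clone())),
            );
            version_cache.insert((name.clone(), spec.clone()), Arc::clone(&arc));
            version_cache
                .entry((name.clone(), version.clone()))
                .or_insert(arc);
        }
    }
    version_cache
}
```

In this sketch, two specs that resolve to the same version end up pointing at the same allocation, which `Arc::ptr_eq` can confirm.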

@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 2702d34 to e0acfeb Compare May 21, 2026 17:40
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 9b50ad8 to 9a0819b Compare May 21, 2026 22:32
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 9a0819b to bb23373 Compare May 21, 2026 23:09
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from bb23373 to 711835d Compare May 21, 2026 23:39
@github-actions

📊 pm-bench-phases · f318a41 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

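| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.93s | 0.36s | 10.65s | 6.29s | 709M | 312.5K |
| utoo-next | 7.44s | 0.70s | 10.88s | 7.87s | 978M | 117.4K |
| utoo-npm | 7.47s | 0.87s | 11.13s | 8.08s | 1.01G | 120.2K |
| utoo | 7.85s | 1.50s | 11.73s | 8.08s | 969M | 145.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.5K | 18.9K | 1.22G | 6M | 1.93G | 1.81G | 1M |
| utoo-next | 129.7K | 91.3K | 1.19G | 5M | 1.77G | 1.77G | 2M |
| utoo-npm | 146.2K | 100.5K | 1.19G | 5M | 1.77G | 1.76G | 2M |
| utoo | 126.4K | 64.7K | 1.19G | 6M | 1.77G | 1.77G | 3M |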

p1_resolve

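| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.88s | 0.02s | 4.47s | 0.81s | 520M | 168.5K |
| utoo-next | 2.69s | 0.03s | 5.52s | 1.19s | 614M | 83.5K |
| utoo-npm | 2.86s | 0.04s | 5.75s | 1.46s | 635M | 86.7K |
| utoo | 2.26s | 0.01s | 6.34s | 1.14s | 643M | 125.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.9K | 5.2K | 205M | 3M | 109M | - | 1M |
| utoo-next | 54.3K | 72.6K | 202M | 2M | 7M | 3M | 2M |
| utoo-npm | 79.6K | 97.5K | 203M | 2M | 7M | 3M | 2M |
| utoo | 17.8K | 19.1K | 205M | 3M | 7M | 3M | 2M |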

p3_cold_install

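| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.55s | 0.04s | 6.03s | 6.20s | 635M | 207.0K |
| utoo-next | 5.84s | 1.76s | 5.08s | 7.01s | 494M | 60.0K |
| utoo-npm | 5.20s | 0.36s | 5.06s | 6.97s | 522M | 66.3K |
| utoo | 5.19s | 0.05s | 5.03s | 6.86s | 478M | 66.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 7.4K | 6.8K | 1.02G | 4M | 1.82G | 1.82G | 1M |
| utoo-next | 123.5K | 46.8K | 1019M | 3M | 1.76G | 1.76G | 3M |
| utoo-npm | 110.6K | 48.5K | 1018M | 3M | 1.76G | 1.76G | 3M |
| utoo | 114.5K | 49.7K | 1018M | 3M | 1.76G | 1.76G | 3M |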

p4_warm_link

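| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.25s | 0.03s | 0.15s | 1.17s | 136M | 33.4K |
| utoo-next | 1.78s | 0.05s | 0.50s | 2.35s | 80M | 18.6K |
| utoo-npm | 1.76s | 0.05s | 0.50s | 2.31s | 80M | 18.2K |
| utoo | 1.92s | 0.14s | 0.49s | 2.34s | 79M | 18.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 405 | 12 | 5M | 30K | 1.98G | 1.81G | 1M |
| utoo-next | 42.8K | 16.9K | 6K | 6K | 1.76G | 1.76G | 2M |
| utoo-npm | 42.8K | 18.0K | 7K | 5K | 1.76G | 1.76G | 2M |
| utoo | 42.3K | 16.9K | 4K | 3K | 1.77G | 1.76G | 2M |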

npmmirror.com: no output captured.
