fix(pm): drain pipeline worker tasks#2844

Closed
killagu wants to merge 1 commit into next from agent/egg-dev/36136b86

@killagu commented Apr 27, 2026

Summary

  • Track pipeline download/clone subtasks with JoinSet and drain them before worker completion.
  • Bound pipeline task fanout by the existing manifest concurrency setting.
  • Avoid waiting forever for local/workspace parent clone keys that are never registered.
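
The first two bullets describe a common pattern: cap the number of in-flight subtasks, then drain whatever is still running before the worker returns. The sketch below is a std-only analogue built on OS threads rather than the PR's tokio JoinSet, so it compiles without external crates. The helper name `run_bounded`, its `max_in_flight` parameter, and the FIFO join order are illustrative assumptions, not code from this PR.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: runs `work` on every item with at most
/// `max_in_flight` threads alive, then drains the remaining handles.
/// A thread-based stand-in for a JoinSet-driven pipeline worker.
fn run_bounded<T, R, F>(items: Vec<T>, max_in_flight: usize, work: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Clone + 'static,
{
    // A limit of 0 would never make progress, so treat it as 1.
    let limit = max_in_flight.max(1);
    let mut in_flight: VecDeque<JoinHandle<R>> = VecDeque::new();
    let mut results = Vec::new();

    for item in items {
        // Backpressure: wait for the oldest task before spawning another.
        if in_flight.len() >= limit {
            if let Some(handle) = in_flight.pop_front() {
                // `join` returns Err if the task panicked; this sketch skips it.
                if let Ok(value) = handle.join() {
                    results.push(value);
                }
            }
        }
        let work = work.clone();
        in_flight.push_back(thread::spawn(move || work(item)));
    }

    // Drain: no subtask is left running once the worker returns.
    for handle in in_flight {
        if let Ok(value) = handle.join() {
            results.push(value);
        }
    }
    results
}
```

Unlike a JoinSet, which yields whichever task finishes first, this version always joins the oldest handle, so results come back in spawn order. The bound on live tasks and the final drain are the parts that carry over.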

Verification

  • cargo fmt --check
  • cargo test -p utoo-pm
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Note: full cargo clippy --all-targets -- -D warnings --no-deps is blocked in this environment because openssl-sys requires pkg-config/OpenSSL for non-PM workspace targets.


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces concurrency limits to the download and clone pipeline workers by utilizing tokio::task::JoinSet and a configurable max_in_flight limit. It also adds a wait_existing_if_pending method to OnceMap to ensure that child package clones do not hang when waiting for local or workspace parents that are not in the clone cache. Review feedback suggests improving error handling by propagating panics in the join_next helper instead of silently logging them, and logging failures in the download_to_cache task to improve observability.
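
The summary mentions a wait_existing_if_pending method that keeps a child clone from hanging on a parent key that was never registered. The PR's actual OnceMap type and signatures are not shown in this conversation, so the following is only a guess at the idea, written with a std Mutex and Condvar. `MiniOnceMap`, `Slot`, `register`, and `complete` are hypothetical names, and the blocking (non-async) design is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

/// Hypothetical slot state; not the PR's real representation.
enum Slot<V> {
    Pending,
    Ready(V),
}

/// Hypothetical, blocking stand-in for the PR's `OnceMap`.
struct MiniOnceMap<V> {
    slots: Mutex<HashMap<String, Slot<V>>>,
    changed: Condvar,
}

impl<V: Clone> MiniOnceMap<V> {
    fn new() -> Self {
        Self {
            slots: Mutex::new(HashMap::new()),
            changed: Condvar::new(),
        }
    }

    /// Marks `key` as in progress so waiters know a value is coming.
    fn register(&self, key: &str) {
        let mut slots = self.slots.lock().unwrap();
        slots.entry(key.to_string()).or_insert(Slot::Pending);
    }

    /// Publishes the value for `key` and wakes every waiter.
    fn complete(&self, key: &str, value: V) {
        let mut slots = self.slots.lock().unwrap();
        slots.insert(key.to_string(), Slot::Ready(value));
        self.changed.notify_all();
    }

    /// Returns `None` at once when `key` was never registered, instead of
    /// blocking forever; otherwise waits until the value is ready.
    fn wait_existing_if_pending(&self, key: &str) -> Option<V> {
        let mut slots = self.slots.lock().unwrap();
        loop {
            match slots.get(key) {
                None => return None,
                Some(Slot::Ready(value)) => return Some(value.clone()),
                Some(Slot::Pending) => {}
            }
            // Still pending: release the lock and sleep until a change.
            slots = self.changed.wait(slots).unwrap();
        }
    }
}
```

In this sketch a local or workspace parent that never enters the map resolves to `None` right away, while a registered parent blocks the caller only until `complete` runs.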

Comment on lines +23 to +29
async fn join_next(join_set: &mut JoinSet<()>, worker_name: &str) {
    if let Some(result) = join_set.join_next().await
        && let Err(e) = result
    {
        tracing::debug!("{worker_name} task failed: {e}");
    }
}
high

The current implementation of join_next logs all task failures, including panics, at the debug level and continues. This effectively implements recovery logic for panics, which violates the general rule that panics should be treated as unrecoverable bugs. If a subtask panics, it usually indicates a serious issue that should not be silently ignored or recovered from in a background worker.

Consider checking if the error is a panic and propagating it to ensure the worker fails as expected.

async fn join_next(join_set: &mut JoinSet<()>, worker_name: &str) {
    if let Some(Err(e)) = join_set.join_next().await {
        if e.is_panic() {
            std::panic::resume_unwind(e.into_panic());
        } else {
            tracing::debug!("{worker_name} task failed: {e}");
        }
    }
}
References
  1. Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.
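
The suggestion above uses std::panic::resume_unwind to re-raise a subtask's panic in the worker. The same mechanism can be tried without an async runtime: a std thread's `join` also hands back the panic payload on failure. The helper name `join_propagating_panic` is hypothetical, and the thread-based setting is an assumption made so the sketch needs no external crates.

```rust
use std::panic;
use std::thread;

/// Hypothetical helper: joins a worker thread and re-raises its panic in
/// the caller instead of logging and continuing.
fn join_propagating_panic<T>(handle: thread::JoinHandle<T>) -> T {
    match handle.join() {
        Ok(value) => value,
        // `join` yields the original panic payload; `resume_unwind`
        // continues unwinding with it in the current thread.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

With this shape a bug in a subtask brings down the worker that spawned it, which is the behavior the review comment asks for. Ordinary, non-panic failures would still need their own handling path.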

Comment on lines +60 to 62
tasks.spawn(async move {
    download_to_cache(&name, &version, &tarball_url).await;
});
medium

The result of download_to_cache is currently ignored. If a download fails, the error is lost, and the pipeline continues silently. While the install phase might retry, it's better to log the failure in the pipeline for observability, consistent with how clone failures are handled in the clone worker.

tasks.spawn(async move {
    if let Err(e) = download_to_cache(&name, &version, &tarball_url).await {
        tracing::debug!("Pipeline download failed for {name}@{version}: {e:#}");
    }
});

@elrrrrrrr commented

Closing as stale: this draft is a one-off agent experiment from 2026-04-27 with no follow-up, and overlaps with sibling PRs exploring the same optimization. Reopen if revisited.

@elrrrrrrr closed this May 25, 2026