Description
In #10691 (fixed by #10693) we changed the behavior on a remote Maven 429 Too Many Requests: instead of only logging it at DEBUG and silently skipping the dependency, Trivy now stops the scan with a concrete, actionable error.
However, in some cases (large multi-module Maven projects) the user sees a generic context canceled error instead of that 429 error:
FATAL Fatal error run error: fs scan error: scan error: scan failed: failed analysis:
analyze with traversal: walk dir error: unknown error with docs/maven-plugin/pom.xml:
failed to analyze file: analyze file (docs/maven-plugin/pom.xml): semaphore acquire:
context canceled
The actual cause — the remote 429 Too Many Requests — is only visible at --debug:
[pom] Failed to fetch url="https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-all/3.0.25/groovy-all-3.0.25.pom" err="Get: context canceled"
So for multi-module projects the improvement from #10693 is effectively lost: the user gets context canceled instead of the actionable rate-limit message.
Reproduction
git clone https://github.com/keycloak/keycloak
cd keycloak
trivy fs --scanners vuln .
This reproduces when Maven Central rate-limits the client IP (the repo has many pom.xml files, so the walk is still running when the 429 fires). The failing file named in the error varies between runs (e.g. docs/maven-plugin/pom.xml), since it depends on timing.
Expected behavior
The scan stops with the actionable rate-limit error introduced in #10693:
FATAL Error remote Maven repository returned 429 Too Many Requests for https://repo.maven.apache.org/maven2/.../*.pom. Retry-After: <n>.
The repository blocks all subsequent requests from this IP until the block clears.
To avoid this, populate the local Maven cache before scanning (e.g. run `mvn dependency:resolve` and cache ~/.m2 in CI).
Actual behavior
The real *types.UserError (429) is masked by a generic semaphore acquire: context canceled.
Root cause
Artifact analysis runs file analyzers through errgroup.WithContext, while the synchronous file walk uses the group's egCtx for limit.Acquire. When the pom analyzer goroutine returns the 429 *types.UserError, errgroup cancels egCtx; the still-running walk then fails its next Acquire with context.Canceled, and that walk error is returned before eg.Wait() — which is where the real error lives.
The masking has been latent since the sync.WaitGroup → errgroup.WithContext migration (#9538) and became observable once #10693 made the pom analyzer return a fatal *types.UserError. The same pattern exists in the local (fs), image and vm artifacts. A single-pom.xml project surfaces the 429 correctly because the walk finishes before the cancellation.
Version