backend/go: deduplicate third-party module downloads across go.mods#23261
Open
rdeknijf wants to merge 2 commits intopantsbuild:mainfrom
cad8d86 to
385bc39
Author
Replace the per-go.mod `AnalyzeThirdPartyModuleRequest` with a content-addressed `ModuleDownloadRequest` keyed on (name, version, minimum_go_version, build_opts, go_sum_entries). The Pants engine memoizes identical requests, so a module shared by N go.mods is downloaded and analyzed once instead of N times.

A synthetic go.mod + go.sum pair is written into the download sandbox. The go.sum entries come from the consuming go.mod's real go.sum, keeping Go's checksum verification intact. When entries are absent (transitive modules not yet in go.sum), Go falls back to GOSUMDB.

On a 3-go.mod reproducer, `pants list ::` peak memory drops from 91 GB to 32 GB (-65%) and wall time from 48s to 33s (-32%). A real 24-go.mod monorepo completes in 2.58 GB / 112s (previously OOM-killed).

Partially addresses pantsbuild#20274.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
385bc39 to
f5ce74a
sureshjoshi
requested changes
Apr 21, 2026
Member
Thanks for the contribution. We've just branched for 2.32.x, so merging this pull request now means it will come out in 2.33.x. Please move the release notes updates to docs/notes/2.33.x.md if that's appropriate.
The 2.32.x branch has already been cut, so this change will land in 2.33.x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rdeknijf
commented
Apr 21, 2026
Author
Done — moved the release notes entry to docs/notes/2.33.x.md.
Disclaimer
I'm only a casual Go developer and this is my first Pants work. But this problem has been blocking us for so long that I decided to see how far I could get with the best AI I could gather. So it's me plus every model and role I could get my hands on, dredging through possible solutions and taking the simplest one, then testing it to death and having yet more models and roles review it. Everything below is Claude talking:
Summary
Third-party Go module analysis is now deduplicated across `go.mod` files. Previously, a module required by N `go.mod` files was downloaded and analyzed N times, causing O(N*M) downloads and significant memory overhead in monorepos with many overlapping `go.mod` files.

Partially addresses #20274.
Motivation
Issue #20274 reports Pants' Go backend OOM-killing machines on monorepos with multiple `go.mod` files. Root cause: `AnalyzeThirdPartyModuleRequest` included `go_mod_address`, `go_mod_digest`, and `go_mod_path` in its key, so the engine treated the same module@version required by two `go.mod` files as two independent requests. A monorepo with 24 `go.mod` files sharing ~700 unique modules would download each shared module up to 24 times.
Approach
- Replace `AnalyzeThirdPartyModuleRequest` with `ModuleDownloadRequest`, keyed only on content-addressable inputs: `(name, version, minimum_go_version, build_opts, go_sum_entries)`.
- The `download_and_analyze_module` rule builds a synthetic sandbox (`go.mod` requiring just the one module, plus a synthetic `go.sum` carrying the consuming repo's real checksum lines for that module@version) and runs `go mod download`.
- The engine memoizes identical requests, so `go.mod` files that require the same module share a single download.
- The per-`go.mod` `analyze_go_third_party_module` rule, its request type, and the `_check_go_sum_has_not_changed` helper are removed.
- `go.sum` parsing is done once per `go.mod` into a dict for O(1) lookup per module (was O(N*L) with per-module linear scans).
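The effect of keying only on content-addressable inputs can be illustrated with a small stand-alone sketch. This is not the Pants implementation: `ModuleDownloadKey`, `download_and_analyze`, and the `DOWNLOADS` log are hypothetical names, `functools.lru_cache` stands in for the engine's memoization, `build_opts` is simplified to a tuple of strings, and the module name and hashes are placeholders.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModuleDownloadKey:
    """Hypothetical stand-in for the request type: no go.mod path or digest."""

    name: str
    version: str
    minimum_go_version: Optional[str]
    build_opts: Tuple[str, ...]  # simplified; an assumption for this sketch
    go_sum_entries: Tuple[str, ...]


# Records every simulated download so the deduplication is observable.
DOWNLOADS: List[ModuleDownloadKey] = []


@lru_cache(maxsize=None)  # stands in for the engine memoizing identical requests
def download_and_analyze(key: ModuleDownloadKey) -> str:
    DOWNLOADS.append(key)
    return f"analysis of {key.name}@{key.version}"


# Two different go.mod files that require the same module build equal keys.
entries = ("example.com/dep v1.2.3 h1:placeholder=",)
from_mod_a = ModuleDownloadKey("example.com/dep", "v1.2.3", "1.21", (), entries)
from_mod_b = ModuleDownloadKey("example.com/dep", "v1.2.3", "1.21", (), entries)

download_and_analyze(from_mod_a)
download_and_analyze(from_mod_b)  # cache hit: no second download

# Different go.sum entries produce a different key, hence a separate request.
tampered = ModuleDownloadKey(
    "example.com/dep", "v1.2.3", "1.21", (), ("example.com/dep v1.2.3 h1:other=",)
)
download_and_analyze(tampered)
```

Because the key is a frozen dataclass, equality and hashing are derived from the field values alone, so which `go.mod` produced the request no longer matters.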
Security
`go.sum` entries for a given `module@version` are content-addressable by definition, so two well-formed consuming `go.mod`s must agree on them. Including `go_sum_entries` in the memoization key ensures:
- The synthetic `go.sum` in the sandbox lets `go mod download` run its normal checksum verification (no `GONOSUMCHECK` override anywhere).
- Requests whose `go.sum` entries differ are distinct and are verified independently; the tampered one fails with Go's usual `SECURITY ERROR`.
- When the consuming `go.sum` lacks entries for a transitive module (discovered during MVS), the synthetic `go.sum` is omitted and Go falls back to `GOSUMDB`, matching effective pre-patch behavior.
This was validated by three independent AI code reviews (security
engineer, supply chain specialist, Go module expert) plus mutation
testing, none of which found a security regression.
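A minimal sketch of how the synthetic sandbox contents described in this section could be assembled, under stated assumptions: `synthetic_sandbox_files` and the synthetic module path `example.com/synthetic` are hypothetical, and the real rule may lay out the files differently. The `1.21` fallback and the omission of `go.sum` when no entries are known follow the description in this PR.

```python
from typing import Dict, Optional, Sequence

# Fallback `go` directive used when the module reports no Go version.
FALLBACK_GO_VERSION = "1.21"


def synthetic_sandbox_files(
    name: str,
    version: str,
    minimum_go_version: Optional[str],
    go_sum_entries: Sequence[str],
) -> Dict[str, str]:
    """Return a mapping of file name to content for the download sandbox."""
    go_version = minimum_go_version or FALLBACK_GO_VERSION
    files = {
        # The module path below is a placeholder chosen for this sketch.
        "go.mod": (
            "module example.com/synthetic\n\n"
            f"go {go_version}\n\n"
            f"require {name} {version}\n"
        )
    }
    if go_sum_entries:
        # Carry over the consuming repo's real checksum lines unchanged so
        # that `go mod download` verifies against them.
        files["go.sum"] = "\n".join(go_sum_entries) + "\n"
    # With no entries, go.sum is omitted and verification is left to GOSUMDB.
    return files


with_sum = synthetic_sandbox_files(
    "example.com/dep", "v1.2.3", None, ["example.com/dep v1.2.3 h1:placeholder="]
)
without_sum = synthetic_sandbox_files("example.com/dep", "v1.2.3", "1.20", [])
```

Writing no `go.sum` at all, rather than an empty one, keeps the two cases easy to tell apart when inspecting a sandbox.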
Benchmarks
Measured on a reproducer with 3 `go.mod` files sharing grpc / protobuf / uber-fx / cloud.google.com/go (206 unique modules, 272 total dep entries). Pants `main@2.32.0.dev7`, isolated LMDB per run, peak RSS of the process tree sampled at 0.5 s. Scenarios measured:
- `list ::` (3 go.mods)
- `list mod-a::` (1 go.mod)
- `package ::` (3 go.mods)
- `package mod-a::` (1 go.mod)

Targets listed are byte-identical between before and after; all `package` invocations produce the expected binaries.

A real 24-`go.mod` monorepo completes `pants list ::` with the patch in 2.58 GB / 112 s / 9,729 targets. The unpatched run was not attempted; it is the long-standing OOM that motivated the fix.
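For readers who want to reproduce the measurement, the polling approach (peak RSS sampled every 0.5 s) can be sketched as below. The benchmark script itself is not part of this PR, so everything here is an assumption: `sample_peak_rss` is a hypothetical helper, and the two callables are injected so the sketch stays independent of any particular process-inspection library. A real `read_tree_rss_bytes` would sum the resident set size of the Pants process and all of its descendants.

```python
import time
from typing import Callable


def sample_peak_rss(
    read_tree_rss_bytes: Callable[[], int],
    is_running: Callable[[], bool],
    interval_s: float = 0.5,
) -> int:
    """Poll the process tree's RSS until it exits and return the maximum seen."""
    peak = 0
    while is_running():
        peak = max(peak, read_tree_rss_bytes())
        time.sleep(interval_s)
    return peak


# Usage with canned readings instead of a live process.
readings = iter([100, 300, 200])
alive = iter([True, True, True, False])
peak = sample_peak_rss(lambda: next(readings), lambda: next(alive), interval_s=0)
```

Sampling can miss spikes shorter than the interval, so figures obtained this way are a lower bound on the true peak.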
This is a no-op for repos with a single `go.mod`.
Tests
- `test_cross_go_mod_dedup_produces_identical_results`: two `go.mod` files sharing a dep produce byte-identical analyses (including digest).
- `test_parse_go_sum`: unit test for the go.sum parser, covering grouping, prefix safety (`v1.3` must not match `v1.3.0`), single-entry modules, CRLF handling, and empty input.
- `test_extract_go_sum_entries_for_module`: unit test for the per-module extraction wrapper.
- `test_invalid_go_sum` (pre-existing): tampered hashes trigger Go's `SECURITY ERROR` through the synthetic `go.sum` path.

Local run: 12 passed / 0 failed / 2 skipped (2 tests skipped are pre-existing `@pytest.mark.skip` for #15824).

Known limitation: the test sandbox has network access to GOSUMDB, so `test_invalid_go_sum` cannot distinguish "go.sum verification failed locally" from "GOSUMDB verification failed remotely." Mutation testing confirmed this gap. The security property (synthetic go.sum is written and used) is covered by the combination of the integration tests (`test_invalid_go_sum` + `test_download_and_analyze_all_packages`) plus the `_parse_go_sum` / `_extract_go_sum_entries_for_module` unit tests, but not by a single test in isolation.
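The parser behaviors listed above (grouping, prefix safety, CRLF handling, empty input) can be illustrated with a short sketch. It is not the code under test: `parse_go_sum` and `entries_for_module` are hypothetical stand-ins for `_parse_go_sum` and `_extract_go_sum_entries_for_module`, and the hashes are placeholders. It assumes the usual `go.sum` line shape of `module version hash`, where the version field may carry a `/go.mod` suffix.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ModuleKey = Tuple[str, str]  # (module path, version)


def parse_go_sum(content: str) -> Dict[ModuleKey, Tuple[str, ...]]:
    """Group go.sum lines by exact (module, version) for O(1) lookup."""
    grouped: Dict[ModuleKey, List[str]] = defaultdict(list)
    for raw in content.splitlines():  # splitlines() also handles CRLF
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed lines in this sketch
        module, version, _hash = fields
        # "v1.3.0/go.mod" belongs with "v1.3.0".
        base_version = version[: -len("/go.mod")] if version.endswith("/go.mod") else version
        grouped[(module, base_version)].append(line)
    # Sorted tuples are deterministic and hashable, so they can be part of a key.
    return {key: tuple(sorted(lines)) for key, lines in grouped.items()}


def entries_for_module(
    parsed: Dict[ModuleKey, Tuple[str, ...]], module: str, version: str
) -> Tuple[str, ...]:
    """Exact-match lookup: 'v1.3' never matches entries recorded for 'v1.3.0'."""
    return parsed.get((module, version), ())


sample = (
    "example.com/a v1.3.0 h1:aaa=\r\n"
    "example.com/a v1.3.0/go.mod h1:bbb=\r\n"
    "example.com/b v0.1.0/go.mod h1:ccc=\r\n"
)
parsed = parse_go_sum(sample)
```

Matching on the split fields rather than on a string prefix is what gives the prefix safety: a lookup for `v1.3` finds nothing even though `v1.3.0` lines start with the same characters.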
Known trade-offs
- `minimum_go_version` fallback: when a module's `GoVersion` is `None`, the synthetic `go.mod` uses `go 1.21` as a fallback. This is arbitrary; a more principled default (e.g., the module's own `go` directive, or `1.17` as the last major module semantics change) could be a follow-up. In practice `None` is rare and the `go` directive has minimal effect on `go mod download -json module@version`.
- `go.sum` detection removed: the old `_check_go_sum_has_not_changed` would warn when `go mod download` added entries to `go.sum`. This check is removed because the synthetic sandbox cannot observe changes to the real `go.sum`. The user's `go.sum` must already be complete (enforceable by `go mod tidy`).
Release note
Added under `#### Go` in `docs/notes/2.32.x.md`.
Test plan
- `./pants test src/python/pants/backend/go/util_rules/third_party_pkg_test.py` passes
- `./pants --changed-since=HEAD fmt` clean
- `build-support/githooks/pre-push` clean
- 3-`go.mod` reproducer (`list` and `package`)
- 24-`go.mod` monorepo completes successfully
AI disclosure
This change was developed with assistance from Claude (Anthropic). A human reviewed every diff, authored the design decisions (synthetic `go.sum` over `GONOSUMCHECK`, dedup key composition, soft `go.sum` fallback), ran all benchmarks, and verified correctness. Independent code reviews were solicited from Claude (Opus, Sonnet), OpenAI Codex (GPT-5.4), and Google Gemini (2.5 Flash); their findings informed the current state of the patch. Mutation testing was performed to validate test coverage.