
Stop collecting links for non-working deduping - Reduce peak memory by 50% during long resolves #13843

Open
notatallshaw wants to merge 4 commits into pypa:main from notatallshaw:stop-collecting-links

@notatallshaw (Member) commented Mar 7, 2026

Fixes #12834

_logged_links stored (Link, LinkType, str) tuples to deduplicate "Skipping link" debug messages. Because each Link hashes by URL, every entry was unique, so the dedupe never fired; the set just accumulated Link references, preventing garbage collection of anything those Link objects referenced.
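A minimal sketch of the pattern described above, assuming simplified stand-ins: `Link`, `LinkType`, `OldStyleFinder` and the `payload` attribute are hypothetical names for illustration, not pip's actual code. It feeds 1,000 links with distinct URLs through a tuple-based dedupe set and uses weak references to check whether any of them could be freed.

```python
import enum
import logging
import weakref

logger = logging.getLogger("finder_sketch")


class LinkType(enum.Enum):
    # Hypothetical subset of skip reasons, for illustration only.
    format_unsupported = enum.auto()
    requires_python_mismatch = enum.auto()


class Link:
    """Hypothetical stand-in for a Link: equality and hashing use only the URL."""

    def __init__(self, url: str, payload: bytes = b"") -> None:
        self.url = url
        self.payload = payload  # stands in for whatever else a Link references

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Link) and self.url == other.url

    def __hash__(self) -> int:
        return hash(self.url)


class OldStyleFinder:
    """Sketch of the removed pattern: dedupe log lines via a set of tuples."""

    def __init__(self) -> None:
        self._logged_links: set[tuple[Link, LinkType, str]] = set()

    def log_skipped_link(self, link: Link, result: LinkType, detail: str) -> None:
        entry = (link, result, detail)
        if entry not in self._logged_links:
            logger.debug("Skipping link: %s: %s", detail, link.url)
            # The tuple holds a strong reference to `link` for as long as the
            # finder lives, so the Link (and its payload) cannot be freed.
            self._logged_links.add(entry)


finder = OldStyleFinder()
refs = []
for i in range(1000):
    link = Link(f"https://example.invalid/pkg-{i}.tar.gz", payload=bytes(1024))
    finder.log_skipped_link(link, LinkType.format_unsupported, "unsupported archive format")
    refs.append(weakref.ref(link))
del link

entries = len(finder._logged_links)
alive = sum(r() is not None for r in refs)
print(entries)  # 1000: every URL is distinct, so no entry is ever deduped
print(alive)    # 1000: the set keeps every Link alive
```

The set grows by one entry per distinct link and never shrinks, so its size tracks the total number of links seen during the resolve.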

As this never worked, I'm removing it and keeping only a set[str] of Requires-Python skip reasons (the only data read back from the set).
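A sketch of the replacement under the same assumptions: `NewStyleFinder`, `requires_python_skipped_reasons` and the stand-in classes are illustrative names, not necessarily what the PR's code uses. Every skip is logged, and only the Requires-Python detail strings are kept, so the Link objects become collectable as soon as the caller drops them.

```python
import enum
import gc
import logging
import weakref

logger = logging.getLogger("finder_sketch")


class LinkType(enum.Enum):
    # Hypothetical subset of skip reasons, for illustration only.
    format_unsupported = enum.auto()
    requires_python_mismatch = enum.auto()


class Link:
    """Hypothetical stand-in for a Link carrying some referenced data."""

    def __init__(self, url: str, payload: bytes = b"") -> None:
        self.url = url
        self.payload = payload


class NewStyleFinder:
    """Sketch of the replacement: no per-link dedupe, only reason strings kept."""

    def __init__(self) -> None:
        self._requires_python_skip_reasons: set[str] = set()

    def log_skipped_link(self, link: Link, result: LinkType, detail: str) -> None:
        # Logged every time; the old dedupe never suppressed anything anyway.
        logger.debug("Skipping link: %s: %s", detail, link.url)
        if result is LinkType.requires_python_mismatch:
            # Only a short string survives; the Link itself is not retained.
            self._requires_python_skip_reasons.add(detail)

    def requires_python_skipped_reasons(self) -> list[str]:
        return sorted(self._requires_python_skip_reasons)


finder = NewStyleFinder()
refs = []
for i in range(1000):
    link = Link(f"https://example.invalid/pkg-{i}.tar.gz", payload=bytes(1024))
    if i % 100 == 0:
        finder.log_skipped_link(link, LinkType.requires_python_mismatch, "Requires-Python >=3.12")
    else:
        finder.log_skipped_link(link, LinkType.format_unsupported, "unsupported archive format")
    refs.append(weakref.ref(link))
del link
gc.collect()

alive = sum(r() is not None for r in refs)
reasons = finder.requires_python_skipped_reasons()
print(alive)    # 0: nothing outside the loop kept the Links alive
print(reasons)  # ['Requires-Python >=3.12']: ten mismatches, one deduplicated reason
```

Deduplicating plain strings still works as intended here, because identical reason strings compare equal and the set stays small.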

Using the following large resolve as a test:

pip install --dry-run "apache-airflow[amazon,celery,cncf-kubernetes,docker,elasticsearch,google,mysql,postgres,redis,slack,snowflake,ssh]==3.0.6" --uploaded-prior-to 2026-01-01T00:00:00Z
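The PR does not say how peak memory was measured. One possible approach on Unix, offered as an assumption rather than the author's method, is to run the resolve as a child process and read its high-water resident set size via `resource.getrusage`. `AIRFLOW_DRY_RUN` and `peak_child_rss_mib` are hypothetical names.

```python
import resource
import subprocess
import sys

# The resolve from the PR description, run through the current interpreter's pip.
AIRFLOW_DRY_RUN = [
    sys.executable, "-m", "pip", "install", "--dry-run",
    "apache-airflow[amazon,celery,cncf-kubernetes,docker,elasticsearch,google,"
    "mysql,postgres,redis,slack,snowflake,ssh]==3.0.6",
    "--uploaded-prior-to", "2026-01-01T00:00:00Z",
]


def peak_child_rss_mib(cmd: list[str]) -> float:
    """Run cmd to completion, then return the largest peak RSS (MiB) of any
    child process this interpreter has waited for so far (Unix only)."""
    subprocess.run(cmd, check=True)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in KiB on Linux.
    if sys.platform == "darwin":
        return maxrss / 2**20
    return maxrss / 2**10
```

Because `RUSAGE_CHILDREN` reports a high-water mark across every child waited for so far, call `peak_child_rss_mib(AIRFLOW_DRY_RUN)` once per pip build, each in a fresh interpreter. This measures whole-process RSS, which includes interpreter and library overhead, not only the finder's data.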

With this change, ~120k Link objects were no longer held in the set, and peak memory went down from ~350 MiB to ~180 MiB.
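The numbers above come from a real resolve; the synthetic microbenchmark below only illustrates the mechanism. It is a hypothetical sketch, not pip code: `Link`, `old_style`, `new_style` and `peak_traced_bytes` are illustrative names, each fake Link carries an assumed 4 KiB payload, and `tracemalloc` measures Python heap allocations rather than process RSS, so the ratio will not match the reported reduction.

```python
import logging
import tracemalloc
from typing import Callable

logger = logging.getLogger("finder_sketch")


class Link:
    """Hypothetical stand-in: hashes by URL and keeps a 4 KiB payload alive."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.payload = bytearray(4096)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Link) and self.url == other.url

    def __hash__(self) -> int:
        return hash(self.url)


def old_style(n: int) -> int:
    """Dedupe via a set of (Link, type, detail) tuples: every Link is retained."""
    logged: set[tuple[Link, str, str]] = set()
    for i in range(n):
        link = Link(f"https://example.invalid/pkg-{i}.tar.gz")
        entry = (link, "format_unsupported", "unsupported archive format")
        if entry not in logged:
            logger.debug("Skipping link: %s", link.url)
            logged.add(entry)
    return len(logged)


def new_style(n: int) -> int:
    """Log without deduping; keep only Requires-Python reason strings."""
    reasons: set[str] = set()
    for i in range(n):
        link = Link(f"https://example.invalid/pkg-{i}.tar.gz")
        logger.debug("Skipping link: %s", link.url)
        if i % 100 == 0:
            reasons.add("Requires-Python >=3.12")
    return len(reasons)


def peak_traced_bytes(fn: Callable[[int], int], n: int) -> int:
    """Peak Python-heap allocation (bytes) seen by tracemalloc while fn runs."""
    tracemalloc.start()
    try:
        fn(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


old_peak = peak_traced_bytes(old_style, 2000)
new_peak = peak_traced_bytes(new_style, 2000)
print(f"old: {old_peak / 2**20:.2f} MiB, new: {new_peak / 2**20:.3f} MiB")
```

In the old style the peak grows linearly with the number of links, since all of them are alive when the loop ends; in the new style it stays roughly constant, because each Link is released when the loop variable is rebound.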

notatallshaw added this to the 26.1 milestone Mar 7, 2026

Development

Successfully merging this pull request may close these issues.

Pip memory usage for large cached install dominated by list of candiate pages
