### What's the problem this feature will solve?

The 2020 resolver separated the resolution logic from the rest of pip, making the resolver much easier to read and extend. With `--use-feature=fast-deps`, we began to investigate improving pip performance by avoiding the download of dependencies until we reach a complete resolution. With `install --report`, we enabled pip users to read the output of the resolver without needing to download dependencies. However, a few blockers remain to achieving performance improvements, across multiple use cases:
#### Uncached resolves

When pip is executed entirely from scratch (without an existing `~/.cache/pip` directory), as is often the case in CI, we are unlikely to get much faster than we are now (and notably, it's extremely unlikely a non-pip tool could go faster in this case without relying on some sort of remote resolve index). However, there are a couple of improvements we can still make here:
- Make `--use-feature=fast-deps` the default, to cover for wheels without PEP 658 metadata backfilled yet.
  - Importantly, this is often the case for internal corporate CI indices, which are unlikely to have performed the process of backfilling PEP 658 metadata, and may even be a `--find-links` repo instead of serving the PyPI simple repository API.
  - `fast-deps` is not as fast as it could be, which will be addressed further below.
- Establish a separate metadata cache directory, so that CI runs in e.g. GitHub Actions can retain the result of metadata resolution run-over-run, without having to store large binary wheel output.
  - We will discuss metadata caching further below; it currently does not exist in pip.
  - This should achieve similar performance to partially-cached resolves, discussed next.
- Batch downloading of wheels all at once after a shallow metadata resolve.
  - Pip already has the infrastructure to do this; it's just not implemented yet!
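The batch-downloading idea can be sketched with a thread pool. This is illustrative only, not pip's implementation: the `fetch` function here is a hypothetical stand-in for the real HTTP download.

```python
# Sketch only: after a metadata-only resolve completes, download all
# resolved wheels in one batch instead of one at a time mid-resolution.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> bytes:
    # Placeholder for an HTTP download; a real version would stream the
    # response body into the wheel cache.
    return f"contents of {url}".encode()

def batch_download(urls, max_workers=8):
    # Downloads are network-bound, so threads overlap them effectively:
    # total wall time approaches that of the largest single artifact.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

artifacts = batch_download([
    "https://files.example/a-1.0-py3-none-any.whl",
    "https://files.example/b-2.0-py3-none-any.whl",
])
```

Because the resolve has already finished, the full set of URLs is known up front, which is what makes the pooled download possible at all.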
#### Partially-cached resolves with downloading

When pip is executed with a persistent `~/.cache/pip` directory, we can take advantage of much more caching, and this is the bulk of the work here. In e.g. #11111 and other work, we (mostly) separated metadata resolution from downloading, which has allowed us to consider caching not just downloaded artifacts, but parts of the resolution process itself. This is directly enabled by the clean separation and design of the 2020 resolver. We can cache the following:
- PEP 658 metadata for a particular wheel (this saves us a small download).
- `fast-deps` metadata for a particular wheel (this saves us a few HTTP range requests).
- Metadata extracted from an sdist (this saves us a medium-size download and a build process).
These alone may not seem like much, but over the course of an entire resolve, avoiding potentially multiple network requests per dependency and staying within pip's in-memory resolve logic adds up to a very significant performance improvement. These caches also reduce the number of requests we make against PyPI.
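Because the mapping from wheel hash to metadata is idempotent, such a cache needs no expiry or invalidation logic at all. A minimal sketch, assuming a flat on-disk layout keyed by content hash (illustrative; pip's actual cache layout differs):

```python
# Illustrative sketch: a metadata cache keyed by wheel content hash.
# Since the same hash always maps to the same METADATA bytes, entries
# never expire and never need revalidation.
import os
from typing import Optional

class MetadataCache:
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, wheel_sha256: str) -> str:
        return os.path.join(self.root, wheel_sha256 + ".metadata")

    def get(self, wheel_sha256: str) -> Optional[bytes]:
        try:
            with open(self._path(wheel_sha256), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # cache miss: caller fetches, then calls put()

    def put(self, wheel_sha256: str, metadata: bytes) -> None:
        with open(self._path(wheel_sha256), "wb") as f:
            f.write(metadata)
```

A cache like this is also small enough to persist across CI runs without the size concerns of a full wheel cache.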
But wait! We can go even faster! In addition to the metadata cache (which is idempotent and not time-varying: the same wheel hash always maps to the same metadata), we can also cache the result of querying the simple repository API for the list of dists available for a given dependency name. This additional caching requires handling HTTP caching headers to see whether a given page has changed, but it lets us cache:
- The raw content of a simple repository index page, if unchanged (this saves us a medium-size download).
- The result of parsing a simple repository index page into `Link`s, if unchanged (this saves us an HTML parser invocation).
- The result of filtering `Link`s by interpreter compatibility, if unchanged (this saves us having to calculate interpreter compatibility using the tags logic).
#### Resolves without downloading

With `install --report out.json --dry-run` and the metadata resolve+caching discussed above, we should be able to avoid downloading the files associated with the resolved dists, enabling users to download those dependencies in a later phase (as I achieved at Twitter with pantsbuild/pants#8793). However, we currently don't do so (see #11512), because of a mistake I made in previous implementation work (sorry!). So for this, we just need to:
- Avoid downloading dists when invoked from a command which only requests metadata (such as `install --dry-run`).
### Describe the solution you'd like

I have created several PRs which achieve all of the above:
#### Batch downloading [0/2]

For batch downloading of metadata-only dists, we have two phases:

- improve pooled progress output for BatchDownloader #12925 produces extremely rich progress output for batched downloads.
- execute batch downloads in parallel worker threads #12923 finally makes the `BatchDownloader` download and prepare metadata-only dists in parallel. This produces a drastic performance improvement.
#### `fast-deps` fixes [0/1]

- perform 1-3 HTTP requests for each wheel using fast-deps #12208 fixes the `fast-deps` implementation to achieve excellent performance against the current iteration of PyPI behind Fastly, as well as any other HTTP host supporting range requests.
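To illustrate why range requests suffice here: a wheel is a zip archive, and zip's central directory lives at the end of the file, so a file-like object that fetches byte ranges on demand lets `zipfile` read just `*.dist-info/METADATA`. This is a simplified sketch of the idea behind pip's lazy-wheel code, with `range_fetch` simulating HTTP `Range` requests:

```python
import io
import zipfile

class LazyRemoteFile(io.RawIOBase):
    """File-like object that reads byte ranges on demand, so zipfile can
    inspect a remote wheel without downloading the whole archive."""

    def __init__(self, size, range_fetch):
        self.size, self.range_fetch, self.pos = size, range_fetch, 0

    def seekable(self):
        return True

    def readable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.size + offset
        return self.pos

    def tell(self):
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        # Each read maps to an HTTP "Range: bytes=start-end" request in a
        # real implementation (the actual fast-deps code coalesces them).
        data = self.range_fetch(self.pos, self.pos + n - 1)
        self.pos += len(data)
        return data

def remote_metadata(size, range_fetch):
    with zipfile.ZipFile(LazyRemoteFile(size, range_fetch)) as zf:
        name = next(n for n in zf.namelist()
                    if n.endswith(".dist-info/METADATA"))
        return zf.read(name)
```

For a multi-megabyte wheel this reads only a few kilobytes: the end-of-archive records, the central directory, and the METADATA member itself.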
#### Formalize "concrete" vs metadata-only dists [0/3]

To avoid downloading dists for metadata-only commands, we have several phases:

- cache "concrete" dists by Distribution instead of InstallRequirement #12863 introduces `.is_concrete` on our `Distribution` wrappers to codify the concept of "metadata-only" dists.
- refactor requirement preparer to remove duplicated code paths for metadata-only dists #12871 is a refactoring change to decouple the caching of metadata-only dists from the rest of the `RequirementPreparer` logic.
- pull preparer logic out of the resolver to consume metadata-only dists in commands #12186 finally fixes "use .metadata distribution info when possible" #11512, so `install --dry-run` doesn't download any dists.
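The distinction can be shown with a toy model. The `is_concrete` name follows the PR, but the classes and `prepare` function here are illustrative assumptions, not pip's internals:

```python
# Toy model: a dist is "concrete" once its archive exists on disk.
# Metadata-only commands (e.g. `install --dry-run`) never need a dist to
# become concrete, so they can skip the download entirely.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Dist:
    name: str
    metadata: dict
    local_path: Optional[str] = None

    @property
    def is_concrete(self) -> bool:
        return self.local_path is not None

def prepare(dist: Dist, need_artifact: bool, download) -> Dist:
    # `download` is only invoked by commands that actually install.
    if need_artifact and not dist.is_concrete:
        dist.local_path = download(dist.name)
    return dist
```

Putting the flag on the `Distribution` wrapper rather than the requirement means the preparer can decide per command whether concreteness is needed at all.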
#### Metadata caching [0/1]

- cache metadata lookups for sdists and lazy wheels #12256 introduces the first "metadata cache", which is separate from and much smaller than the cache used for downloaded or built wheels. This produces the most drastic performance improvement to the resolve process.
#### Caching index pages [0/2]

To optimize the process of obtaining `Link`s to resolve against, we have at least two phases:

- send HTTP caching headers for index pages to further reduce bandwidth usage #12257 attaches HTTP caching headers to requests against index pages, which allows our `CacheControl` integration to implicitly retrieve the cached response after a very fast `304 Not Modified` from PyPI.
- persistent cache for link parsing and interpreter compatibility #12258 checks whether the index page was updated (e.g. by a new upload) since it was last downloaded, and if not, retrieves the result of `Link` parsing and interpreter compatibility filtering from the metadata cache.
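The second phase amounts to memoizing the parse-and-filter pipeline on a key that captures both the page's identity and the interpreter. A sketch under assumed names and data shapes (not pip's actual structures):

```python
# Sketch: once we know (e.g. via ETag) that an index page is unchanged,
# the links parsed from it and filtered for this interpreter can be
# reused from a cache keyed by (page identity, supported tags).
def compatible_links(page_etag, supported_tags, cache, parse_page, page_body):
    key = (page_etag, tuple(sorted(supported_tags)))
    if key in cache:
        return cache[key]  # skip HTML parsing and the tags computation
    links = [link for link in parse_page(page_body)
             if link["tag"] in supported_tags]
    cache[key] = links
    return links
```

Keying on the tag set as well as the page means upgrading the interpreter naturally invalidates only the filtered results, not the raw pages.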
Each of these PRs demonstrates a nontrivial performance improvement in its description. All together, the result is quite significant, and never produces a slowdown.
### Alternative Solutions

- None of these changes modify any external APIs. They do introduce new subdirectories to `~/.cache/pip`, which can be tracked separately from the wheel cache to speed up resolution without ballooning in size.
- I expect the `Link` parsing and interpreter compatibility caching in persistent cache for link parsing and interpreter compatibility #12258 to involve more discussion, as it takes up more cache space than the idempotent metadata cache, produces less of a performance improvement, and is more complex to implement. However, nothing else depends on it to work, and it can safely be discussed later, after the preceding caching work is done.
### Additional context

In writing this, I realized we may be able to modify the approach of #12257 to work with `--find-links` repos as well. Those are expected to change much more frequently than index pages using the simple repository API, but this may be worth considering after the other work is done.
### Code of Conduct

- I agree to follow the PSF Code of Conduct.