Description
High-level description
(From #4493)
When building a CA derivation drv0, the first thing we do (once we have tried and failed to substitute it) is resolve it, which requires fetching or building all its inputs.
However, the remote cache might already know about resolved(drv0), in which case we can substitute it, and all the inputs we downloaded are never used (except for those in the runtime closure, of course).
But we can only know that once we've resolved the derivation. And to resolve the derivation we need to know the output mappings for all its inputs, which currently requires fetching them.
Low-level description
Shallow realizations map a basic drv (no inputDrvs) and output name to a content address.
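As a rough illustration of that mapping, here is a minimal Python sketch. It is not Nix's actual data model: the class names, field names, and content addresses are all hypothetical, and the only property taken from the text is that the key is a basic derivation (no inputDrvs) plus an output name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicDerivation:
    """A derivation with no inputDrvs: every input is a concrete store object.

    Field names are hypothetical and only serve the illustration.
    """
    name: str
    input_srcs: frozenset  # content addresses of the inputs
    builder: str


@dataclass(frozen=True)
class ShallowKey:
    """Key of a shallow realisation: a basic derivation plus an output name."""
    drv: BasicDerivation
    output_name: str


# (basic drv, output name) -> content address of that output
shallow_realisations = {}

compiler_b = BasicDerivation("CompilerB", frozenset({"ca:compiler-a"}), "build.sh")
shallow_realisations[ShallowKey(compiler_b, "out")] = "ca:compiler-b-out"


def lookup(drv, output_name):
    """Return the known content address, or None if there is no realisation."""
    return shallow_realisations.get(ShallowKey(drv, output_name))
```

Because the key is compared by value, any derivation that resolves to the same basic derivation finds the same entry, regardless of which unresolved derivation it started from.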
Suppose we have a dependency graph like CompilerA -> CompilerB -> Library. These are only build-time dependencies: the output of each build does not depend on them at runtime. For the sake of argument, CompilerA is "plain old data" (like a bootstrap binary) and is just uploaded as-is.
Suppose we have built both derivations (CompilerB and Library) and uploaded the results and their shallow realisations, but not deep realisations, to a remote store.
Now, in another store, configured to substitute from that remote store, one tries to build Library.
Currently, this will happen:
- Wants to obtain Library
  - There is no deep realization in the cache keyed by the unresolved derivation
  - We don't know of any content-addressed store object to try downloading
- Wants to build Library
  - Wants to obtain CompilerB
    - Finds shallow trace for CompilerB (since CompilerA is plain old data, CompilerB's derivation is already resolved)
    - Downloads CompilerB
  - Resolves Library derivation
  - Finds shallow trace for Library derivation
  - Downloads Library
This works, but note that we downloaded CompilerB even though it is not in the runtime closure of Library.
Instead I would want something like this:
- Wants to obtain Library
  - There is no deep realization in the cache keyed by the unresolved derivation
- Wants to resolve Library derivation
  - Wants resolution for CompilerB
    - Finds shallow trace for CompilerB
  - Resolves Library derivation
  - Finds shallow trace for Library derivation
  - Downloads Library
Now we don't bother downloading CompilerB.
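The difference between the two sequences can be sketched in a few lines of Python. This is a toy model rather than Nix code: the remote cache is a dict, a resolved derivation is modelled as a (name, input addresses) tuple, and every address is made up. The `download_inputs` flag is a hypothetical switch between the current behaviour and the desired one.

```python
COMPILER_A = "ca:compiler-a"  # plain old data, uploaded as-is

# Shallow realisations on the remote cache:
# (resolved derivation, output name) -> content address.
REMOTE_SHALLOW = {
    (("CompilerB", (COMPILER_A,)), "out"): "ca:compiler-b",
    (("Library", ("ca:compiler-b",)), "out"): "ca:library",
}

# Unresolved derivations: each input is either a content address (str)
# or a reference to another derivation's output (tuple).
DRVS = {
    "CompilerB": [COMPILER_A],
    "Library": [("CompilerB", "out")],
}


def obtain(name, downloads, download_inputs):
    """Resolve `name` and return the content address of its "out" output.

    With download_inputs=True, every derivation input is downloaded before
    resolving (the current behaviour); with False, only its content address
    is looked up (the desired behaviour).
    """
    resolved_inputs = []
    for inp in DRVS[name]:
        if isinstance(inp, tuple):
            addr = obtain(inp[0], downloads, download_inputs)
            if download_inputs:
                downloads.append(addr)
            resolved_inputs.append(addr)
        else:
            resolved_inputs.append(inp)
    resolved = (name, tuple(resolved_inputs))
    return REMOTE_SHALLOW[(resolved, "out")]


def fetch(name, download_inputs):
    """Return the list of store objects downloaded in order to get `name`."""
    downloads = []
    addr = obtain(name, downloads, download_inputs)
    downloads.append(addr)  # the requested output itself is always downloaded
    return downloads
```

In this model, `fetch("Library", download_inputs=True)` downloads both CompilerB and Library, while `fetch("Library", download_inputs=False)` downloads only Library.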
The way to make the second sequence of steps a reality is to make "obtaining a realisation" a goal in and of itself, separate from obtaining a store object and from building one. When the cache doesn't have the realisation, the goal falls back to just building it; when the cache does have it, there is no need to fall back to downloading store objects. Dependencies between these goals would allow us to resolve derivations through arbitrarily many inputDrv edges without downloading any store objects.
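A minimal sketch of such a goal, assuming a much simpler setting than Nix's real goal machinery: a single recursive function stands in for the goal, the cache is a dict keyed by resolved derivation, and `fake_build` is a hypothetical stand-in for a deterministic build. The App derivation is also hypothetical, added only to make the chain longer, and CompilerB's plain-data input is omitted for brevity.

```python
def realise(name, drvs, cache, build, log):
    """Goal: learn the content address of `name`'s output without fetching it.

    Input realisations are obtained first (recursively), then the resolved
    derivation is looked up in the cache. Only on a cache miss does the goal
    fall back to downloading the inputs and building.
    """
    input_addrs = tuple(realise(dep, drvs, cache, build, log) for dep in drvs[name])
    resolved = (name, input_addrs)
    if resolved in cache:
        return cache[resolved]
    for addr in input_addrs:
        log.append(("download", addr))
    log.append(("build", name))
    return build(name)


def fake_build(name):
    """Hypothetical deterministic build: the address depends only on the name."""
    return "ca:" + name.lower()


# Derivation name -> names of its derivation inputs.
DRVS = {"CompilerB": [], "Library": ["CompilerB"], "App": ["Library"]}

FULL_CACHE = {
    ("CompilerB", ()): "ca:compilerb",
    ("Library", ("ca:compilerb",)): "ca:library",
    ("App", ("ca:library",)): "ca:app",
}

# Same cache, but the realisation for Library is missing.
PARTIAL_CACHE = {k: v for k, v in FULL_CACHE.items() if k[0] != "Library"}
```

With `FULL_CACHE`, realising App walks two inputDrv edges and records no downloads or builds. With `PARTIAL_CACHE`, only Library falls back to downloading CompilerB and building, after which App is found in the cache again.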
Before doing this, we should attempt #11927 so this code is not nearly as annoying to work with.