diff --git a/.gitignore b/.gitignore index 337a7c15450..9c46912406f 100644 --- a/.gitignore +++ b/.gitignore @@ -14,7 +14,7 @@ /tests/functional/lang/*.err /tests/functional/lang/*.ast -outputs/ +/outputs *~ diff --git a/doc/manual/source/SUMMARY.md.in b/doc/manual/source/SUMMARY.md.in index 0abe691cc25..914d63a0ed7 100644 --- a/doc/manual/source/SUMMARY.md.in +++ b/doc/manual/source/SUMMARY.md.in @@ -22,7 +22,10 @@ - [Store Object](store/store-object.md) - [Content-Addressing Store Objects](store/store-object/content-address.md) - [Store Path](store/store-path.md) - - [Store Derivation and Deriving Path](store/drv.md) + - [Store Derivation and Deriving Path](store/derivation/index.md) + - [Derivation Outputs and Types of Derivations](store/derivation/outputs/index.md) + - [Content-addressing derivation outputs](store/derivation/outputs/content-address.md) + - [Input-addressing derivation outputs](store/derivation/outputs/input-address.md) - [Building](store/building.md) - [Store Types](store/types/index.md) {{#include ./store/types/SUMMARY.md}} diff --git a/doc/manual/source/glossary.md b/doc/manual/source/glossary.md index a1964070588..db6d18f0efb 100644 --- a/doc/manual/source/glossary.md +++ b/doc/manual/source/glossary.md @@ -22,7 +22,7 @@ - [store derivation]{#gloss-store-derivation} A single build task. - See [Store Derivation](@docroot@/store/drv.md#store-derivation) for details. + See [Store Derivation](@docroot@/store/derivation/index.md#store-derivation) for details. [store derivation]: #gloss-store-derivation @@ -30,7 +30,7 @@ A [store path] which uniquely identifies a [store derivation]. - See [Referencing Store Derivations](@docroot@/store/drv.md#derivation-path) for details. + See [Referencing Store Derivations](@docroot@/store/derivation/index.md#derivation-path) for details. Not to be confused with [deriving path]. @@ -252,7 +252,7 @@ Deriving paths are a way to refer to [store objects][store object] that might not yet be [realised][realise]. 
- See [Deriving Path](./store/drv.md#deriving-path) for details. + See [Deriving Path](./store/derivation/index.md#deriving-path) for details. Not to be confused with [derivation path]. diff --git a/doc/manual/source/language/advanced-attributes.md b/doc/manual/source/language/advanced-attributes.md index c384e956af6..0722386c4cf 100644 --- a/doc/manual/source/language/advanced-attributes.md +++ b/doc/manual/source/language/advanced-attributes.md @@ -99,8 +99,8 @@ Derivations can declare some infrequently used optional attributes. to make it use the proxy server configuration specified by the user in the environment variables `http_proxy` and friends. - This attribute is only allowed in *fixed-output derivations* (see - below), where impurities such as these are okay since (the hash + This attribute is only allowed in [fixed-output derivations][fixed-output derivation], + where impurities such as these are okay since (the hash of) the output is known in advance. It is ignored for all other derivations. @@ -119,135 +119,6 @@ Derivations can declare some infrequently used optional attributes. [`impure-env`](@docroot@/command-ref/conf-file.md#conf-impure-env) configuration setting. - - [`outputHash`]{#adv-attr-outputHash}; [`outputHashAlgo`]{#adv-attr-outputHashAlgo}; [`outputHashMode`]{#adv-attr-outputHashMode}\ - These attributes declare that the derivation is a so-called *fixed-output derivation* (FOD), which means that a cryptographic hash of the output is already known in advance. - - As opposed to regular derivations, the [`builder`] executable of a fixed-output derivation has access to the network. - Nix computes a cryptographic hash of its output and compares that to the hash declared with these attributes. - If there is a mismatch, the derivation fails. - - The rationale for fixed-output derivations is derivations such as - those produced by the `fetchurl` function. This function downloads a - file from a given URL. 
To ensure that the downloaded file has not - been modified, the caller must also specify a cryptographic hash of - the file. For example, - - ```nix - fetchurl { - url = "http://ftp.gnu.org/pub/gnu/hello/hello-2.1.1.tar.gz"; - sha256 = "1md7jsfd8pa45z73bz1kszpp01yw6x5ljkjk2hx7wl800any6465"; - } - ``` - - It sometimes happens that the URL of the file changes, e.g., because - servers are reorganised or no longer available. We then must update - the call to `fetchurl`, e.g., - - ```nix - fetchurl { - url = "ftp://ftp.nluug.nl/pub/gnu/hello/hello-2.1.1.tar.gz"; - sha256 = "1md7jsfd8pa45z73bz1kszpp01yw6x5ljkjk2hx7wl800any6465"; - } - ``` - - If a `fetchurl` derivation was treated like a normal derivation, the - output paths of the derivation and *all derivations depending on it* - would change. For instance, if we were to change the URL of the - Glibc source distribution in Nixpkgs (a package on which almost all - other packages depend) massive rebuilds would be needed. This is - unfortunate for a change which we know cannot have a real effect as - it propagates upwards through the dependency graph. - - For fixed-output derivations, on the other hand, the name of the - output path only depends on the `outputHash*` and `name` attributes, - while all other attributes are ignored for the purpose of computing - the output path. (The `name` attribute is included because it is - part of the path.) - - As an example, here is the (simplified) Nix expression for - `fetchurl`: - - ```nix - { stdenv, curl }: # The curl program is used for downloading. - - { url, sha256 }: - - stdenv.mkDerivation { - name = baseNameOf (toString url); - builder = ./builder.sh; - buildInputs = [ curl ]; - - # This is a fixed-output derivation; the output must be a regular - # file with SHA256 hash sha256. 
- outputHashMode = "flat"; - outputHashAlgo = "sha256"; - outputHash = sha256; - - inherit url; - } - ``` - - The `outputHash` attribute must be a string containing the hash in either hexadecimal or "nix32" encoding, or following the format for integrity metadata as defined by [SRI](https://www.w3.org/TR/SRI/). - The "nix32" encoding is an adaptation of base-32 encoding. - The [`convertHash`](@docroot@/language/builtins.md#builtins-convertHash) function shows how to convert between different encodings, and the [`nix-hash` command](../command-ref/nix-hash.md) has information about obtaining the hash for some contents, as well as converting to and from encodings. - - The `outputHashAlgo` attribute specifies the hash algorithm used to compute the hash. - It can currently be `"blake3", "sha1"`, `"sha256"`, `"sha512"`, or `null`. - `outputHashAlgo` can only be `null` when `outputHash` follows the SRI format. - - The `outputHashMode` attribute determines how the hash is computed. - It must be one of the following values: - - - [`"flat"`](@docroot@/store/store-object/content-address.md#method-flat) - - This is the default. - - - [`"recursive"` or `"nar"`](@docroot@/store/store-object/content-address.md#method-nix-archive) - - > **Compatibility** - > - > `"recursive"` is the traditional way of indicating this, - > and is supported since 2005 (virtually the entire history of Nix). - > `"nar"` is more clear, and consistent with other parts of Nix (such as the CLI), - > however support for it is only added in Nix version 2.21. - - - [`"text"`](@docroot@/store/store-object/content-address.md#method-text) - - > **Warning** - > - > The use of this method for derivation outputs is part of the [`dynamic-derivations`][xp-feature-dynamic-derivations] experimental feature. - - - [`"git"`](@docroot@/store/store-object/content-address.md#method-git) - - > **Warning** - > - > This method is part of the [`git-hashing`][xp-feature-git-hashing] experimental feature. 
- - - [`__contentAddressed`]{#adv-attr-__contentAddressed} - - > **Warning** - > This attribute is part of an [experimental feature](@docroot@/development/experimental-features.md). - > - > To use this attribute, you must enable the - > [`ca-derivations`][xp-feature-ca-derivations] experimental feature. - > For example, in [nix.conf](../command-ref/conf-file.md) you could add: - > - > ``` - > extra-experimental-features = ca-derivations - > ``` - - If this attribute is set to `true`, then the derivation - outputs will be stored in a content-addressed location rather than the - traditional input-addressed one. - - Setting this attribute also requires setting - [`outputHashMode`](#adv-attr-outputHashMode) - and - [`outputHashAlgo`](#adv-attr-outputHashAlgo) - like for *fixed-output derivations* (see above). - - It also implicitly requires that the machine to build the derivation must have the `ca-derivations` [system feature](@docroot@/command-ref/conf-file.md#conf-system-features). - - [`passAsFile`]{#adv-attr-passAsFile}\ A list of names of attributes that should be passed via files rather than environment variables. For example, if you have @@ -370,6 +241,134 @@ Derivations can declare some infrequently used optional attributes. ensures that the derivation can only be built on a machine with the `kvm` feature. -[xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations +## Setting the derivation type + +As discussed in [Derivation Outputs and Types of Derivations](@docroot@/store/derivation/outputs/index.md), there are multiple kinds of derivations / kinds of derivation outputs. +The choice of the following attributes determines which kind of derivation we are making. + +- [`__contentAddressed`] + +- [`outputHash`] + +- [`outputHashAlgo`] + +- [`outputHashMode`] + +The three types of derivations are chosen based on the following combinations of these attributes. +All other combinations are invalid. 
+ +- [Input-addressing derivations](@docroot@/store/derivation/outputs/input-address.md) + + This is the default for `builtins.derivation`. + Nix currently supports only one kind of input-addressing, so no other information is needed. + + `__contentAddressed = false;` may also be included, but is not needed, and will trigger the experimental feature check. + +- [Fixed-output derivations][fixed-output derivation] + + All of [`outputHash`], [`outputHashAlgo`], and [`outputHashMode`]. + + + +- [(Floating) content-addressing derivations](@docroot@/store/derivation/outputs/content-address.md) + + Both [`outputHashAlgo`] and [`outputHashMode`], `__contentAddressed = true;`, and *not* `outputHash`. + + If an output hash was given, then the derivation output would be "fixed", not "floating". + +Here is more information on the `output*` attributes, and what values they may be set to: + + - [`outputHashMode`]{#adv-attr-outputHashMode} + + This specifies how the files of a content-addressing derivation output are digested to produce a content address. + + This works in conjunction with [`outputHashAlgo`](#adv-attr-outputHashAlgo). + Specifying one without the other is an error (unless [`outputHash`] is also specified and includes its own hash algorithm as described below). + + The `outputHashMode` attribute determines how the hash is computed. + It must be one of the following values: + + - [`"flat"`](@docroot@/store/store-object/content-address.md#method-flat) + + This is the default. + + - [`"recursive"` or `"nar"`](@docroot@/store/store-object/content-address.md#method-nix-archive) + + > **Compatibility** + > + > `"recursive"` is the traditional way of indicating this, + > and has been supported since 2005 (virtually the entire history of Nix). + > `"nar"` is clearer, and consistent with other parts of Nix (such as the CLI), + > however support for it was only added in Nix version 2.21. 
+ + - [`"text"`](@docroot@/store/store-object/content-address.md#method-text) + + > **Warning** + > + > The use of this method for derivation outputs is part of the [`dynamic-derivations`][xp-feature-dynamic-derivations] experimental feature. + + - [`"git"`](@docroot@/store/store-object/content-address.md#method-git) + + > **Warning** + > + > This method is part of the [`git-hashing`][xp-feature-git-hashing] experimental feature. + + See [content-addressing store objects](@docroot@/store/store-object/content-address.md) for more information about the process this flag controls. + + - [`outputHashAlgo`]{#adv-attr-outputHashAlgo} + + This specifies the hash algorithm used to digest the [file system object] data of a content-addressing derivation output. + + This works in conjunction with [`outputHashMode`](#adv-attr-outputHashMode). + Specifying one without the other is an error (unless [`outputHash`] is also specified and includes its own hash algorithm as described below). + + The `outputHashAlgo` attribute specifies the hash algorithm used to compute the hash. + It can currently be `"blake3"`, `"sha1"`, `"sha256"`, `"sha512"`, or `null`. + + `outputHashAlgo` can only be `null` when `outputHash` follows the SRI format, because in that case the choice of hash algorithm is determined by `outputHash`. + + - [`outputHash`]{#adv-attr-outputHash} + + This specifies the output hash of the single output of a [fixed-output derivation]. + + The `outputHash` attribute must be a string containing the hash in either hexadecimal or "nix32" encoding, or following the format for integrity metadata as defined by [SRI](https://www.w3.org/TR/SRI/). + The "nix32" encoding is an adaptation of base-32 encoding. + + > **Note** + > + > The [`convertHash`](@docroot@/language/builtins.md#builtins-convertHash) function shows how to convert between different encodings. 
+ > The [`nix-hash` command](../command-ref/nix-hash.md) has information about obtaining the hash for some contents, as well as converting to and from encodings. + + - [`__contentAddressed`]{#adv-attr-__contentAddressed} + + > **Warning** + > + > This attribute is part of an [experimental feature](@docroot@/development/experimental-features.md). + > + > To use this attribute, you must enable the + > [`ca-derivations`][xp-feature-ca-derivations] experimental feature. + > For example, in [nix.conf](../command-ref/conf-file.md) you could add: + > + > ``` + > extra-experimental-features = ca-derivations + > ``` + + This is a boolean with a default of `false`. + It determines whether the derivation is floating content-addressing. + +[`__contentAddressed`]: #adv-attr-__contentAddressed +[`outputHash`]: #adv-attr-outputHash +[`outputHashAlgo`]: #adv-attr-outputHashAlgo +[`outputHashMode`]: #adv-attr-outputHashMode + +[fixed-output derivation]: @docroot@/glossary.md#gloss-fixed-output-derivation +[file system object]: @docroot@/store/file-system-object.md +[store object]: @docroot@/store/store-object.md [xp-feature-dynamic-derivations]: @docroot@/development/experimental-features.md#xp-feature-dynamic-derivations [xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing diff --git a/doc/manual/source/language/derivations.md b/doc/manual/source/language/derivations.md index 0f9284e9844..43eec680bbc 100644 --- a/doc/manual/source/language/derivations.md +++ b/doc/manual/source/language/derivations.md @@ -1,7 +1,7 @@ # Derivations The most important built-in function is `derivation`, which is used to describe a single store-layer [store derivation]. -Consult the [store chapter](@docroot@/store/drv.md) for what a store derivation is; +Consult the [store chapter](@docroot@/store/derivation/index.md) for what a store derivation is; this section just concerns how to create one from the Nix language. 
This builtin function takes as input an attribute set, the attributes of which specify the inputs to the process. @@ -16,7 +16,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect - [`name`]{#attr-name} ([String](@docroot@/language/types.md#type-string)) A symbolic name for the derivation. - See [derivation outputs](@docroot@/store/drv.md#outputs) for what this is affects. + See [derivation outputs](@docroot@/store/derivation/outputs/index.md#outputs) for what this affects. [store path]: @docroot@/store/store-path.md @@ -34,7 +34,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect - [`system`]{#attr-system} ([String](@docroot@/language/types.md#type-string)) - See [system](@docroot@/store/drv.md#system). + See [system](@docroot@/store/derivation/index.md#system). > **Example** > @@ -64,7 +64,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect - [`builder`]{#attr-builder} ([Path](@docroot@/language/types.md#type-path) | [String](@docroot@/language/types.md#type-string)) - See [builder](@docroot@/store/drv.md#builder). + See [builder](@docroot@/store/derivation/index.md#builder). > **Example** > @@ -113,7 +113,7 @@ It outputs an attribute set, and produces a [store derivation] as a side effect Default: `[ ]` - See [args](@docroot@/store/drv.md#args). + See [args](@docroot@/store/derivation/index.md#args). > **Example** > diff --git a/doc/manual/source/store/building.md b/doc/manual/source/store/building.md index 79808273edc..feefa8e3fda 100644 --- a/doc/manual/source/store/building.md +++ b/doc/manual/source/store/building.md @@ -10,7 +10,7 @@ ## Builder Execution -The [`builder`](./drv.md#builder) is executed as follows: +The [`builder`](./derivation/index.md#builder) is executed as follows: - A temporary directory is created under the directory specified by `TMPDIR` (default `/tmp`) where the build will take place. 
The diff --git a/doc/manual/source/store/drv.md b/doc/manual/source/store/derivation/index.md similarity index 89% rename from doc/manual/source/store/drv.md rename to doc/manual/source/store/derivation/index.md index 83ca80aaabd..42cfa67f5b9 100644 --- a/doc/manual/source/store/drv.md +++ b/doc/manual/source/store/derivation/index.md @@ -9,15 +9,24 @@ This is where Nix distinguishes itself. ## Store Derivation {#store-derivation} -A derivation is a specification for running an executable on precisely defined input files to repeatably produce output files at uniquely determined file system paths. +A derivation is a specification for running an executable on precisely defined input to produce one or more [store objects][store object]. +These store objects are known as the derivation's *outputs*. + +Derivations are *built*, in which case the process is spawned according to the spec and, when it exits, is required to leave behind files which will (after post-processing) become the outputs of the derivation. +This process is described in detail in [Building](@docroot@/store/building.md). + + A derivation consists of: - A name - - A set of [*inputs*][inputs], a set of [deriving paths][deriving path] + - An [inputs specification][inputs], a set of [deriving paths][deriving path] - - A map of [*outputs*][outputs], from names to other data + - An [outputs specification][outputs], specifying which outputs should be produced, and various metadata about them. - The ["system" type][system] (e.g. `x86_64-linux`) where the executable is to run. 
@@ -26,8 +35,8 @@ A derivation consists of: [store derivation]: #store-derivation [inputs]: #inputs [input]: #inputs -[outputs]: #outputs -[output]: #outputs +[outputs]: ./outputs/index.md +[output]: ./outputs/index.md [process creation fields]: #process-creation-fields [builder]: #builder [args]: #args @@ -89,28 +98,6 @@ The [process creation fields] will presumably include many [store paths][store p But rather than somehow scanning all the other fields for inputs, Nix requires that all inputs be explicitly collected in the inputs field. It is instead the responsibility of the creator of a derivation (e.g. the evaluator) to ensure that every store object referenced in another field (e.g. referenced by store path) is included in this inputs field. -### Outputs {#outputs} - -The outputs are the derivations are the [store objects][store object] it is obligated to produce. - -Outputs are assigned names, and also consistent of other information based on the type of derivation. - -Output names can be any string which is also a valid [store path] name. -The store path of the output store object (also called an [output path] for short), has a name based on the derivation name and the output name. -In the general case, store paths have name `derivationName + "-" + outputName`. -However, an output named "out" has a store path with name is just the derivation name. -This is to allow derivations with a single output to avoid a superfluous `"-${outputName}"` in their single output's name when no disambiguation is needed. - -> **Example** -> -> A derivation is named `hello`, and has two outputs, `out`, and `dev` -> -> - The derivation's path will be: `/nix/store/-hello.drv`. -> -> - The store path of `out` will be: `/nix/store/-hello`. -> -> - The store path of `dev` will be: `/nix/store/-hello-dev`. - ### System {#system} The system type on which the [`builder`](#attr-builder) executable is meant to be run. 
diff --git a/doc/manual/source/store/derivation/outputs/content-address.md b/doc/manual/source/store/derivation/outputs/content-address.md new file mode 100644 index 00000000000..21e940bc2a8 --- /dev/null +++ b/doc/manual/source/store/derivation/outputs/content-address.md @@ -0,0 +1,192 @@ +# Content-addressing derivation outputs + +The content-addressing of an output only depends on that store object itself, not on any external information (such as how it was made, when it was made, etc.). +As a consequence, a store object will be content-addressed the same way regardless of whether it was manually inserted into the store, outputted by some derivation, or outputted by some other derivation. + +The output spec for a content-addressed output must contain the following field: + +- *method*: how the data of the store object is digested into a content address + +The possible choices of *method* are described in the [section on content-addressing store objects](@docroot@/store/store-object/content-address.md). +Given the method, the output's name (computed from the derivation name and output spec mapping as described above), and the data of the store object, the output's store path will be computed as described in that section. + +## Fixed-output content-addressing {#fixed} + +In this case the content address of the output is *fixed* in advance by the derivation itself. +In other words, when the derivation has finished [building](@docroot@/store/building.md), and the provisional output's content address is computed as part of the process to turn it into a *bona fide* store object, the calculated content address must match that given in the derivation, or the build of that derivation will be deemed a failure. + +The output spec for an output with a fixed content address additionally contains: + +- *hash*, the hash expected from digesting the store object's file system objects. 
+ This hash may be of a freely-chosen hash algorithm (that Nix supports). + +> **Design note** +> +> In principle, the output spec could also specify the references the store object should have, since the references and file system objects are equally parts of a content-addressed store object proper that contribute to its content address. +> However, at this time, the references are not specified because all fixed content-addressed outputs are required to have no references (including no self-reference). +> +> Also in principle, rather than specifying the references and file system object data with separate hashes, a single hash that constrains both could be used. +> This could be done with the final store path's digest, or better yet, the hash that will become the store path's digest before it is truncated. +> +> These possible future extensions are included to elucidate the core property of fixed-output content addressing --- that all parts of the output must be cryptographically fixed with one or more hashes --- separate from the particulars of the currently-supported store object content-addressing schemes. + +### Design rationale + +What is the purpose of fixing an output's content address in advance? +In abstract terms, the answer is carefully controlled impurity. +Unlike a regular derivation, the [builder] executable of a derivation that produces fixed outputs has access to the network. +The outputs' guaranteed content addresses are supposed to mitigate the risk of the builder being given these capabilities; +regardless of what the builder does *during* the build, it cannot influence downstream builds in unanticipated ways because all information it passes downstream flows through the outputs whose content addresses are fixed. + +[builder]: @docroot@/store/derivation/index.md#builder + +In concrete terms, the purpose of this feature is fetching fixed input data like source code from the network. +For example, consider a family of "fetch URL" derivations. 
+These derivations download files from a given URL. +To ensure that the downloaded file has not been modified, each derivation must also specify a cryptographic hash of the file. +For example, + +```jsonc +{ + "outputs": { + "out": { + "method": "nar", + "hashAlgo": "sha256", + "hash": "1md7jsfd8pa45z73bz1kszpp01yw6x5ljkjk2hx7wl800any6465", + }, + }, + "env": { + "url": "http://ftp.gnu.org/pub/gnu/hello/hello-2.1.1.tar.gz" + // ... + }, + // ... +} +``` + +It sometimes happens that the URL of the file changes, +e.g., because servers are reorganised or no longer available. +In these cases, we must then update the call to `fetchurl`, e.g., + +```diff + "env": { +- "url": "http://ftp.gnu.org/pub/gnu/hello/hello-2.1.1.tar.gz" ++ "url": "ftp://ftp.nluug.nl/pub/gnu/hello/hello-2.1.1.tar.gz" + // ... + }, +``` + +If a `fetchurl` derivation's outputs were [input-addressed][input addressing], the output paths of the derivation and of *all derivations depending on it* would change. +For instance, if we were to change the URL of the Glibc source distribution in Nixpkgs (a package on which almost all other packages on Linux depend), massive rebuilds would be needed. +This is unfortunate for a change which we know cannot have a real effect as it propagates upwards through the dependency graph. + +For content-addressed outputs (fixed or floating), on the other hand, the outputs' store path only depends on the derivation's name, the store object's data, and the `method` of the outputs' specs. +The rest of the derivation is ignored for the purpose of computing the output path. + +> **History Note** +> +> Fixed content-addressing is especially important both today and historically as the *only* form of content-addressing that is stabilized. +> This is why the rationale above contrasts it with [input addressing]. + +## (Floating) Content-Addressing {#floating} + +> **Warning** +> This is part of an [experimental feature](@docroot@/development/experimental-features.md). 
+> +> To use this type of output addressing, you must enable the +> [`ca-derivations`][xp-feature-ca-derivations] experimental feature. +> For example, in [nix.conf](@docroot@/command-ref/conf-file.md) you could add: +> +> ``` +> extra-experimental-features = ca-derivations +> ``` + +With this experimental feature enabled, derivation outputs can also be content-addressed *without* fixing in the output spec what the outputs' content address must be. + +### Purity + +Because the derivation output is not fixed (just like with [input addressing]), the [builder] is not given any impure capabilities [^purity]. + +> **Configuration note** +> +> Strictly speaking, the extent to which sandboxing and deprivileging are possible varies with the environment Nix is running in. +> Nix's configuration settings indicate what level of sandboxing is required or enabled. +> Builds of derivations will fail if they request an absence of sandboxing which is not allowed. +> Builds of derivations will also fail if the level of sandboxing specified in the configuration exceeds what is possible in the given environment. +> +> (The "environment", in this case, consists of attributes such as the operating system Nix runs atop, along with the operating-system-specific privileges that Nix has been granted. +> Because of how conventional operating systems like macOS, Linux, etc. work, granting builders *fewer* privileges may ironically require that Nix be run with *more* privileges.) + +That said, derivations producing floating content-addressed outputs may declare their builders as impure (like the builders of derivations producing fixed outputs). +This is provisionally supported as part of the [`impure-derivations`][xp-feature-impure-derivations] experimental feature. + +### Compatibility negotiation + +Any derivation producing a floating content-addressed output implicitly requires the `ca-derivations` [system feature](@docroot@/command-ref/conf-file.md#conf-system-features). 
+This prevents scheduling the building of the derivation on a machine without the experimental feature enabled. +Even once the experimental feature is stabilized, this will still be useful in order to allow using remote builders running older versions of Nix, or alternative implementations that do not support floating content addressing. + +### Determinism + +In the earlier [discussion of how self-references are handled when content-addressing store objects](@docroot@/store/store-object/content-address.md#self-references), it was pointed out that methods of producing store objects ought to be deterministic regardless of the choice of provisional store path. +For store objects produced by manual insertion into the store, the "method of production" is an informal concept --- formally, Nix has no idea where the store object came from, and content-addressing is crucial in order to ensure that the derivation is *intrinsically* tamper-proof. +But for store objects produced by derivations, the "method" is quite formal --- the whole point of derivations is to be a formal notion of building, after all. +In this case, we can elevate this informal property to a formal one. + +A *deterministic* content-addressing derivation should produce outputs with the same content addresses: + +1. Every time the builder is run + + This is because either the builder is completely sandboxed, or because any remaining impurities that leak inside the build sandbox are ignored by the builder and do not influence its behavior. + +2. Regardless of the choice of any provisional output paths + + Provisional store paths must be chosen for any output that has a self-reference. + The choice of provisional store path can be thought of as an impurity, since it is an arbitrary choice. + + If provisional output paths are deterministically chosen, we are in the first branch of part (1). 
+ The builder may vary the data it produces based on the provisional path in arbitrary ways, but this gets us closer to [input addressing]. + Deterministically choosing the provisional path may be considered "complete sandboxing" by removing an impurity, but this is unsatisfactory. + + + + If provisional output paths are randomly chosen, we are in the second branch of part (1). + The builder *must* not let the random input affect the final outputs it produces, and multiple builds may be performed and then compared in order to ensure that this is in fact the case. + +### Floating versus Fixed + +While the distinction between content- and input-addressing is one of *mechanism*, the distinction between fixed and floating content addressing is more one of *policy*. +A fixed output that passes its content address check is just like a floating output. +It is only in the potential for that check to fail that they are different. + +> **Design Note** +> +> In a future world where floating content-addressing is also stable, we would, in principle, no longer need separate [fixed](#fixed) content-addressing. +> Instead, we could always use floating content-addressing, and separately assert the precise content address of a given store object to be used as an input (of another derivation). +> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [Issue #11955](https://github.com/NixOS/nix/issues/11955). +> +> In the current version of Nix, fixed outputs which fail their hash check are still registered as valid store objects, just not registered as outputs of the derivation which produced them. +> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, the derivation does not need to be rebuilt --- avoiding downloading potentially large amounts of data twice. 
+> This optimization prefigures the design above: +> If the output hash assertion were moved outside the derivation itself, Nix could not only register that outputted store object like today, but could also make note that the derivation did in fact successfully download some data. +> For example, for the "fetch URL" example above, making such a note is tantamount to recording what data is available at the time of download at the given URL. +> It would only be when Nix subsequently tries to build something with that (refining our example) downloaded source code that Nix would be forced to check the output hash assertion, preventing it from e.g. building compromised malware. +> +> Recapping, Nix would +> +> 1. successfully download data +> 2. insert that data into the store +> 3. associate (presumably with some sort of expiration policy) the downloaded data with the derivation that downloaded it +> +> But only use the downloaded store object in subsequent derivations that depended upon the assertion if the assertion passed. +> +> This possible future extension is included to illustrate this distinction: + +[input addressing]: ./input-address.md +[xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations +[xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing +[xp-feature-impure-derivations]: @docroot@/development/experimental-features.md#xp-feature-impure-derivations diff --git a/doc/manual/source/store/derivation/outputs/index.md b/doc/manual/source/store/derivation/outputs/index.md new file mode 100644 index 00000000000..15070a18f05 --- /dev/null +++ b/doc/manual/source/store/derivation/outputs/index.md @@ -0,0 +1,97 @@ +# Derivation Outputs and Types of Derivations + +As stated on the [main page on derivations](../index.md#store-derivation), +a derivation produces [store objects], which are known as the *outputs* of the derivation.
+Indeed, the entire point of derivations is to produce these outputs, and to reliably and reproducibly produce these outputs each time the derivation is run. + +One of the parts of a derivation is its *outputs specification*, which specifies certain information about the outputs the derivation produces when run. +The outputs specification is a map, from names to specifications for individual outputs. + +## Output Names {#outputs} + +Output names can be any string which is also a valid [store path] name. +The name mapped to each output specification is not actually the name of the output. +In the general case, the output store object has name `derivationName + "-" + outputSpecName`, not any other metadata about it. +However, an output spec named "out" describes an output store object whose name is just the derivation name. + +> **Example** +> +> A derivation is named `hello`, and has two outputs, `out` and `dev`. +> +> - The derivation's path will be: `/nix/store/-hello.drv`. +> +> - The store path of `out` will be: `/nix/store/-hello`. +> +> - The store path of `dev` will be: `/nix/store/-hello-dev`. + +The outputs of the derivation are the [store objects][store object] it is obligated to produce. + +> **Note** +> +> The formal terminology here is somewhat at odds with everyday communication in the Nix community today. +> "output" in casual usage tends to refer either to the actual output store object, or to the notional output spec, depending on context. +> +> For example "hello's `dev` output" means the store object referred to by the store path `/nix/store/-hello-dev`. +> It is unusual to call this the "`hello-dev` output", even though `hello-dev` is the actual name of that store object. + +## Types of output addressing + +The main information contained in an output specification is how the derivation output is addressed.
+In particular, the specification decides: + +- whether the output is [content-addressed](./content-address.md) or [input-addressed](./input-address.md) + +- if the output is content-addressed, how it is content-addressed + +- if the output is content-addressed, [what its content address is](./content-address.md#fixed-content-addressing) (and thus what its [store path] is) + +## Types of derivations + +The sections on each type of derivation output addressing ended up discussing other attributes of the derivation besides its outputs, such as purity, scheduling, determinism, etc. +This is no coincidence; the type of a derivation is in fact one-to-one with the type of its outputs: + +- A derivation that produces *xyz-addressed* outputs is an *xyz-addressing* derivation. + +The rules for this are fairly concise: + +- All the outputs must be of the same type / use the same addressing + + - The derivation must have at least one output + + - Additionally, if the outputs are fixed content-addressed, there must be exactly one output, whose specification is mapped from the name `out`. + (The name `out` is special, according to the rules described above. + Having only one output and calling its specification `out` means the single output is effectively anonymous; the store path just has the derivation name.) + + (This is an arbitrary restriction that could be lifted.) + +- The output is either *fixed* or *floating*, indicating whether its store path is known prior to building it. + + - With fixed content-addressing it is fixed. + + > A *fixed content-addressing* derivation is also called a *fixed-output derivation*, since that is the only currently-implemented form of fixed-output addressing. + + - With floating content-addressing or input-addressing it is floating. + + > Thus, historically with Nix, with no experimental features enabled, *all* outputs are fixed.
+ +- The derivation may be *pure* or *impure*, indicating what read access to the outside world the [builder](../index.md#builder) has. + + - An input-addressing derivation *must* be pure. + + > If it were impure, we would have a large problem, because an input-addressed derivation always produces outputs with the same paths. + + + - A content-addressing derivation may be pure or impure + + - If it is impure, it may be fixed (typical), or it may be floating if the additional [`impure-derivations`][xp-feature-impure-derivations] experimental feature is enabled. + + - If it is pure, it must be floating. + + - Pure, fixed content-addressing derivations are not supported + + > There is no use for this fourth combination. + > The sole purpose of an output's store path being fixed is to support the derivation being impure. + +[xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations +[xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing +[xp-feature-impure-derivations]: @docroot@/development/experimental-features.md#xp-feature-impure-derivations diff --git a/doc/manual/source/store/derivation/outputs/input-address.md b/doc/manual/source/store/derivation/outputs/input-address.md new file mode 100644 index 00000000000..54d9437d9e1 --- /dev/null +++ b/doc/manual/source/store/derivation/outputs/input-address.md @@ -0,0 +1,31 @@ +# Input-addressing derivation outputs + +[input addressing]: #input-addressing + +"Input addressing" means addressing the store object by the *way it was made* rather than *what it is*. +That is to say, an input-addressed output's store path is a function not of the output itself, but of the derivation that produced it. +Even if two store objects have the same contents, if they are produced in different ways, and one is input-addressed, then they will have different store paths, and are thus guaranteed to not be the same store object.
+ + + +[xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations +[xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing +[xp-feature-impure-derivations]: @docroot@/development/experimental-features.md#xp-feature-impure-derivations diff --git a/doc/manual/source/store/store-object/content-address.md b/doc/manual/source/store/store-object/content-address.md index 02dce283650..38a000d0460 100644 --- a/doc/manual/source/store/store-object/content-address.md +++ b/doc/manual/source/store/store-object/content-address.md @@ -24,13 +24,17 @@ For the full specification of the algorithms involved, see the [specification of ### File System Objects -With all currently supported store object content addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into content address computation for the store object. +With all currently-supported store object content-addressing methods, the file system object is always [content-addressed][fso-ca] first, and then that hash is incorporated into content address computation for the store object. ### References +#### References to other store objects + With all currently supported store object content addressing methods, other objects are referred to by their regular (string-encoded-) [store paths][Store Path]. +#### Self-references + Self-references however cannot be referred to by their path, because we are in the midst of describing how to compute that path! > The alternative would require finding as hash function fixed point, i.e. the solution to an equation in the form @@ -40,7 +44,28 @@ Self-references however cannot be referred to by their path, because we are in t > which is computationally infeasible. > As far as we know, this is equivalent to finding a hash collision.
-Instead we just have a "has self reference" boolean, which will end up affecting the digest. +Instead we have a "has self reference" boolean, which ends up affecting the digest: +In all currently-supported store object content-addressing methods, when hashing the file system object data, any occurrence of the store object's own store path in the digested data is replaced with a [sentinel value](https://en.wikipedia.org/wiki/Sentinel_value). +The hashes of these modified input streams are used instead. + +When validating the content-address of a store object after the fact, the above process works as written. +However, when first creating the store object we don't know the store object's store path, as explained just above. +We therefore, strictly speaking, do not know what value we will be replacing with the sentinel value in the inputs to hash functions. +What instead happens is that the provisional store object --- the data from which we wish to create a store object --- is paired with a provisional "scratch" store path (that presumably was chosen when the data was created). +That provisional store path is instead what is replaced with the sentinel value, rather than the final store path, which we do not yet know. + +> **Design note** +> +> It is an informal property of content-addressed store objects that the choice of provisional store path should not matter. +> In other words, if a provisional store object is prepared in the same way except for the choice of provisional store path, the provisional data need not be identical. +> But, after the sentinel value is substituted in place of each provisional store object's provisional store path, the final so-normalized data *should* be identical. +> +> If, conversely, the data after this normalization process is still different, we'll compute a different content-address.
+> The method of preparing the provisional self-referenced data has *failed* to be deterministic in the sense of not *leaking* the choice of provisional store path --- a choice which is supposed to be arbitrary --- into the final store object. +> +> This property is informal because at this stage, we are just describing store objects, which have no formal notion of their origin. +> Without such a formal notion, there is nothing to formally accuse of being insufficiently deterministic. +> Later in this chapter, when we cover [derivations](@docroot@/store/derivation/index.md), we will have a chance to make this a formal property, not of content-addressed store objects themselves, but of derivations that *produce* content-addressed store objects. ### Name and Store Directory diff --git a/src/libexpr/primops.cc b/src/libexpr/primops.cc index 7c9ce71045d..8156d0320de 100644 --- a/src/libexpr/primops.cc +++ b/src/libexpr/primops.cc @@ -1595,7 +1595,7 @@ static RegisterPrimOp primop_placeholder({ .args = {"output"}, .doc = R"( Return at - [output placeholder string](@docroot@/store/drv.md#output-placeholder) + [output placeholder string](@docroot@/store/derivation/index.md#output-placeholder) for the specified *output* that will be substituted by the corresponding [output path](@docroot@/glossary.md#gloss-output-path) at build time. @@ -2139,7 +2139,7 @@ static RegisterPrimOp primop_outputOf({ .args = {"derivation-reference", "output-name"}, .doc = R"( Return the output path of a derivation, literally or using an - [input placeholder string](@docroot@/store/drv.md#input-placeholder) + [input placeholder string](@docroot@/store/derivation/index.md#input-placeholder) if needed. If the derivation has a statically-known output path (i.e. the derivation output is input-addressed, or fixed content-addresed), the output path will just be returned.