|
| 1 | +# Objective / Abstract |
| 2 | + |
| 3 | +We propose a solution for directly loading files of any type as part of CUE |
| 4 | +evaluation. |
| 5 | + |
| 6 | +# Background |
| 7 | + |
| 8 | +Users frequently need to load JSON, YAML, or other types of files into their CUE |
| 9 | +code. As CUE only supports `import` declarations that reference CUE packages, |
| 10 | +users currently resort to the CUE tooling layer (`cue cmd`) to load non-CUE |
| 11 | +files, which can be overly complex for their needs. The tooling layer was |
| 12 | +introduced to handle external influences that make a configuration non-hermetic, |
| 13 | +typically files. |
| 14 | + |
| 15 | +However, files that are part of a CUE module can be considered hermetic. We |
| 16 | +aim to make it easier to reference these files. |
| 17 | + |
| 18 | +# Overview / Proposal |
| 19 | + |
| 20 | +We propose the `@embed` attribute for embedding. |
| 21 | + |
| 22 | +``` |
| 23 | +@extern(embed) // Enable processing of embedding. |
| 24 | +
|
| 25 | +package foo |
| 26 | +
|
| 27 | +// Load a single JSON file |
| 28 | +a: _ @embed(file=foo.json) |
| 29 | +
|
| 30 | +// Load all files with a name containing a dot (".") in the images directory |
| 31 | +// as binary files. |
| 32 | +b: _ @embed(glob=images/*.*, type=binary) |
| 33 | +b: [string]: bytes |
| 34 | +
|
| 35 | +// Unusual file names may be quoted to prevent |
| 36 | +// misinterpretation. |
| 37 | +c: _ @embed(file="a file.json") |
| 38 | +``` |
| 39 | + |
| 40 | +Key aspects: |
| 41 | + |
| 42 | +- Embedding must be enabled by a file-level `@extern(embed)` attribute. This |
| 43 | + allows for quick identification of the use of embeddings by tooling. |
| 44 | +- Embedded files can be resolved and interpreted at load time, before |
| 45 | + evaluation: it is a syntactic operation. |
| 46 | +- The `@embed` attribute can use a file argument for a single file and a glob |
| 47 | + argument for multiple files. |
| 48 | +- By default, files are decoded using the encoding implied by the |
| 49 | + file name extension. It's an error if the extension is not known. |
| 50 | + This can be overridden using `type=$filetype`, where `$filetype` can be |
| 51 | + any file type described in `cue help filetypes`. |
| 52 | +- For `glob`, if the pattern does not specify the extension, the `type` argument is required. |
| 53 | + |
| 54 | + |
| 55 | +# Detailed Design |
| 56 | + |
| 57 | +## Embedding variants |
| 58 | + |
| 59 | +When an embed attribute refers to a file, the file path is interpreted relative |
| 60 | +to the directory containing the embed attribute and may not include ‘.’ or ‘..’ |
| 61 | +or empty path elements. It is not possible to embed a file that is outside the |
| 62 | +containing module. |
| 63 | + |
| 64 | +File paths must be `/`-separated, even if CUE is used on Windows or another OS |
| 65 | +that does not use `/`-separated paths. |
| 66 | + |
| 67 | +Multiple `@embed` attributes may be associated with the same field, in which |
| 68 | +case the respective values are unified. |
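
As a minimal sketch of the path rules and of attribute unification described above, the fragment below shows one accepted path, one field carrying two attributes, and three paths that would be rejected. The package name and all file names are hypothetical and only serve as illustration.

```cue
@extern(embed)

package config

// Accepted: a "/"-separated path relative to this file's directory.
settings: _ @embed(file=data/settings.json)

// Two attributes on one field: the decoded values are unified, so the
// files must agree on any field they both define.
limits: _ @embed(file=limits.json) @embed(file=limits.yaml)

// Rejected under the rules above (left commented out):
// a: _ @embed(file=../shared/settings.json) // ".." path element
// b: _ @embed(file=./settings.json)         // "." path element
// c: _ @embed(file=data//settings.json)     // empty path element
```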
| 69 | + |
| 70 | +### `@embed(file=$filepath)` |
| 71 | + |
| 72 | +Specifies a single file to be loaded. The encoding of the file is determined by |
| 73 | +the file extension unless overridden by type. |
| 74 | + |
| 75 | +It is an error if the file does not exist. |
| 76 | + |
| 77 | +### `@embed(glob=$pattern)` |
| 78 | + |
| 79 | +An embed attribute with the glob argument embeds any matching file as a map from |
| 80 | +file path to embedded file. The `$pattern` is matched according to the syntax |
| 81 | +used with [`path.Match`](https://pkg.go.dev/cuelang.org/go/pkg/path#Match). |
| 82 | + |
| 83 | +All files must be of the same type, as identified by the extension. In case the |
| 84 | +extension is not fully specified (for example `@embed(glob=file.json*)`), the |
| 85 | +type needs to be explicitly specified. |
| 86 | + |
| 87 | +We currently do not support `**` to allow selecting files in arbitrary |
| 88 | +subdirectories. To allow for this in the future, we do not allow `**` to appear |
| 89 | +in the glob. |
| 90 | + |
| 91 | +Files starting with a ‘.’ are not included. We could later add an option to |
| 92 | +allow including those. |
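
The following sketch illustrates the glob rules above under an assumed directory layout (`pages/a.json`, `pages/b.json`, and a few `logs/app.log*` files next to the CUE file). All names are hypothetical, and the commented result only indicates the expected shape, not actual output.

```cue
@extern(embed)

package site

// All matched files share the .json extension, so no type is needed.
pages: _ @embed(glob=pages/*.json)

// The result is a map keyed by file path, so it can be constrained
// like any other struct:
pages: [string]: {title: string}

// Expected shape (illustrative only):
// pages: {
//     "pages/a.json": {title: "..."}
//     "pages/b.json": {title: "..."}
// }

// The extension is not fully specified by the pattern, so the type
// must be given explicitly.
logs: _ @embed(glob=logs/app.log*, type=text)
logs: [string]: string

// Not allowed, since "**" is reserved for possible future use:
// all: _ @embed(glob=pages/**/*.json)
```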
| 93 | + |
| 94 | +## File types |
| 95 | + |
| 96 | +File types, when not derived from the file extension, are indicated with the |
| 97 | +`type` argument. The values this argument can take are those listed by `cue help |
| 98 | +filetypes`. In summary, a type can specify the encoding, interpretation, or |
| 99 | +both. |
| 100 | + |
| 101 | +Initially we will not support the CUE filetype. Support for the `cue+data` file |
| 102 | +type, or more generally self-contained CUE files, could be added at a later |
| 103 | +date. |
| 104 | + |
| 105 | +We will also not support [`.jsonl`](https://jsonlines.org/) or multi-doc `.yaml` |
| 106 | +file types initially. Instead these files can be embedded as `type=text` and |
| 107 | +decoded via `encoding/json` and `encoding/yaml`. |
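
A sketch of this workaround, assuming hypothetical files `events.jsonl` and `manifests.yaml` next to the CUE file. It relies on the `UnmarshalStream` builtins that, to our knowledge, exist in both the `encoding/json` and `encoding/yaml` packages; treat the exact builtin choice as an assumption to verify against the package documentation.

```cue
@extern(embed)

package events

import (
	"encoding/json"
	"encoding/yaml"
)

// Embed the raw contents as strings rather than decoding on load.
_eventsText:    string @embed(file=events.jsonl, type=text)
_manifestsText: string @embed(file=manifests.yaml, type=text)

// Decode each stream into a list of values during evaluation.
events:    json.UnmarshalStream(_eventsText)
manifests: yaml.UnmarshalStream(_manifestsText)
```

The hidden fields keep the raw text out of exported output while the decoded lists remain available for further constraints.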
| 108 | + |
| 109 | +Unlike the command line, `@embed` does not automatically detect the |
| 110 | +interpretation based on the contents. For instance, to interpret a JSON file as |
| 111 | +OpenAPI, `openapi` needs to be explicitly specified in the `type` argument. |
| 112 | + |
| 113 | +Just as on the command line, if the extension reflects neither the encoding nor |
| 114 | +the interpretation, both can be specified in `type`, such as in |
| 115 | +`type=openapi+yaml`. |
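
To make the two cases concrete, here is a small sketch with hypothetical file names. It assumes that an interpretation given on its own combines with the encoding implied by the extension, as the preceding paragraphs suggest.

```cue
@extern(embed)

package api

// The .json extension supplies the encoding; the interpretation is
// never auto-detected, so it is named explicitly.
petstore: _ @embed(file=petstore.json, type=openapi)

// The extension reveals neither encoding nor interpretation, so both
// are given.
legacy: _ @embed(file=legacy.spec, type=openapi+yaml)
```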
| 116 | + |
| 117 | +The interpretation of `type` is already internally implemented in the |
| 118 | +[`internal/filetypes`](https://pkg.go.dev/cuelang.org/go/internal/filetypes) |
| 119 | +package. This could be exposed via a non-internal package. |
| 120 | + |
| 121 | +In the future we could consider an auto-detect option as is available in the |
| 122 | +command line. |
| 123 | + |
| 124 | +We will not initially support schema-guided decoders, such as text protocol |
| 125 | +buffer values, as part of the `@embed` mechanism. In these cases, users will |
| 126 | +have to load the files as text and use CUE builtins and CUE evaluation to decode |
| 127 | +the embedded files. Using the |
| 128 | +[`ExternInterpreter`](https://pkg.go.dev/cuelang.org/go/cue/cuecontext#ExternInterpreter) |
| 129 | +infrastructure, we are at least prepared for such a change in the future. |
| 130 | + |
| 131 | +## Build information |
| 132 | + |
| 133 | +We propose listing files that are selected for embedding in the |
| 134 | +[`cue/build.Instance.EmbedFiles`](https://pkg.go.dev/cuelang.org/go/cue/build#Instance) |
| 135 | +field. |
| 136 | + |
| 137 | +## Implementation |
| 138 | + |
| 139 | +The embedding proposal can use the `internal/filetypes` and `internal/encoding` |
| 140 | +packages to compute the parameters of the decoding. We should investigate if we |
| 141 | +can reuse the `runtime.Interpreter` implementation for processing the |
| 142 | +attributes, as it is similar, though not identical, to how the `@extern` |
| 143 | +attribute is processed. |
| 144 | + |
| 145 | +# Other Considerations |
| 146 | + |
| 147 | +## Only support bytes for now |
| 148 | + |
| 149 | +We wanted to see if we could support a simpler approach that only supports bytes |
| 150 | +and force users to convert bytes to the format they want. However, most of the |
| 151 | +converter packages assume UTF-8. This is fine to assume for strings within CUE, |
| 152 | +but, as in package [`cue/load`](https://pkg.go.dev/cuelang.org/go/cue/load), it |
| 153 | +should not be assumed when loading files. |
| 154 | + |
| 155 | +We could still support only loading bytes if we ensure that all encoder |
| 156 | +functionality properly handles BOMs. We may eventually want to do that |
| 157 | +regardless. |
| 158 | + |
| 159 | +## Supporting `**` in `glob` patterns |
| 160 | + |
| 161 | +We currently do not support `**` in a `glob` pattern to match arbitrary |
| 162 | +subdirectories. If we do adopt such a feature in the future, we will likely use |
| 163 | +the syntax adopted as part of the [LSP |
| 164 | +specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#documentFilter), |
| 165 | +where `**` is used to match any number of path segments, including none. |
| 166 | + |
| 167 | +## Parent directories |
| 168 | + |
| 169 | +Map keys generated for the glob option are files relative to the directory |
| 170 | +containing the `@embed` attribute. We could, instead, create path keys relative |
| 171 | +to a module root. This would make it possible to embed files from parent |
| 172 | +directories (as long as they are within the same module). We could make this an |
| 173 | +option later on and denote such paths by starting them with `/` to represent the |
| 174 | +module root. |
| 175 | + |
| 176 | +## Security |
| 177 | + |
| 178 | +Embedding is always enabled and may pull in files that end up being exposed in a |
| 179 | +configuration. |
| 180 | + |
| 181 | +The restrictions that disallow embedding files from parent directories, and that |
| 182 | +limit any embedding to files within the containing CUE module, preclude the |
| 183 | +loading of sensitive files from random places on disk. |
| 184 | + |
| 185 | +A CUE module's |
| 186 | +[`source.kind`](https://cuelang.org/docs/reference/modules/#determining-zip-file-contents) |
| 187 | +ensures that the contents of a published module correspond to a VCS commit. |
| 188 | +Assuming that sensitive files are not included as part of a VCS commit, this |
| 189 | +ensures that a published CUE module will also not contain sensitive files. |
| 190 | + |
| 191 | +It is ultimately, however, the responsibility of the module author to ensure |
| 192 | +that sensitive files are not accidentally included. |