Commit 1aba5b6

designs/3264-embed: initial commit

Initial commit of the embed proposal destined for
https://cuelang.org/issue/3264.

Signed-off-by: Paul Jolly <[email protected]>
Change-Id: If96e82ca68ff25db59d032c27d770c70acf2c285
Reviewed-on: https://review.gerrithub.io/c/cue-lang/proposal/+/1197180
Reviewed-by: Roger Peppe <[email protected]>
Reviewed-by: Marcel van Lohuizen <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
1 parent fce83a6 commit 1aba5b6

1 file changed: +192, -0 lines changed
designs/3264-embed.md

@@ -0,0 +1,192 @@
# Objective / Abstract

We propose a solution for directly loading files of any type as part of CUE
evaluation.

# Background

Users frequently need to load JSON, YAML, or other types of files into their CUE
code. As CUE only supports `import` declarations that reference CUE packages,
users currently resort to the CUE tooling layer (`cue cmd`) to load non-CUE
files, which can be overly complex for their needs. The tooling layer was
introduced to handle external influences that make a configuration non-hermetic,
typically files.

However, files that are part of a CUE module can be considered hermetic. We
aim to make it easier to reference these files.

# Overview / Proposal

We propose the `@embed` attribute for embedding.

```
@extern(embed) // Enable processing of embedding.

package foo

// Load a single JSON file
a: _ @embed(file=foo.json)

// Load all files with a name containing a dot (".") in the images directory
// as binary files.
b: _ @embed(glob=images/*.*, type=binary)
b: [string]: bytes

// Unusual file names may be quoted to prevent
// misinterpretation.
c: _ @embed(file="a file.json")
```

Key aspects:

- Embedding must be enabled by a file-level `@extern(embed)` attribute. This
  allows tooling to quickly identify the use of embeddings.
- Embedded files can be resolved and interpreted at load time, before
  evaluation: it is a syntactic operation.
- The `@embed` attribute can use a `file` argument for a single file and a
  `glob` argument for multiple files.
- By default, files are decoded using the encoding implied by the
  file name extension. It is an error if the extension is not known.
  This can be overridden using `type=$filetype`, where `$filetype` can be
  any file type described in `cue help filetypes`.
- For `glob`, if the extension is not given, the `type` argument is required.

# Detailed Design

## Embedding variants

When an embed attribute refers to a file, the file path is interpreted relative
to the directory of the CUE file containing the embed attribute and may not
include ‘.’ or ‘..’ or empty path elements. It is not possible to embed a file
that is outside the containing module.

File paths must be `/`-separated, even if CUE is used on Windows or another OS
that does not use `/`-separated paths.
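
The path rules above can be illustrated with a short sketch in the attribute syntax this proposal introduces. The package name, field name, and file paths are hypothetical, and the rejected forms are shown only as comments.

```cue
@extern(embed)

package config

// Accepted: a /-separated path below the directory of this CUE file.
settings: _ @embed(file=data/settings.json)

// Each of the following would be rejected under the rules above:
//
//   @embed(file=../shared/settings.json)  // ".." path element
//   @embed(file=./data/settings.json)     // "." path element
//   @embed(file=data//settings.json)      // empty path element
```

Because `..` is never allowed, a path cannot climb out of the directory that holds the CUE file, which in turn keeps it inside the containing module.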

Multiple `@embed` attributes may be associated with the same field, in which
case the respective values are unified.
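
As an illustration of this unification rule, the following sketch attaches two `@embed` attributes to one field. The file names `base.json` and `limits.json` and their contents are hypothetical.

```cue
@extern(embed)

package config

// Hypothetical files next to this CUE file:
//   base.json:   {"name": "svc", "replicas": 3}
//   limits.json: {"replicas": 3, "memory": "1Gi"}
service: _ @embed(file=base.json) @embed(file=limits.json)

// Under the proposal this behaves like unifying both decoded values:
//
//   service: {name: "svc", replicas: 3} & {replicas: 3, memory: "1Gi"}
//
// which yields {name: "svc", replicas: 3, memory: "1Gi"}. If the files
// disagreed (say replicas: 3 and replicas: 5), the result would be a
// conflict error rather than one file overriding the other.
```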

### `@embed(file=$filepath)`

Specifies a single file to be loaded. The encoding of the file is determined by
the file extension unless overridden by `type`.

It is an error if the file does not exist.
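
A minimal sketch of how the encoding is chosen for a single file, assuming the attribute syntax proposed here. All file names are hypothetical, and `.conf` is assumed not to be a recognised extension.

```cue
@extern(embed)

package config

// Encoding derived from the .yaml extension.
a: _ @embed(file=settings.yaml)

// .conf is assumed not to be a recognised extension, so the encoding is
// stated explicitly.
b: _ @embed(file=settings.conf, type=json)

// A known extension can also be overridden, here to load the raw text of
// a JSON file rather than its decoded value.
c: string @embed(file=settings.json, type=text)
```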

### `@embed(glob=$pattern)`

An embed attribute with the `glob` argument embeds all matching files as a map
from file path to embedded file. The `$pattern` is matched according to the
syntax used with
[`path.Match`](https://pkg.go.dev/cuelang.org/go/pkg/path#Match).
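
To make the resulting map shape concrete, here is a sketch with a hypothetical directory layout. The keys follow the rule stated elsewhere in this proposal that they are file paths relative to the directory containing the `@embed` attribute.

```cue
@extern(embed)

package site

// Hypothetical layout next to this CUE file:
//   pages/about.json  {"title": "About"}
//   pages/index.json  {"title": "Home"}
pages: _ @embed(glob=pages/*.json)

// Expected shape of the result: one field per matching file, keyed by
// its path relative to this directory.
//
//   pages: {
//       "pages/about.json": {title: "About"}
//       "pages/index.json": {title: "Home"}
//   }

// Because the value is an ordinary struct, it can be constrained as usual.
pages: [string]: title: string
```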

All files must be of the same type, as identified by the extension. If the
extension is not fully specified (for example `@embed(glob=file.json*)`), the
type must be specified explicitly.
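
For example, a pattern ending in a wildcard needs an explicit `type`. The directory and file names in this sketch are hypothetical.

```cue
@extern(embed)

package logs

// *.json* also matches names such as app.json.1 or app.json.bak, so the
// extension does not pin down a single type; it is given explicitly.
rotated: _ @embed(glob=logs/*.json*, type=json)
```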

We currently do not support `**` for selecting files in arbitrary
subdirectories. To allow for this in the future, we do not allow `**` to appear
in the glob.

Files starting with a ‘.’ are not included. We could later add an option to
include them.

## File types

File types, when not derived from the file extension, are indicated with the
`type` argument. The values this argument can take follow those described in
`cue help filetypes`. In summary, a type can specify the encoding,
interpretation, or both.

Initially we will not support the CUE filetype. Support for the `cue+data` file
type, or more generally self-contained CUE files, could be added at a later
date.

We will also not support [`.jsonl`](https://jsonlines.org/) or multi-doc `.yaml`
file types initially. Instead these files can be embedded as `type=text` and
decoded via `encoding/json` and `encoding/yaml`.
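
The workaround for JSON Lines might look like the sketch below. It assumes a hypothetical file `events.jsonl` with one JSON value per line, and uses the existing `strings` and `encoding/json` builtin packages to split and decode the text.

```cue
@extern(embed)

package events

import (
	"encoding/json"
	"strings"
)

// events.jsonl is a hypothetical file with one JSON value per line.
_raw: string @embed(file=events.jsonl, type=text)

// Decode each non-empty line separately.
events: [for line in strings.Split(_raw, "\n") if strings.TrimSpace(line) != "" {
	json.Unmarshal(line)
}]
```

Keeping the raw text in a hidden field leaves only the decoded list visible in the exported configuration.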

Unlike the command line, `@embed` does not automatically detect the
interpretation based on the contents. For instance, to interpret a JSON file as
OpenAPI, `openapi` needs to be explicitly specified in the `type` argument.

Just as on the command line, if the extension reflects neither the encoding nor
the interpretation, both can be specified in `type`, as in
`type=openapi+yaml`.
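
The difference between encoding and interpretation can be sketched as follows. The file names are hypothetical, and the behaviour shown is what the two paragraphs above describe, not a tested implementation.

```cue
@extern(embed)

package api

// Decoded as plain JSON data: the extension gives the encoding, and no
// interpretation is applied.
raw: _ @embed(file=petstore.json)

// Interpreted as OpenAPI: the encoding still comes from the .json
// extension, the interpretation from type.
schema: _ @embed(file=petstore.json, type=openapi)

// Neither encoding nor interpretation can be derived from .txt, so both
// are given.
other: _ @embed(file=petstore.txt, type=openapi+yaml)
```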

The interpretation of `type` is already implemented internally in the
[`internal/filetypes`](https://pkg.go.dev/cuelang.org/go/internal/filetypes)
package. This could be exposed via a non-internal package.

In the future we could consider an auto-detect option, as is available on the
command line.

We will not initially support schema-guided decoders, such as text protocol
buffer values, as part of the `@embed` mechanism. In these cases, users will
have to load the files as text and use CUE builtins and CUE evaluation to decode
the embedded files. By using the
[`ExternInterpreter`](https://pkg.go.dev/cuelang.org/go/cue/cuecontext#ExternInterpreter)
infrastructure, we are at least prepared for such a change in the future.

## Build information

We propose listing files that are selected for embedding in the
[`cue/build.Instance.EmbedFiles`](https://pkg.go.dev/cuelang.org/go/cue/build#Instance)
field.

## Implementation

An implementation of this proposal can use the `internal/filetypes` and
`internal/encoding` packages to compute the parameters of the decoding. We
should investigate whether we can reuse the `runtime.Interpreter` implementation
for processing the attributes, as this is similar, though not identical, to how
the `@extern` attribute is processed.

# Other Considerations

## Only support bytes for now

We considered a simpler approach that only supports bytes and forces users to
convert the bytes to the format they want. However, most of the converter
packages assume UTF-8. This is fine to assume for strings within CUE, but, as in
package [`cue/load`](https://pkg.go.dev/cuelang.org/go/cue/load), it should not
be assumed when loading files.

We could still support only loading bytes if we ensure that all encoder
functionality properly handles BOMs. We may eventually want to do that
regardless.

## Supporting `**` in `glob` patterns

We currently do not support `**` in a `glob` pattern to match arbitrary
subdirectories. If we do adopt such a feature in the future, we will likely use
the syntax adopted as part of the [LSP
specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#documentFilter),
where `**` is used to match any number of path segments, including none.

## Parent directories

Map keys generated for the glob option are file paths relative to the directory
containing the `@embed` attribute. We could, instead, create path keys relative
to the module root. This would make it possible to embed files from parent
directories (as long as they are within the same module). We could make this an
option later on and denote such paths by starting them with `/` to represent the
module root.

## Security

Embedding is always enabled and may pull in files that end up being exposed in a
configuration.

The restrictions that disallow embedding files from parent directories, and that
limit any embedding to files within the containing CUE module, preclude the
loading of sensitive files from arbitrary places on disk.

A CUE module's
[`source.kind`](https://cuelang.org/docs/reference/modules/#determining-zip-file-contents)
ensures that the contents of a published module correspond to a VCS commit.
Assuming that sensitive files are not included as part of a VCS commit, this
ensures that a published CUE module will also not contain sensitive files.

It is ultimately, however, the responsibility of the module author to ensure
that sensitive files are not accidentally included.