Roadmap for literals and initializers
This page sketches how the syntax for literals and initializers can evolve over time. It originates from a design discussion in December 2021.
Initializers are a compact generalization of expressions; this is possible because the type of an initializer is known in advance. Method argument types are also known in advance, so it should be OK to write:
method m(bytes_t data) { ... }
...
m({len, data});
The source of an assignment is already an initializer, so also accepting initializers as method arguments would practically eliminate the need for a dedicated struct literal syntax.
Perhaps we can permit the syntax param list = {1, 2, 3}; for typed parameters.
A compound literal could be written by casting an initializer to the desired type:
cast({.x=3, .y=5}, some_struct_t)
C compound literals are the intuitive implementation of this. However, they are dangerous: such a literal can end up in a DMLC-generated C block that does not correspond to the DML block in which the cast is used, which would give it an unexpectedly short lifetime. This is especially dangerous when DML expressions are translated to statement expressions. Probably the best solution is to add support for (1) expressions that correspond to multiple C statements, and (2) inserting those statements at points that do not correspond to the position of the expression. This would allow us to codegen a DML compound literal by declaring a (non-user-visible) variable for it at the beginning of the C block corresponding to the DML block, and using that variable to represent the compound literal.
Using alloca to codegen the allocation for compound literals as a cheap means of side-stepping this problem is not an option as that would blow the stack if e.g. a compound literal is used within a loop.
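The lifetime hazard and the hoisting approach can be illustrated in plain C. This is only a sketch of the shape of the generated code, not what DMLC emits: the names sum_fields, lit0, hoisted and hoisted_in_loop are hypothetical, and the hazardous variant is disabled with #if 0 because executing it would be undefined behavior.

```c
#include <assert.h>

typedef struct { int x; int y; } some_struct_t;

/* Hypothetical consumer of a pointer to the literal. */
static int sum_fields(const some_struct_t *s) { return s->x + s->y; }

#if 0
/* Hazardous shape: a C compound literal has automatic storage tied to the
   innermost enclosing block. If code generation introduces an extra block,
   the literal dies earlier than the DML source suggests. */
int hazardous(void) {
    const some_struct_t *p;
    {   /* block that exists only in the generated C */
        p = &(some_struct_t){ .x = 3, .y = 5 };
    }   /* the literal's lifetime ends here */
    return sum_fields(p);   /* dangling pointer: undefined behavior */
}
#endif

/* Hoisted shape: a hidden variable is declared at the start of the C block
   that corresponds to the DML block, and stands in for the literal. */
int hoisted(void) {
    some_struct_t lit0;     /* hypothetical non-user-visible temporary */
    const some_struct_t *p;
    {   /* extra generated block is now harmless */
        lit0 = (some_struct_t){ .x = 3, .y = 5 };
        p = &lit0;
    }
    return sum_fields(p);   /* well defined: lit0 is still alive */
}

/* In a loop the same hoisted variable is reused on every iteration, so stack
   usage stays constant, unlike an alloca call per iteration. */
int hoisted_in_loop(int n) {
    assert(n >= 0);
    some_struct_t lit0;
    int total = 0;
    for (int i = 0; i < n; i++) {
        lit0 = (some_struct_t){ .x = i, .y = 1 };
        total += sum_fields(&lit0);
    }
    return total;
}
```

In the hoisted form the assignment still happens where the expression is evaluated; only the storage declaration moves, which is why the statements need to be insertable at a point other than the expression itself.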
With the addition of tuples as run-time values, the only thing that would prevent arbitrary method calls from being used as expressions would be throwing methods. This could be resolved through statement expressions or through support of expressions corresponding to multiple C statements.
The type of the expression a ? b : c is the common type of b and c. C has an extremely complex definition of this; DML today has a simpler but somewhat vague one. We want to make this definition clearer; in particular, we need a definition that guarantees that "common type" is an associative binary operator on types ("common type" would be a partial function whose value behaves as the LUB of a semilattice). The idea is that this allows us to infer a base type from a list literal: e.g., given uint64 x and int64 y, the common type of 1, x, y is uint64, so the list literal [1,x,y] would evaluate to a list of uint64.
We want to change the syntax of compile-time heterogeneous lists from today's [x,y,z] to either #[x,y,z] or #(x,y,z). Hopefully this construct can eventually be deprecated. We hope that homogeneous compile-time lists can be converted to run-time list literals.
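The semilattice idea can be sketched in C as a join over a toy set of types. Everything here is an assumption made for illustration: the enum dml_type_t, the functions common_type and list_element_type, and the ordering beyond int64/uint64 are not DML's actual rules. T_NONE models "no common type", which turns the partial function into a total one with an absorbing element.

```c
#include <assert.h>
#include <stddef.h>

/* Toy type universe. The numeric types form a chain in declaration order;
   T_STRING is unrelated to them; T_NONE means "no common type exists". */
typedef enum { T_INT64, T_UINT64, T_DOUBLE, T_STRING, T_NONE } dml_type_t;

static int is_numeric(dml_type_t t) {
    return t == T_INT64 || t == T_UINT64 || t == T_DOUBLE;
}

/* Least upper bound of two types in the toy ordering. */
dml_type_t common_type(dml_type_t a, dml_type_t b) {
    if (a == b)
        return a;
    if (is_numeric(a) && is_numeric(b))
        return a > b ? a : b;   /* higher element of the numeric chain */
    return T_NONE;              /* unrelated types, or T_NONE already */
}

/* Element type of a list literal: fold common_type over the element types.
   Because the operator is associative, the grouping of the fold is irrelevant. */
dml_type_t list_element_type(const dml_type_t *elems, size_t n) {
    assert(n > 0);
    dml_type_t acc = elems[0];
    for (size_t i = 1; i < n; i++)
        acc = common_type(acc, elems[i]);
    return acc;
}
```

With the element types of [1, x, y] given as int64, uint64, int64, the fold yields uint64, matching the example in the text. A list that mixes a string with an integer yields T_NONE and would be rejected.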
The proposed new syntax for list literals is [x,y,z]. This is an expression: unlike today's [x,y,z], it would evaluate to a value. The value would have an intrinsic type, probably defined as an array of N non-const elements. The type is visible when indexing the literal directly, which is also an important use case: register r[i<4] @ [1, 2, 3, 5][i]. (Technically, the type is also visible if the literal is assigned to a void *, although that would be bad practice.)
An open question is whether list literals should have the special property that they allow implicit conversion to arrays of different but similar types, e.g. int64[4] to uint8[4]. This is probably possible, but the value is unclear since it's mostly useful in contexts where we want to allow initializers -- and in those contexts [...] would be list initializer syntax and the element type is unambiguous.
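For comparison, directly indexing a list literal has a close analogue in C, where an array compound literal can be subscripted in place. The sketch below assumes the literal's element type is int64 (as for integer literals in the tuple example on this page); the function name register_offset is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C analogue of the DML expression [1, 2, 3, 5][i]: an array compound literal
   of four non-const int64 elements, indexed immediately. The value is read
   before the literal goes out of scope, so there is no lifetime problem. */
int64_t register_offset(unsigned i) {
    assert(i < 4);
    return (int64_t[]){1, 2, 3, 5}[i];
}
```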
With [...] as a syntax for list literals, it would make sense to permit the same syntax also for array initializers. We can eventually deprecate the {...} syntax for array initializers. List initializers could also be used for other list-like types, like the planned vector types.
- The tuple literal (1, 2) is a value of type (int64, int64), which is a tuple type.
- An initializer can have the form (initializer1, initializer2), which is a tuple initializer. This requires that we either have a single target of matching tuple type, or a tuple of targets matching initializer1 and initializer2.
- Tuple values are structurally typed.
The syntaxes local (int a, int b) = some_tuple; and (a, b, c) = some_tuple; are both permitted to deconstruct tuple values. The latter syntax does not permit assignment chains. some_tuple may either be an expression of tuple type, or a tuple initializer.
Dictionary literals would be written [1: 2, 3: 4] or ["foo": 5, "bar": 6].
The key type is possibly restricted to strings and uint64. The dictionary's value type is the common type of the types of values.
[] as an initializer works fine both for arrays and dictionaries.
[] as an expression is a special value that supports the operations of dictionaries and lists, and produces the same result as an empty dictionary or list would produce.
The overloaded nature of []/() as syntax for literals, initializers, and -- in the case of () -- deconstruction patterns requires the grammar to be tailored accordingly. The following are pseudorules demonstrating how the grammar can be written to accommodate this.
// All normal expressions
expression_except_collection_literal <- ...
expression_except_tuple_literal <- expression_except_collection_literal | list_literal | dict_literal
expression <- expression_except_tuple_literal | tuple_literal
// Rules for assignment chains
assign_chain <- expression_except_tuple_literal EQUALS assign_chain
assign_chain <- expression_except_tuple_literal EQUALS initializer
// Assignment not using tuple deconstruction
assign_stmt <- assign_chain
// Assignment using tuple deconstruction
assign_stmt <- tuple_literal EQUALS initializer
// scalar initializer
initializer <- expression_except_collection_literal
// other forms of initializers
initializer <- tuple_initializer | list_initializer | dict_initializer | struct_initializer