
Roadmap for literals and initializers

Erik Carstensen edited this page Mar 18, 2022 · 18 revisions

On this page, we sketch how the syntax for literals and initializers can evolve over time. The page originates from a design discussion in December 2021.

Parse method args as initializers

Initializers are a compact generalization of expressions; this is possible because the type of an initializer is known in advance. Method argument types are also known in advance, so it should be OK to write:

method m(bytes_t data) { ... }
...
m({len, data});

The source of an assignment is already an initializer, so this would practically eliminate the need for a separate struct literal syntax.

Param values as initializers

Perhaps we can permit the syntax param list = {1, 2, 3}; for typed parameters.

Cast operand as initializer (compound literals)

cast({.x=3, .y=5}, some_struct_t)

C compound literals are the intuitive implementation of this. However, they are dangerous: such a literal can end up in a DMLC-generated C block that does not correspond to the DML block in which the cast is used, which would give it an unexpectedly short lifetime. This is especially dangerous when DML expressions are translated to statement expressions. Probably the best solution is to add support for (1) expressions that correspond to multiple C statements, and (2) inserting those statements at points other than where the expression appears. This would allow us to codegen a DML compound literal by declaring a (non-user-visible) variable for it at the beginning of the C block corresponding to the DML block, and using that variable to represent the compound literal.
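The lifetime hazard and the hoisting fix can be sketched in plain C. This is only an illustration under assumptions, not actual DMLC output: the names dml_tmp0, sum and hoisted_literal are hypothetical, and the inner braces stand in for whatever extra block the code generator happens to emit.

```c
#include <assert.h>

typedef struct { int x; int y; } some_struct_t;

static int sum(const some_struct_t *s) { return s->x + s->y; }

/*
 * Unsafe shape, shown only as a comment. Inside a function, a C compound
 * literal has automatic storage tied to its enclosing block, so it dies at
 * the closing brace of the generated block:
 *
 *     const some_struct_t *p;
 *     { p = &(some_struct_t){ .x = 3, .y = 5 }; }
 *     return sum(p);    // p dangles here
 */

/* Safer shape: a hidden temporary is declared at the start of the C block
 * that corresponds to the DML block, and filled in where the cast occurs. */
int hoisted_literal(void)
{
    some_struct_t dml_tmp0;   /* hypothetical compiler-generated variable */
    const some_struct_t *p;
    {                         /* extra block emitted by the code generator */
        dml_tmp0 = (some_struct_t){ .x = 3, .y = 5 };
        p = &dml_tmp0;
    }
    assert(p == &dml_tmp0);
    return sum(p);            /* dml_tmp0 is still alive here */
}
```

In the safer shape, the compound literal on the right-hand side is only used as a value that is copied immediately, so its own short lifetime is harmless.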

Using alloca to codegen the allocation for compound literals, as a cheap means of side-stepping this problem, is not an option: memory obtained from alloca is not released until the function returns, so that would blow the stack if, e.g., a compound literal is used within a loop.

One option is to control the lifespan of compound literals using malloc/free when needed, but this is expensive, and correct insertion of free calls probably requires some new compiler mechanics.

Another option is to initially forbid uses of compound literals that require the literal to outlive the expression, such as taking its address; i.e., to restrict compound literals to rvalues. This restriction can be lifted if we move to an LLVM-based back-end.

Throwing method calls

With the addition of tuples as run-time values, the only thing that would prevent arbitrary method calls from being used as expressions would be throwing methods. This could be resolved through statement expressions or through support of expressions corresponding to multiple C statements.
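The "multiple C statements" approach could look roughly like the following sketch. It assumes, purely for illustration, that a throwing method is compiled to a C function that returns true when it throws and otherwise writes its result through an out-pointer; the names m, tmp and call_in_expression are hypothetical. The sketch models a call used inside a larger expression, with the call hoisted into a statement placed before that expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C form of a throwing method: returns true if it threw,
 * otherwise stores its result in *out. */
static bool m(int in, int *out)
{
    assert(out != NULL);
    if (in < 0)
        return true;            /* models "throw" */
    *out = in * 2;
    return false;
}

/* Models: try { v = m(arg) + 1; } catch { v = -1; } */
int call_in_expression(int arg)
{
    int v;
    int tmp;                    /* hoisted temporary for the call result */

    if (m(arg, &tmp))           /* statement inserted before the expression */
        goto caught;
    v = tmp + 1;                /* the expression itself only reads tmp */
    return v;

caught:
    v = -1;
    return v;
}
```

Because the call and its exception check are ordinary statements, this shape needs no statement expressions and stays within standard C.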

Common type

The type of the expression a ? b : c is the common type of b and c. C has an extremely complex definition of this; today, DML has a simpler but somewhat vague definition. We want to make this definition clearer. In particular, we need a definition that guarantees that "common type" is an associative binary operator on types ("common type" would be a partial function whose value behaves as the least upper bound (LUB) in a semilattice). The idea is that this allows us to infer a base type from a list literal: e.g., given uint64 x and int64 y, the common type of 1, x, y is uint64, so the list literal [1,x,y] would evaluate to a list of uint64.
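To make the semilattice idea concrete, here is a minimal sketch in C. The type universe is a toy chosen for this example only (it is an assumption, not DML's actual type system): numeric types form a chain, strings have no common type with numbers, and DT_NONE stands for "no common type". All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy type universe, assumed for this sketch only. */
typedef enum { DT_INT64, DT_UINT64, DT_DOUBLE, DT_STRING, DT_NONE } dml_type;

/* Partial "common type" operator. DT_NONE means "undefined" and is
 * absorbing, which keeps the operator associative. */
dml_type common_type(dml_type a, dml_type b)
{
    if (a == DT_NONE || b == DT_NONE)
        return DT_NONE;
    if (a == DT_STRING || b == DT_STRING)
        return a == b ? DT_STRING : DT_NONE;
    /* Numeric chain DT_INT64 < DT_UINT64 < DT_DOUBLE: the LUB is the max. */
    return a > b ? a : b;
}

/* Element type of a list literal: fold common_type over the elements. */
dml_type list_element_type(const dml_type *elems, size_t n)
{
    assert(n > 0);
    dml_type t = elems[0];
    for (size_t i = 1; i < n; i++)
        t = common_type(t, elems[i]);
    return t;
}

/* Exhaustively checks associativity over the toy universe; returns 1 if
 * (a . b) . c == a . (b . c) for all a, b, c. */
int common_type_is_associative(void)
{
    for (int a = DT_INT64; a <= DT_NONE; a++)
        for (int b = DT_INT64; b <= DT_NONE; b++)
            for (int c = DT_INT64; c <= DT_NONE; c++) {
                dml_type l = common_type(common_type((dml_type)a, (dml_type)b), (dml_type)c);
                dml_type r = common_type((dml_type)a, common_type((dml_type)b, (dml_type)c));
                if (l != r)
                    return 0;
            }
    return 1;
}

/* The example from the text: the elements 1, x, y with uint64 x, int64 y. */
dml_type example_element_type(void)
{
    const dml_type elems[] = { DT_INT64, DT_UINT64, DT_INT64 };
    return list_element_type(elems, 3);
}
```

Associativity is what makes the fold independent of how the elements are grouped, so the inferred element type of a list literal does not depend on evaluation order.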

Compile-time lists

We want to change the syntax of compile-time heterogeneous lists from today's [x,y,z] to #[x,y,z]. Hopefully, compile-time heterogeneous lists can eventually be deprecated altogether. We hope that homogeneous compile-time lists can be converted to run-time list literals.

List literals

New proposed syntax: [x,y,z] for list literals. This is an expression. Unlike today's [x,y,z], the expression would evaluate to a value. The value would have an intrinsic type, probably defined as an array of N non-const elements. The type is visible when indexing the literal directly, which is also an important use case: register r[i<4] @ [1, 2, 3, 5][i]. (Technically, the type is also visible if the literal is assigned to a void *, although that would be bad practice.)
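A close C analogue of indexing a list literal directly, as in the register example above, is indexing an array compound literal. The sketch below is only an illustration and makes no claim about how DMLC would generate the code; register_offset is a hypothetical name, and the int64_t element type is an assumption based on integer literals having type int64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the offset expression [1, 2, 3, 5][i] for i < 4. */
int64_t register_offset(size_t i)
{
    assert(i < 4);
    /* The literal is indexed immediately and its address never escapes
     * the expression, so its lifetime does not matter here. */
    return ((int64_t[]){ 1, 2, 3, 5 })[i];
}
```

This use treats the literal purely as an rvalue, which is the case that stays safe even if address-taking uses of literals are initially forbidden.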

An open question is whether list literals should have the special property that they allow implicit conversion to arrays of different but similar types, e.g. int64[4] to uint8[4]. This is probably possible, but the value is unclear, since it would mostly be useful in contexts where we want to allow initializers, and in those contexts [...] would be list initializer syntax and the element type is unambiguous.

List literals are constant if all values are constant. Indexing a constant list literal with a constant evaluates to a constant value.

Migration considerations

Changing [] to list literals would be problematic for compatibility: today there is code with heterogeneous lists or lists of non-values, neither of which is valid as a list literal.

Initially we can fill this gap with irregular semantics: e.g., [a,b] evaluates to a list literal if a and b are values that have a common type, and is reinterpreted as #[a,b] otherwise. We can slowly deprecate the latter case.

List initializers

With [...] as a syntax for list literals, it would make sense to permit the same syntax also for array initializers. We can eventually deprecate the {...} syntax for array initializers. List initializers could also be used for other list-like types, like the planned vector types.

Tuple types, literals and initializers

  • The tuple literal (1, 2) is a value of type (int64, int64), which is a tuple type.
  • An initializer can have the form (initializer1, initializer2), which is a tuple initializer. This requires that we either have a single target of matching tuple type, or a tuple of targets matching initializer1 and initializer2.
  • Tuple values are structurally typed.

Tuple deconstruction

The syntaxes local (int a, int b) = some_tuple; and (a, b, c) = some_tuple; are both permitted to deconstruct tuple values. The latter syntax does not permit assignment chains. some_tuple may either be an expression of tuple type, or a tuple initializer.

Dictionary literals and initializers

Proposed syntax: [1: 2, 3: 4] or ["foo": 5, "bar": 6].

The key type is possibly restricted to strings and uint64. The dictionary's value type is the common type of the types of values.

Empty list literal

[] as an initializer works fine both for arrays and dictionaries.

[] as an expression is a special value that supports the operations of dictionaries and lists, and produces the same result as an empty dictionary or list would produce.

Grammar

The overloaded nature of []/() as syntax for literals, initializers, and -- in the case of () -- deconstruction patterns requires the grammar to be tailored accordingly. The following are pseudorules demonstrating how the grammar can be written to accommodate this.

// All normal expressions
expression_except_collection_literal <- ...

expression_except_tuple_literal <- expression_except_collection_literal | list_literal | dict_literal

expression <- expression_except_tuple_literal | tuple_literal

// Rules for assignment chains
assign_chain <- expression_except_tuple_literal EQUALS assign_chain
assign_chain <- expression_except_tuple_literal EQUALS initializer

// Assignment not using tuple deconstruction
assign_stmt <- assign_chain

// Assignment using tuple deconstruction
assign_stmt <- tuple_literal EQUALS initializer

// scalar initializer
initializer <- expression_except_collection_literal

// other forms of initializers
initializer <- tuple_initializer | list_initializer | dict_initializer | struct_initializer