Date: May 18, 2025 Status: Implemented
This doc describes the design and behavior of collection literals in Mojo. It is
intended to explain how types can work with collection literals and how corner
cases are handled in one end-to-end place. It does not go into the design
details of specific collections like List or Dict.
Mojo inherits ergonomic collection literals from Python, for example, we want things like this to work as a python programmer would expect:
mylist = [1, 2, 3]
assert(mylist[1] == 2)
myset = {1, 2, 3}
assert(2 in myset)
mydict = {"foo": 1, "bar": 2}
assert(mydict["foo"] == 1)Mojo has List and Set and Dict collection types, but we need to wire up
the collection literal syntax to work with these properly.
While we want collection literals to default to the standard library types, we also want collection literals to work with other types as well. This is because we want to enable specialized collections, e.g. lists that contain elements inline, dictionaries specialized for string keys, and we also want to interoperate well with Python.
fn take_inline_array(arr: InlineArray[Int, 18]): ...
fn take_python(obj: PythonObject): ...
fn example():
# This should form an InlineArray directly. It shouldn't create a List and
# copy it.
take_inline_array([1, 2, 3])
# This should pass a python list with python integers, string, set and dict
# inside of it.
take_python([1, "str", {2, 3, 4}, {"foo": 1, "bar": 2}])This gives us the power of easy of use of ergonomic defaults, while also enabling the construction of efficient datatypes and interoperability with existing APIs.
If Mojo knows the contextual type for a collection literal, it will use that
type to construct the literal. Otherwise it will default to the corresponding
standard library type defined in the collections package: List, Set, Dict
etc.
In order to support list literal syntax, a type implements an initializer that
takes a __list_literal__: () argument - an argument of empty-tuple type. The
empty tuple ensures that this overload won't be picked accidentally, because
some types may need to support multiple literals (e.g. PythonObject needs to
support set literals and list literals, and the initializers do different
things. For example, List has this initializer:
struct List[T: Copyable & Movable, hint_trivial_type: Bool = False](...):
fn __init__(out self, owned *values: T, __list_literal__: () = ()):
"""Constructs a list from the given values.
...This allows List to be built from a list literal, and because the
__list_literal__ has a default value, you can also use the same initializer
with explicit syntax like List[Int](1, 2, 3) if you'd like.
While it is conventional to define this initializer with a variadic, this isn't required:
# A type that allows initializers with 2 or 3 integers only.
struct TwoAndThreeList:
fn __init__(out self, a: Int, b: Int, __list_literal__: ()): pass
fn __init__(out self, a: Int, b: Int, c: Int, __list_literal__: ()): pass
fn test():
# These are ok.
var a: TwoAndThreeList = [1, 2]
var b: TwoAndThreeList = [1, 2, 3]
# Error: no matching function in initialization
var c: TwoAndThreeList = [1, 2, 3, 4]Similarly, you can also define things that take heterogenous types:
struct StringIntListLiteral:
fn __init__(out self, a: String, b: Int, __list_literal__: ()): pass
fn test():
# Weird but accepted.
var a: StringIntListLiteral = ["foo", 2]While this is possible, this isn't something you should actually do. The reason
that this is important to support is PythonObject which takes a heterogenous
collection of values that conform to PythonConvertible.
Set literals work the same way as list literals, but choose an initializer with
the __set_literal__ keyword argument.
Like other forms of literals, a type enables support for Dictionary literals by
implementing a constructor. Here is Dict for example:
struct Dict[K: KeyElement, V: Copyable & Movable](...):
fn __init__(
out self,
owned keys: List[K],
owned values: List[V],
__dict_literal__: (),
): ...As with other literal types, the __dict_literal__ argument avoids ambiguity
with other constructors. The constructor takes a list of keys and a list of
values. These are passed separately (rather than as a list of tuples) because
this enables more flexibility in type merging for slightly different types,
e.g. we want {k1: 4.0, k2: 5} to produce a value of type Float64 even though
the literals have different types.
The actual type of the keys and values lists are flexible: the compiler
passes the keys and values as a list literal, so all the rules for list literals
apply to these arguments.
Mojo supports a syntax extension vs Python to express C++-style "initializer lists". This is a list of argument values that is bound up into an expression which is applied to a contextual type. This can be useful if you have a complicated type that you don't want to spell.
fn foo(x: SomeComplicatedType): ...
# Example with normal initializer.
foo(SomeComplicatedType(1, kwarg=42))
# Example with initializer list.
foo({1, kwarg=42})While this is a minor convenience in this case, it can be more significant when
working with more complex types that have lots of parameters that would
otherwise require complicated uses of __type_of(x).
The above approach has an ambiguity: does {a, b} create a set of two elements
with T.__init__(a, b, __set_literal__=()) or does it invoke an initializer
with just T.__init__(a, b)?
The approach used by the Mojo compiler is as follows based on whether it has a contextually inferred type or not. If not, the compiler assumes that the literal must be a Set literal. It:
- requires at least one element, rejecting
var x = {}because it cannot infer the element type ofSet. - rejects attempts to use keyword arguments, e.g.
var x = {1, kwarg=2}is known to not be a set, but we cannot know what type to create. - It unifies the elements provided (using
__merge_with__and implicit conversions using the same algorithm as list literals), and then creates an instance of theSettype.
If there is a known contextual type, the compiler uses the following approach:
- If the initializer list is empty, and if the type allows construction from a
dictionary literal, that constructor is used. This ensures that
{}turns into a dict withPythonObject. - If the type supports the set literal initializer, and if no keyword arguments are used, the element types are unified and the set literal constructor is used. Note that this disables initializer-list syntax for set-like types, but you can work around this by calling the initializer explicitly, or providing a keyword argument.
- Otherwise, the elements are passed in to the
T.__init__method as an initializer list.
Mojo supports Python-style collection "comprehensions", which are a collection literal with a single expression, followed by one or more postfix generator clauses indicating how to produce the elements. Some examples:
var list1 = [x*y for x in range(10) for y in other_int_list]
var list2: YourList = [x for x in range(10) if x & 1]
var set = [x for x in range(10)]
var dict = {x : x*x for x in range(10) if x & 1}Collection comprehensions desugar into the same code you'd get by writing nested
for loops and if statements, and uses the same defaulting rules as other
literals above. For example, the examples above desugar into:
var list1 = List[Int]()
for x in range(10):
for y in other_int_list:
list1.append(x*y)
var list2 = YourList()
for x in range(10):
if x & 1:
list2.append(x)
var set = Set[Int]()
for x in range(10):
set.add(x)
var dict = Dict[Int, Int]()
for x in range(10):
if x & 1:
dict[x] = x*xThis implies that this will only work with collection types that are default
constructible, and have the insertion methods corresponding to their type:
append for arrays, add for sets, and __setitem__ for dicts. Mojo doesn't
require the collections to be copy or movable or impose any other requirements
on them.