Merge pull request #65 from lucioleKi/eep73
Zip generator for comprehensions
2 parents c5096d5 + 4fa5b72 commit 471cc7e

1 file changed: eeps/eep-0073.md (+235, -0)

    Author: Isabell Huang <isabell(at)erlang(dot)org>
    Status: Draft
    Type: Standards Track
    Created: 21-Sep-2024
    Erlang-Version: 28
    Post-History:
    Replaces: 19
****
EEP 73: Zip generator
----

Abstract
========

This EEP proposes the addition of zip generators, written with `&&`, to
comprehensions in Erlang. The idea and syntax of zip generators (comprehension
multigenerators) were first proposed in EEP-19. Although the syntax and
usage of zip generators proposed by this EEP are mostly the same as in EEP-19,
the comprehension language of Erlang has undergone many changes since EEP-19
was accepted. With an implementation that is compatible with all existing
comprehensions, this EEP defines the behavior of zip generators and
clarifies the compiler's role.

Rationale
=========

A list comprehension creates a new list from one or more existing lists. The
lists are traversed in a dependent (nested) way. In the following example, the
resulting list has length 4 when the two input lists both have length 2.

    [{X,Y} || X <- [1,2], Y <- [3,4]] = [{1,3}, {1,4}, {2,3}, {2,4}].

In contrast, a parallel list comprehension (also known as a zip comprehension)
evaluates its generators in parallel. The generators are
first "zipped" together, and then evaluated. Many functional languages
([Haskell][2], [Racket][3], etc.) and non-functional languages (Python etc.)
support this variation. If the two lists in the example above were
evaluated as a zip comprehension, the result would be `[{1,3}, {2,4}]`.
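
To make the contrast concrete, the following sketch puts the nested comprehension, the `lists:zip/2` workaround, and the `&&` form proposed by this EEP side by side. The module and function names are illustrative only, and the last function assumes a compiler that implements this EEP (Erlang/OTP 28 according to the header above).

```erlang
-module(zip_intro_demo).
-export([nested/0, zipped_with_lists/0, zipped/0]).

%% Nested traversal: every X is paired with every Y.
nested() ->
    [{X, Y} || X <- [1, 2], Y <- [3, 4]].

%% Parallel traversal today: lists:zip/2 builds a list of tuples first.
zipped_with_lists() ->
    [{X, Y} || {X, Y} <- lists:zip([1, 2], [3, 4])].

%% Parallel traversal with a zip generator (assumes support for this EEP).
zipped() ->
    [{X, Y} || X <- [1, 2] && Y <- [3, 4]].
```

Here `nested()` returns `[{1,3},{1,4},{2,3},{2,4}]`, while the two zipped variants both return `[{1,3},{2,4}]`.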

Zip comprehensions allow the user to conveniently iterate over several lists
at once. Without them, the standard way to accomplish the same task in Erlang
is to use `lists:zip` to zip two lists into two-tuples, or to use `lists:zip3`
to zip three lists into three-tuples. The `lists` module does not provide a
function to zip more than three lists. Functions like `lists:zip` always
create intermediate data structures, because the compiler does not
perform deforestation to eliminate the unwanted tuples.

Zip generators are a generalization of zip comprehensions. Every set of zipped
generators is treated as one generator. Instead of constraining a comprehension
to be either zipped or non-zipped, any generator can be either a zip generator
(containing at least two generators zipped together) or a non-zip generator
(all existing generators are non-zip generators). Therefore, zip generators
can be mixed freely with all existing generators and filters. A zip comprehension
then becomes a special case of a comprehension where only zip generators are
used.

Within the OTP codebase, there are many uses of `lists:zip` within comprehensions.
All of them can be simplified by zip generators using the `&&` syntax. For example,
`yecc.erl` in parsetools contains the following comprehension (external
function calls and irrelevant fields redacted for readability):

    PartDataL = [#part_data{name = Nm, eq_state = EqS, actions = P, states = S}
                 || {{Nm,P}, {Nm,S}, {Nm,EqS}} <-
                     lists:zip3(PartNameL, PartInStates, PartStates)].

When using zip generators, the comprehension is rewritten to:

    PartDataL = [#part_data{name = Nm, eq_state = EqS, actions = P, states = S}
                 || {Nm,P} <- PartNameL && {Nm,S} <- PartInStates && {Nm,EqS} <- PartStates].

By using zip generators, the compiler avoids the need to build the intermediate
list of tuples. Variable bindings and pattern matching within a zip generator
work as expected: `Nm` must bind to the same value in `{Nm,P}`
and `{Nm,S}`. If the match fails, then one element from each of the 3
generators is skipped. (If a strict generator is used, then the comprehension
fails with exception `badmatch`, as specified in EEP-70.)
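
The skipping behavior can already be observed with the `lists:zip/2` formulation, which a zip generator is described above as replacing. The following is a minimal sketch with made-up data; `zip_skip_demo` and `joined/2` are hypothetical names, and the zip generator form is shown only in a comment.

```erlang
-module(zip_skip_demo).
-export([joined/2]).

%% Pairs up rows from two lists that carry the same key at the same position.
%% K appears twice in the pattern, so a row pair only matches when both keys
%% are equal. Because `<-` is a relaxed generator, non-matching pairs are
%% skipped instead of raising an error.
%%
%% Per this EEP, the same comprehension would be written as:
%%   [{K, A, B} || {K, A} <- Left && {K, B} <- Right]
joined(Left, Right) ->
    [{K, A, B} || {{K, A}, {K, B}} <- lists:zip(Left, Right)].
```

For instance, `zip_skip_demo:joined([{a,1},{b,2},{c,3}], [{a,x},{z,y},{c,w}])` returns `[{a,1,x},{c,3,w}]`: the middle pair is dropped because `b` and `z` differ.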

In summary, zip generators remove the need to call a zip function
within comprehensions and allow any number of lists to be zipped at once.
They can be used in list, binary, and map comprehensions, and mixed freely with
all existing generators and filters. Internally, the compiler does not create
any intermediate data structure, which also removes the need for
deforestation.
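
As an illustration of that summary, the sketch below uses the same zip generator in a list, a binary, and a map comprehension. The module and function names are invented for this example, and the code assumes a compiler that implements this EEP.

```erlang
-module(zip_kinds_demo).
-export([as_list/2, as_binary/2, as_map/2]).

%% List comprehension: pairwise sums collected in a list.
as_list(Xs, Ys) ->
    [X + Y || X <- Xs && Y <- Ys].

%% Binary comprehension: pairwise sums packed as one byte each.
as_binary(Xs, Ys) ->
    << <<(X + Y)>> || X <- Xs && Y <- Ys >>.

%% Map comprehension: keys from one list, values from the other.
as_map(Keys, Values) ->
    #{K => V || K <- Keys && V <- Values}.
```

With `[1,2,3]` and `[4,5,6]` as arguments, `as_list/2` gives `[5,7,9]` and `as_binary/2` gives `<<5,7,9>>`; `as_map([a,b,c], [1,2,3])` gives `#{a => 1, b => 2, c => 3}`.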

Specification
========================

Currently, Erlang supports three kinds of comprehensions: list comprehensions,
binary comprehensions, and map comprehensions. Their names refer to the result
of the comprehension: a list comprehension produces a list, a binary comprehension
produces a binary, and a map comprehension produces a map.

    [Expression || Qualifier(s)]        %% List Comprehension
    <<Expression || Qualifier(s)>>      %% Binary Comprehension
    #{Expression || Qualifier(s)}       %% Map Comprehension

A qualifier is one of the following kinds: filter, list generator, bitstring
generator, or map generator. All qualifiers other than filters
are generators. Their names refer to the type on the right-hand
side of `<-` or `<=`. Generators have the following form:

    Pattern <- List                     %% List Generator
    Pattern <= Bitstring                %% Bitstring Generator
    Pattern_1 := Pattern_2 <- Map       %% Map Generator

All generators and filters can be freely used and mixed in all 3 kinds of
comprehensions. The following example shows a list comprehension with a
list generator and a bitstring generator.

    [{X,Y} || X <- [1,2,3], <<Y>> <= <<4,5,6>>].

This EEP proposes the addition of zip generators. A zip generator is two or
more generators connected by `&&`. A zip generator can connect
any number of the 3 kinds of generators above. Zip generators can be used
in list, binary, or map comprehensions in the same way.

For example, if the two generators in the above example are combined
into a zip generator, the comprehension looks like this:

    [{X,Y} || X <- [1,2,3] && <<Y>> <= <<4,5,6>>].

Every zip generator of the form `G1 && ... && Gn` is evaluated to have
the same result as `zip/n`, where

    zip([H1|T1], ..., [Hn|Tn]) ->
        [{H1,...,Hn} | zip(T1, ..., Tn)];
    zip([], ..., []) ->
        [].

Therefore, the above comprehension evaluates to `[{1,4}, {2,5}, {3,6}]`, which
is the same result as with `lists:zip/2`.
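
The schematic `zip/n` can be written out for a fixed arity. Below is a sketch for n = 4, the first arity for which the `lists` module has no ready-made function; `zip4/4` is a hypothetical helper, not part of OTP.

```erlang
-module(zip4_demo).
-export([zip4/4]).

%% The schematic zip/n instantiated for n = 4.
%% Takes one head from each list per step and builds a 4-tuple.
zip4([A | As], [B | Bs], [C | Cs], [D | Ds]) ->
    [{A, B, C, D} | zip4(As, Bs, Cs, Ds)];
%% All four lists must run out at the same time. Any other combination
%% matches no clause, so lists of different lengths raise function_clause.
zip4([], [], [], []) ->
    [].
```

For example, `zip4_demo:zip4([1,2], [a,b], [x,y], [true,false])` returns `[{1,a,x,true},{2,b,y,false}]`.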

A zip generator can also be used when a comprehension contains other non-zip
generators and/or filters. The `&&` symbol has a higher precedence than `,`.

The following example evaluates to `[{b,4}, {c,6}]`. The element `{a,2}` is
filtered out from the resulting list.

    [{X, Y} || X <- [a, b, c] && <<Y>> <= <<2, 4, 6>>, Y =/= 2].
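
Because `&&` binds tighter than `,`, a zip generator acts as a single qualifier next to ordinary generators and filters. The sketch below adds a nested generator to show this; the names are illustrative, and the code assumes a compiler that implements this EEP.

```erlang
-module(zip_mixed_demo).
-export([mixed/0]).

%% `X <- ... && Y <- ...` forms one zip generator, `Z <- ...` is a separate
%% nested generator, and `X =/= 2` is a filter applied to each combination.
mixed() ->
    [{X, Y, Z} || X <- [1, 2, 3] && Y <- [a, b, c], Z <- [x, y], X =/= 2].
```

`mixed()` returns `[{1,a,x},{1,a,y},{3,c,x},{3,c,y}]`: `X` and `Y` advance together, `Z` is iterated in full for each zipped pair, and the pair with `X = 2` is filtered out.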

Compared to using helper functions, a zip generator has one advantage:
the Erlang compiler does not generate any tuples when a zip
generator is translated into Core Erlang. The generated code reflects the
programmer's intent, which is to collect one element from every list at a
time without creating a list of tuples.

Error Behaviors
================

One would expect that when errors happen, a zip generator behaves the same
as `lists:zip/2`, `lists:zip3/3`, and the `zip/n` function above when
more than 3 lists are zipped together. The design and implementation of
zip generators aim to achieve that both for compiled code and for comprehensions
evaluated in the Erlang shell.

Generators of Different Lengths
--------------

`lists:zip/2` and `lists:zip3/3` fail if the given lists are not of the
same length, and `zip/n` crashes as well. Therefore, a zip generator raises a
`bad generators` error when it discovers that the given generators are of
different lengths.

When a zip generator crashes because its constituent generators are of
different lengths, the internal error term is a tuple, where the first
element is the atom `bad_generators`, and the second element is a tuple that
contains the remaining data from all generators. The user-facing error message
is `bad generators:`, followed by the tuple containing the remaining data from
all generators.

For example, this comprehension will crash at runtime.

    [{X,Y} || X <- [1,2,3] && Y <- [1,2,3,4]].

The resulting error tuple is `{bad_generators,{[],[4]}}`. This is because
when the comprehension crashes, the first list in the zip generator has
only the empty list `[]` left, while the second list in the zip generator
has `[4]` left.
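
A caller that wants to inspect the leftover data can match on the error term. The following is a minimal sketch, assuming the error is raised with class `error` and the `{bad_generators, Remaining}` shape described above; the module and function names are hypothetical, and the code assumes a compiler that implements this EEP.

```erlang
-module(zip_error_demo).
-export([remaining/2]).

%% Zips two lists with a zip generator. If the lists differ in length,
%% returns the leftover data carried by the bad_generators error
%% instead of letting the process crash.
remaining(Xs, Ys) ->
    try [{X, Y} || X <- Xs && Y <- Ys] of
        Pairs -> {ok, Pairs}
    catch
        error:{bad_generators, Rest} -> {error, Rest}
    end.
```

Under those assumptions, `remaining([1,2,3], [1,2,3,4])` is expected to return `{error, {[], [4]}}`, and `remaining([1,2], [3,4])` returns `{ok, [{1,3},{2,4}]}`.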

On the compiler's side, it is rather difficult to return the original zip
generator in the error message, or to point out which generator differs in
length from the others. The proposed error message aims to
give the most helpful information without imposing an extra burden on the
compiler or runtime.

Non-generator in a Zip Generator
-----------------

As the idea of zipping only makes sense for generators, a zip generator cannot
contain filters or any expression that is not a generator. Whenever it is
possible to catch such an error at compile time, it is caught by
the Erlang linter.

For example, the zip generator in the following comprehension contains a
filter.

    zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0].

When the function is compiled, the linter points out that only generators are
allowed in a zip generator, together with the position of the non-generator.

    t.erl:6:55: only generators are allowed in a zip generator.
    %    6| zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0].
    %     |                                                       ^

Backwards Compatibility
========================

The operator `&&` is not currently used in Erlang, so no existing code is
affected by this addition.

Reference Implementation
========================

[compiler: Add zip generators for comprehensions][1] contains the implementation
of zip generators.

[1]: https://github.com/erlang/otp/pull/8926
[2]: https://downloads.haskell.org/~ghc/5.00/docs/set/parallel-list-comprehensions.html
[3]: https://docs.racket-lang.org/reference/for.html

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "
