|
| 1 | + Author: Isabell Huang <isabell(at)erlang(dot)org> |
| 2 | + Status: Draft |
| 3 | + Type: Standards Track |
| 4 | + Created: 21-Sep-2024 |
| 5 | + Erlang-Version: 28 |
| 6 | + Post-History: |
| 7 | + Replaces: 19 |
| 8 | +**** |
| 9 | +EEP 73: Zip generator |
| 10 | +---- |
| 11 | + |
| 12 | +Abstract |
| 13 | +======== |
| 14 | + |
| 15 | +This EEP proposes the addition of zip generators with a syntax of `&&` for |
| 16 | +comprehensions in Erlang. The idea and syntax of zip generators (comprehension |
| 17 | +multigenerators) were first proposed in EEP-19. Although the syntax and |
| 18 | +usage of zip generators proposed by this EEP are mostly the same as in EEP-19, |
| 19 | +the comprehension language of Erlang has undergone many changes since EEP-19 |
| 20 | +was accepted. With an implementation that is compatible with all existing |
| 21 | +comprehensions, this EEP defines the behavior of zip generators with more |
| 22 | +clarification of the compiler's role. |
| 23 | + |
| 24 | +Rationale |
| 25 | +========= |
| 26 | + |
| 27 | +List comprehension is a way to create a new list from existing list(s). Lists |
| 28 | +are traversed in a dependent (nested) way. In the following example, the |
| 29 | +resulting list has length 4 when the two input lists both have length 2. |
| 30 | + |
| 31 | + [{X,Y} || X <- [1,2], Y <- [3,4]] = [{1,3}, {1,4}, {2,3}, {2,4}]. |
| 32 | + |
| 33 | +In contrast, parallel list comprehension (also known as zip comprehension) |
| 34 | +evaluates qualifiers (a generalization of lists) in parallel. Qualifiers are |
| 35 | +first "zipped" together, and then evaluated. Many functional languages |
| 36 | +([Haskell][2], [Racket][3], etc.) and non-functional languages (Python etc.) |
| 37 | +support this variation. If the two lists in the example above were |
| 38 | +evaluated as a zip comprehension, the result would be `[{1,3}, {2,4}]`. |
| 39 | + |
| 40 | +Zip comprehensions allow the user to conveniently iterate over several lists |
| 41 | +at once. Without them, the standard way to accomplish the same task in Erlang |
| 42 | +is to use `lists:zip` to zip two lists into two-tuples, or to use `lists:zip3` |
| 43 | +to zip three lists into three-tuples. The `lists` module does not provide a |
| 44 | +function to zip more than three lists. Functions like `lists:zip` always |
| 45 | +create intermediate data structures when compiled. The compiler does not |
| 46 | +perform deforestation to eliminate the unwanted tuples. |
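As a minimal sketch of the current idiom described above, the following module adds two lists element-wise with `lists:zip/2`. The module name `zip_today` and the function `sums/2` are hypothetical and chosen only for illustration.

```erlang
-module(zip_today).
-export([sums/2]).

%% Adds two lists element-wise using the pre-zip-generator idiom.
%% lists:zip/2 first builds the intermediate list [{X,Y}, ...];
%% the comprehension then takes each pair apart again.
sums(Xs, Ys) ->
    [X + Y || {X, Y} <- lists:zip(Xs, Ys)].
```

Calling `zip_today:sums([1,2,3], [4,5,6])` returns `[5,7,9]`; the list of pairs exists only to be matched and discarded.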
| 47 | + |
| 48 | +Zip generators are a generalization of zip comprehensions. Every set of zipped |
| 49 | +generators is treated as one generator. Instead of constraining a comprehension |
| 50 | +to be either zipped or non-zipped, any generator can be either a zip generator |
| 51 | +(containing at least two generators zipped together), or a non-zip generator |
| 52 | +(all existing generators are non-zip generators). Therefore, zip generators |
| 53 | +can be mixed freely with all existing generators and filters. Zip comprehension |
| 54 | +then becomes a special case of comprehension where only zip generators are |
| 55 | +used. |
| 56 | + |
| 57 | +Within the OTP codebase, there are many uses of `lists:zip` within comprehensions. |
| 58 | +All of them can be simplified by zip generators using `&&` syntax. For example, |
| 59 | +`yecc.erl` in parsetools contains the following comprehension (external |
| 60 | +function calls and irrelevant fields redacted for readability): |
| 61 | + |
| 62 | + PartDataL = [#part_data{name = Nm, eq_state = EqS, actions = P, states = S} |
| 63 | + || {{Nm,P}, {Nm,S}, {Nm,EqS}} <- |
| 64 | + lists:zip3(PartNameL, PartInStates, PartStates)]. |
| 65 | + |
| 66 | +When using zip generators, the comprehension is rewritten to: |
| 67 | + |
| 68 | + PartDataL = [#part_data{name = Nm, eq_state = EqS, actions = P, states = S} |
| 69 | + || {Nm,P} <- PartNameL && {Nm,S} <- PartInStates && {Nm,EqS} <- PartStates]. |
| 70 | + |
| 71 | +With zip generators, the compiler does not need to build the intermediate |
| 72 | +list of tuples. Variable bindings and pattern matching within a zip generator |
| 73 | +work as expected, as `Nm` is supposed to bind to the same value in `{Nm,P}` |
| 74 | +and `{Nm,S}`. If the binding fails, then one element from each of the 3 |
| 75 | +generators is skipped. (If a strict generator is used, then the comprehension |
| 76 | +fails with exception `badmatch`, as specified in EEP-70.) |
| 77 | + |
| 78 | +In summary, zip generators remove the user's need to call the zip function |
| 79 | +within comprehensions and allow any number of lists to be zipped at once. |
| 80 | +They can be used in list, binary, and map comprehensions, and mixed freely with |
| 81 | +all existing generators and filters. Internally, the compiler does not create |
| 82 | +any intermediate data structure, therefore also removing the need for |
| 83 | +deforestation. |
| 84 | + |
| 85 | +Specification |
| 86 | +======================== |
| 87 | + |
| 88 | +Currently, Erlang supports three kinds of comprehensions: list comprehension, |
| 89 | +binary comprehension, and map comprehension. Their names refer to the result |
| 90 | +of the comprehension. List comprehension produces a list; binary comprehension |
| 91 | +produces a binary, etc. |
| 92 | + |
| 93 | + [Expression || Qualifier(s)] %% List Comprehension |
| 94 | + <<Expression || Qualifier(s)>> %% Binary Comprehension |
| 95 | + #{KeyExpression => ValueExpression || Qualifier(s)} %% Map Comprehension |
| 96 | + |
| 97 | +A qualifier is one of the following kinds: filter, list generator, bitstring |
| 98 | +generator, or map generator. Except for filters, the other three kinds of |
| 99 | +qualifiers are generators. Their names refer to the type on the right-hand |
| 100 | +side of `<-` or `<=`. Generators have the following form: |
| 101 | + |
| 102 | + Pattern <- List %% List Generator |
| 103 | + Pattern <= Bitstring %% Bitstring Generator |
| 104 | + Pattern_1 := Pattern_2 <- Map %% Map Generator |
| 105 | + |
| 106 | +All generators and filters can be freely used and mixed in all 3 kinds of |
| 107 | +comprehensions. The following example shows a list comprehension with a |
| 108 | +list generator and a bitstring generator. |
| 109 | + |
| 110 | + [{X,Y} || X <- [1,2,3], <<Y>> <= <<4,5,6>>]. |
| 111 | + |
| 112 | +This EEP proposes the addition of zip generators. A zip generator is two or |
| 113 | +more generators connected by `&&`. A zip generator can connect any number |
| 114 | +of generators of the 3 kinds above. Zip generators can be used |
| 115 | +in list, binary, or map comprehensions in the same way. |
| 116 | + |
| 117 | +For example, if the two generators in the above example are combined together |
| 118 | +as a zip generator, the comprehension would look like: |
| 119 | + |
| 120 | + [{X,Y} || X <- [1,2,3] && <<Y>> <= <<4,5,6>>]. |
| 121 | + |
| 122 | +Every zip generator of the form |
| 123 | +`G1 && ... && Gn` is evaluated to have the same result as `zip/n`, where |
| 124 | + |
| 125 | + zip([H1|T1], ..., [Hn|Tn]) -> |
| 126 | + [{H1,...,Hn} | zip(T1, ..., Tn)]; |
| 127 | + zip([], ..., []) -> |
| 128 | + []. |
| 129 | + |
| 130 | +Therefore, the above comprehension evaluates to `[{1,4}, {2,5}, {3,6}]`, which |
| 131 | +is the same as if using `lists:zip/2`. |
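The schematic `zip/n` above is not valid Erlang as written, because of the ellipses. As an illustrative sketch, here is the same definition instantiated for n = 3; the module name `zip_n` is hypothetical.

```erlang
-module(zip_n).
-export([zip/3]).

%% zip/n from the specification, written out for n = 3.
%% Takes one head from every list per step and recurses on the tails.
zip([H1|T1], [H2|T2], [H3|T3]) ->
    [{H1,H2,H3} | zip(T1, T2, T3)];
%% Succeeds only when all lists run out at the same time; lists of
%% different lengths match neither clause and raise function_clause.
zip([], [], []) ->
    [].
```

For instance, `zip_n:zip([1,2], [a,b], [x,y])` returns `[{1,a,x},{2,b,y}]`.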
| 132 | + |
| 133 | +A zip generator can also be used when a comprehension contains other non-zip |
| 134 | +generators and/or filters. The `&&` symbol has a higher precedence than `,`. |
| 135 | + |
| 136 | +The following example evaluates to `[{b,4}, {c,6}]`. The element `{a,2}` is |
| 137 | +filtered out from the resulting list. |
| 138 | + |
| 139 | + [{X, Y} || X <- [a, b, c] && <<Y>> <= <<2, 4, 6>>, Y =/= 2]. |
| 140 | + |
| 141 | +Compared to using helper functions, there is one advantage of using a zip |
| 142 | +generator: the Erlang compiler does not generate any tuple when a zip |
| 143 | +generator is translated into Core Erlang. The generated code reflects the |
| 144 | +programmer's intent, which is to collect one element from every list at a |
| 145 | +time without creating a list of tuples. |
| 146 | + |
| 147 | +Error Behaviors |
| 148 | +================ |
| 149 | + |
| 150 | +One would expect that when errors happen, a zip generator behaves the same |
| 151 | +as `lists:zip/2`, `lists:zip3/3`, and also the `zip/n` function above when |
| 152 | +more than 3 lists are zipped together. The design and implementation of |
| 153 | +zip generators aim to achieve that both for compiled code and for comprehensions |
| 154 | +evaluated in the Erlang shell. |
| 155 | + |
| 156 | +Generators of Different Lengths |
| 157 | +-------------- |
| 158 | + |
| 159 | +`lists:zip/2` and `lists:zip3/3` fail if the given lists are not of the |
| 160 | +same length, and `zip/n` also crashes. Therefore, a zip generator raises a |
| 161 | +`bad generators` error when it discovers that the given generators are of |
| 162 | +different lengths. |
| 163 | + |
| 164 | +When a zip generator crashes because the generators it contains are of |
| 165 | +different lengths, the internal error message is a tuple, where the first |
| 166 | +element is the atom `bad_generators`, and the second element is a tuple that |
| 167 | +contains the remaining data from all generators. The user-facing error message |
| 168 | +is `bad generators:`, followed by the tuple containing remaining data from |
| 169 | +all generators. |
| 170 | + |
| 171 | +For example, this comprehension will crash at runtime. |
| 172 | + |
| 173 | + [{X,Y} || X <- [1,2,3] && Y <- [1,2,3,4]]. |
| 174 | + |
| 175 | +The resulting error tuple is `{bad_generators,{[],[4]}}`. This is because |
| 176 | +when the comprehension crashes, the first list in the zip generator has |
| 177 | +only the empty list `[]` left, while the second list in the zip generator |
| 178 | +has `[4]` left. |
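The error tuple described above can be observed by catching it. The following is a sketch only: it assumes an Erlang/OTP release in which the compiler accepts `&&` (this EEP targets version 28) and that the error is raised with class `error`. The module name `zip_mismatch` and the function `pairs/2` are hypothetical.

```erlang
-module(zip_mismatch).
-export([pairs/2]).

%% Assumes a compiler that supports zip generators (`&&`).
%% Returns {ok, Pairs} when both lists have the same length, or
%% {error, Remaining}, where Remaining is the tuple of data left
%% over in each generator when the lengths turned out to differ.
pairs(Xs, Ys) ->
    try
        {ok, [{X, Y} || X <- Xs && Y <- Ys]}
    catch
        error:{bad_generators, Remaining} ->
            {error, Remaining}
    end.
```

With the lists from the example above, `zip_mismatch:pairs([1,2,3], [1,2,3,4])` is expected to return `{error,{[],[4]}}`, matching the error tuple given in the text.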
| 179 | + |
| 180 | +On the compiler's side, it is rather difficult to return the original zip |
| 181 | +generator in the error message, or to point out which generator is of |
| 182 | +different length compared to the others. The proposed error message aims to |
| 183 | +give the most helpful information without imposing extra burden on the |
| 184 | +compiler or runtime. |
| 185 | + |
| 186 | +Non-generator in a Zip Generator |
| 187 | +----------------- |
| 188 | + |
| 189 | +As the idea of zipping only makes sense for generators, a zip generator cannot |
| 190 | +contain filters or any expression that is not a generator. Whenever it is |
| 191 | +possible to catch such an error at compile-time, this error is caught by |
| 192 | +the Erlang linter. |
| 193 | + |
| 194 | +For example, the zip generator in the following comprehension contains a |
| 195 | +filter. |
| 196 | + |
| 197 | + zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. |
| 198 | + |
| 199 | +When the function is compiled, the linter points out that only generators are |
| 200 | +allowed in a zip generator, together with the position of the non-generator. |
| 201 | + |
| 202 | + t.erl:6:55: only generators are allowed in a zip generator. |
| 203 | + % 6| zip() -> [{X,Y} || X <- [1,2,3] && Y <- [1,2,3] && X > 0]. |
| 204 | + % | ^ |
| 205 | + |
| 206 | +Backwards Compatibility |
| 207 | +======================== |
| 208 | + |
| 209 | +The operator `&&` is not used in Erlang. No existing code is affected by |
| 210 | +this addition. |
| 211 | + |
| 212 | +Reference Implementation |
| 213 | +======================== |
| 214 | + |
| 215 | +[compiler: Add zip generators for comprehensions][1] contains the implementation |
| 216 | +for zip generators. |
| 217 | + |
| 218 | +[1]: https://github.com/erlang/otp/pull/8926 |
| 219 | +[2]: https://downloads.haskell.org/~ghc/5.00/docs/set/parallel-list-comprehensions.html |
| 220 | +[3]: https://docs.racket-lang.org/reference/for.html |
| 221 | + |
| 222 | +Copyright |
| 223 | +========= |
| 224 | + |
| 225 | +This document is placed in the public domain or under the CC0-1.0-Universal |
| 226 | +license, whichever is more permissive. |
| 227 | + |
| 228 | +[EmacsVar]: <> "Local Variables:" |
| 229 | +[EmacsVar]: <> "mode: indented-text" |
| 230 | +[EmacsVar]: <> "indent-tabs-mode: nil" |
| 231 | +[EmacsVar]: <> "sentence-end-double-space: t" |
| 232 | +[EmacsVar]: <> "fill-column: 70" |
| 233 | +[EmacsVar]: <> "coding: utf-8" |
| 234 | +[EmacsVar]: <> "End:" |
| 235 | +[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: " |