Description
Edit: I just noticed that the 1.0 documentation has more detail than the current 0.20 documentation. It clarifies that non-list dtypes are cast to lists prior to concatenation, but my proposal still stands.
Edit 2: I found #8510, which seems to be the same issue/request. I'll wait for @stinodego's feedback before closing.
The description of pl.concat_list is:
Horizontally concatenate columns into a single list column.
This is confusing, as discussed in #17294, since the name might imply concatenating existing lists into a single list. This is the current behavior on lists:
import polars as pl
df = pl.DataFrame({
"a": [[1]],  # pl.List(pl.Int64)
"b": [[2]],
})
df.select(pl.concat_list("a", "b"))
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │  <-- lists concatenated together
# └───────────┘
However, concat_list also concatenates the values of plain (non-list) columns into a list:
import polars as pl
df = pl.DataFrame({
"a": [1],  # pl.Int64
"b": [2],
})
df.select(pl.concat_list("a", "b"))
# shape: (1, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │  <-- columns concatenated together
# └───────────┘
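To make the overlap concrete, here is a minimal pure-Python model of the current rule described in the 1.0 docs: non-list inputs are wrapped into one-element lists, and then everything is concatenated row by row. Columns are modeled as plain Python lists of row values, and the helper name current_concat_list is hypothetical; it is a sketch of the documented semantics, not Polars code.

```python
from typing import Any, List, Sequence


def _as_list(value: Any) -> List[Any]:
    # Model of the documented cast: a non-list value becomes a one-element list.
    return list(value) if isinstance(value, list) else [value]


def current_concat_list(*columns: Sequence[Any]) -> List[List[Any]]:
    """Hypothetical row-wise model of today's pl.concat_list semantics.

    Assumes all columns have the same length (zip truncates otherwise).
    """
    rows = []
    for row_values in zip(*columns):
        merged: List[Any] = []
        for value in row_values:
            merged.extend(_as_list(value))
        rows.append(merged)
    return rows


print(current_concat_list([[1]], [[2]]))  # list columns   -> [[1, 2]]
print(current_concat_list([1], [2]))      # scalar columns -> [[1, 2]]
print(current_concat_list([[1]], [2]))    # mixed, per the cast rule -> [[1, 2]]
```

Because the wrapping step erases the distinction between a value and a one-element list, the model cannot tell the two examples apart, which mirrors the identical outputs shown above.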
Note that the result of the operation in both cases is identical. Instead, I propose that we have:
- pl.list(a, b, ...), which creates a new pl.List column out of the expressions a, b, .... The dtypes must have a common supertype.
- pl.concat_list(a, b, ...), where a, b, ... must all be pl.List columns, and they are concatenated into a single list. The inner dtypes must have a common supertype.
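A rough pure-Python sketch of how the two proposed functions would differ, again modeling columns as Python lists of row values. The names list_of (standing in for the proposed pl.list) and strict_concat_list (standing in for the proposed pl.concat_list) are hypothetical, and supertype resolution is not modeled; only the "must already be a list" check is.

```python
from typing import Any, List, Sequence


def list_of(*columns: Sequence[Any]) -> List[List[Any]]:
    """Hypothetical model of the proposed pl.list: each row's values become a new list."""
    return [list(row_values) for row_values in zip(*columns)]


def strict_concat_list(*columns: Sequence[Any]) -> List[List[Any]]:
    """Hypothetical model of the proposed pl.concat_list: inputs must be list columns."""
    rows = []
    for row_values in zip(*columns):
        merged: List[Any] = []
        for value in row_values:
            if not isinstance(value, list):
                # No implicit wrapping: a non-list input is an error.
                raise TypeError(f"expected a list value, got {type(value).__name__}")
            merged.extend(value)
        rows.append(merged)
    return rows


print(list_of([1], [2]))                 # [[1, 2]]
print(list_of([[1]], [[2]]))             # [[[1], [2]]]: lists are nested, not merged
print(strict_concat_list([[1]], [[2]]))  # [[1, 2]]
```

Under this split, strict_concat_list([1], [2]) raises TypeError instead of silently wrapping the values, so each function has exactly one meaning.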