df_unchop() and NULL recycling #1457

Open
@DavisVaughan

df_unchop() handles NULL recycling in a fairly complicated way that differs from the rest of tidyr.

tidyr/R/chop.R

Lines 159 to 164 in 831bf74

# - If `keep_empty = TRUE`, empty elements (`NULL` and empty typed elements)
#   are retained as their size 1 missing equivalents.
# - If `keep_empty = FALSE`, rows of entirely empty elements are dropped.
# - In the `keep_empty = FALSE` case, when determining the common size of the
#   row, `NULL`s are not included in the computation, but empty typed elements
#   are (i.e. you can't recycle integer() and 1:2).

This makes the implementation rather complicated, because we can't use the typical recycling rules, under which NULL elements are size 0 and recycle only against other size 0 and size 1 elements.

This in turn affects unchop() and unnest() directly (at the very least).

It is also inconsistent with other tidyr functions like unnest_longer(), which do treat NULL this way.
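The two ways of sizing a row can be made concrete with a small base R sketch. `row_common_size()` and its `null_is_size_zero` argument are hypothetical names invented for illustration; they are not tidyr or vctrs functions. The sketch uses `length()` as a stand-in for vctrs size, so it only applies to bare vectors, not data frames.

```r
# Hypothetical helper, not part of tidyr: common size of one row's elements
# under tidyverse recycling rules (size 1 recycles to any size, all other
# sizes must agree).
row_common_size <- function(elts, null_is_size_zero = TRUE) {
  if (!null_is_size_zero) {
    # Rule from the chop.R comment: `NULL`s are left out of the computation
    elts <- Filter(Negate(is.null), elts)
  }

  # length(NULL) is 0, so a retained `NULL` counts as a size 0 element
  sizes <- vapply(elts, length, integer(1))

  # Size 1 elements never constrain the result
  candidates <- unique(sizes[sizes != 1L])

  if (length(candidates) == 0L) {
    # Only size 1 elements, or no elements at all
    return(if (length(sizes) == 0L) 0L else 1L)
  }
  if (length(candidates) > 1L) {
    stop(
      "can't recycle input of size ", min(candidates),
      " to size ", max(candidates), "."
    )
  }
  candidates
}

row <- list(NULL, 1:2)

# `NULL` ignored: only the size 2 element takes part
size_null_ignored <- row_common_size(row, null_is_size_zero = FALSE)

# `NULL` counted as size 0: sizes 0 and 2 are incompatible
size_null_as_zero <- try(row_common_size(row), silent = TRUE)

# An empty typed element is always counted, whichever rule is used
size_typed_empty <- try(
  row_common_size(list(integer(), 1:2), null_is_size_zero = FALSE),
  silent = TRUE
)
```

In this sketch, only the first call succeeds (returning 2); the other two raise a recycling error. Ignoring `NULL` is the special case that needs the extra filtering step.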

Basically, most of tidyr treats NULL elements as if they were unspecified(0), in terms of both common size and common type determination, except df_unchop().
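As a rough illustration of what treating NULL as unspecified(0) implies, the following sketch calls vctrs directly. It assumes the vctrs package is installed (tidyr imports it) and shows only the vctrs building blocks, not how any tidyr function is actually implemented.

```r
library(vctrs)

# unspecified(0) is a size 0 vector with no type of its own
empty <- unspecified(0)
size <- vec_size(empty)

# Common type: it defers to the other input, so it never blocks a cast
ptype <- vec_ptype_common(empty, 1:2)

# Common size: size 0 is compatible with size 1 ...
size_with_one <- vec_size_common(empty, 1L)

# ... but not with size 2, which is a recycling error
size_with_two <- try(vec_size_common(empty, 1:2), silent = TRUE)
```

Under this model a NULL element contributes nothing to the common type but still takes part in the common size, which is the behavior the comparison with df_unchop() is about.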

library(tidyr)

# WITH NULL:

# Size 0 in `x`, size 2 in `y`
df <- tibble(x = list(NULL), y = list(tibble(a = 1:2)))
df
#> # A tibble: 1 × 2
#>   x      y               
#>   <list> <list>          
#> 1 <NULL> <tibble [2 × 1]>

# Since it was `NULL` we force it to work (THIS SEEMS WEIRD)
unnest(df, c(x, y))
#> # A tibble: 2 × 2
#>   x         a
#>   <lgl> <int>
#> 1 NA        1
#> 2 NA        2

# But it doesn't work in `unnest_longer()` because `NULL` became a 0 row df
unnest_longer(df, c(x, y))
#> Error in `unchop()` at tidyr/R/unnest-longer.R:130:2:
#> ! In row 1, can't recycle input of size 0 to size 2.


# WITH EMPTY TYPED ELEMENT

# Size 0 in `x`, size 2 in `y`
df <- tibble(x = list(integer()), y = list(tibble(a = 1:2)))
df
#> # A tibble: 1 × 2
#>   x         y               
#>   <list>    <list>          
#> 1 <int [0]> <tibble [2 × 1]>

# It isn't `NULL`, so tidyverse recycling kicks in
unnest(df, c(x, y))
#> Error in `unchop()` at tidyr/R/unnest.R:180:2:
#> ! In row 1, can't recycle input of size 0 to size 2.

unnest_longer(df, c(x, y))
#> Error in `unchop()` at tidyr/R/unnest-longer.R:130:2:
#> ! In row 1, can't recycle input of size 0 to size 2.

It is probably worth considering treating NULL like size 0 elements, as this would greatly simplify the implementation and make it more consistent with other tidyr verbs. This really only affects unnest() when you unnest multiple columns simultaneously, so hopefully it doesn't affect much code in the wild.

Labels: breaking change ☠️ (API change likely to affect existing code), feature (a feature request or enhancement), nesting 🐦 (nesting, chopping, and packing)
