unnest_longer/unnest inconsistency with list columns of lists? #1584

Open
@oliverbeagley-pgg

I have some code that, depending on earlier processing, produces either a zero-row data frame with the correct columns and types, or a data frame where some columns are lists of lists. For example:

foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

df_empty <- tibble::tibble(x = integer()) |>
  dplyr::mutate(y = lapply(x, foo))

df_empty
# # A tibble: 0 × 2
# # ℹ 2 variables: x <int>, y <list>

df_valued <- tibble::tibble(x = 1:3) |>
  dplyr::mutate(y = lapply(x, foo))

df_valued
# # A tibble: 3 × 2
#       x y         
#   <int> <list>    
# 1     1 <list [2]>
# 2     2 <list [2]>
# 3     3 <list [2]>
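To make the nesting concrete, here is a base-R look at a single element of y. It reuses the same foo() as above, and the checks are only illustrative. Note that x * 10 turns the integer x into a double, so the integer .ptype used with hoist() further down involves a (lossless) double-to-integer cast.

```r
# Same helper as in the issue: each x becomes a list of two one-element named lists
foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

elt <- foo(1L)
str(elt)

# Two inner lists, each holding a single component named "a"
length(elt)
names(elt[[1]])

# Multiplying an integer by the double literal 10 yields a double
typeof(elt[[1]]$a)
```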

I'm trying to write code that works with either input and still produces the proper columns, but I'm having trouble getting unnest_longer to handle the empty data frame the way unnest does. (I'd prefer unnest_longer because I find it clearer about what it is doing.)

With unnest, both cases work:

df_empty |> tidyr::unnest("y", ptype = list())
# # A tibble: 0 × 2
# # ℹ 2 variables: x <int>, y <list>

df_valued |> tidyr::unnest("y", ptype = list())
# # A tibble: 6 × 2
#       x y               
#   <int> <list>          
# 1     1 <named list [1]>
# 2     1 <named list [1]>
# 3     2 <named list [1]>
# 4     2 <named list [1]>
# 5     3 <named list [1]>
# 6     3 <named list [1]>

Trying the same thing with unnest_longer, however, fails on the empty data frame:

df_empty |> tidyr::unnest_longer("y", ptype = list())
# Error in `tidyr::unnest_longer()`:
# ! Can't convert `x` <logical> to <list>.
# Run `rlang::last_trace()` to see where the error occurred.

df_valued |> tidyr::unnest_longer("y", ptype = list())
# # A tibble: 6 × 2
#       x            y
#   <int> <list<list>>
# 1     1          [1]
# 2     1          [1]
# 3     2          [1]
# 4     2          [1]
# 5     3          [1]
# 6     3          [1]

The outputs differ slightly in the tibble header (y is <list> from unnest but <list<list>> from unnest_longer), but the downstream operations I'm using don't seem to care. For example, appending |> tidyr::hoist("y", "a", .ptype = list(a = integer())) works with either result for df_valued.
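A minimal sketch of that comparison, assuming tibble, dplyr and tidyr are installed and R >= 4.1 for |>. The object names unnested, longer, hoisted_a and hoisted_b are just for illustration. It checks that unnest_longer() returns a vctrs list_of column while unnest() returns a plain list, and that the same hoist() call gives the same a column from both.

```r
library(tibble)
library(dplyr)
library(tidyr)

foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

df_valued <- tibble(x = 1:3) |>
  mutate(y = lapply(x, foo))

unnested <- df_valued |> unnest("y", ptype = list())
longer   <- df_valued |> unnest_longer("y", ptype = list())

# unnest() keeps a plain list column; unnest_longer() returns a list_of<list>
class(unnested$y)
class(longer$y)

# The same downstream hoist() works on either result
hoisted_a <- unnested |> hoist("y", "a", .ptype = list(a = integer()))
hoisted_b <- longer   |> hoist("y", "a", .ptype = list(a = integer()))
identical(hoisted_a$a, hoisted_b$a)
```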

I've tried various unnest_longer arguments without success. Is there something I'm missing, or is this a limitation of unnest_longer? As mentioned, unnest gives me a workaround, but it would be nice to use unnest_longer instead.
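One possible interim workaround, sketched under assumptions: a hypothetical wrapper unnest_longer_safe() (not part of tidyr) that only calls unnest_longer() when there are rows, and otherwise swaps the empty column for an empty vctrs list_of<list> so the zero-row result has the same column type as the non-empty one. It assumes ptype = list() is what you would pass for non-empty input, and it sidesteps the error rather than fixing it. vctrs and tidyselect are used directly; both are tidyr dependencies.

```r
library(tibble)
library(dplyr)
library(tidyr)

foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

df_empty  <- tibble(x = integer()) |> mutate(y = lapply(x, foo))
df_valued <- tibble(x = 1:3)       |> mutate(y = lapply(x, foo))

# Hypothetical helper, not a tidyr function.
# `col` is a single column name given as a string.
unnest_longer_safe <- function(data, col, ...) {
  if (nrow(data) == 0) {
    # Nothing to unnest: replace the column with an empty list_of<list> so its
    # type matches what unnest_longer(ptype = list()) returns for non-empty
    # input (assumption: the caller wants list elements, as in the issue).
    data[[col]] <- vctrs::list_of(.ptype = list())
    return(data)
  }
  unnest_longer(data, tidyselect::all_of(col), ...)
}

out_empty  <- unnest_longer_safe(df_empty, "y", ptype = list())
out_valued <- unnest_longer_safe(df_valued, "y", ptype = list())
```

The early return keeps the zero-row path entirely outside unnest_longer, so it does not depend on how tidyr handles empty list columns internally.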
