Skip to content

Add .by to fill() #1572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# tidyr (development version)

* `fill()` gains a `.by` argument as an alternative to `dplyr::group_by()` for
applying the fill per group, similar to `nest(.by =)` and
`dplyr::mutate(.by =)` (@olivroy, #1439).

* `unchop()` produces a more helpful error message when columns cannot be cast
to `ptype` (@mgirlich, #1477).

Expand Down
40 changes: 31 additions & 9 deletions R/fill.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,17 @@
#' @section Grouped data frames:
#' With grouped data frames created by [dplyr::group_by()], `fill()` will be
#' applied _within_ each group, meaning that it won't fill across group
#' boundaries.
#' boundaries. This can also be accomplished using the `.by` argument to
#' `fill()`, which creates a temporary grouping for just this operation.
#'
#' @inheritParams dplyr::mutate
#'
#' @param data A data frame.
#' @param ... <[`tidy-select`][tidyr_tidy_select]> Columns to fill.
#' @param .direction Direction in which to fill missing values. Currently
#' either "down" (the default), "up", "downup" (i.e. first down and then up)
#' or "updown" (first up and then down).
#'
#' @export
#' @examples
#' # direction = "down" --------------------------------------------------------
Expand Down Expand Up @@ -82,22 +86,36 @@
#' 3, "Danielle", "Observer", NA
#' )
#'
#' # The values are inconsistently missing by position within the group
#' # Use .direction = "downup" to fill missing values in both directions
#' # The values are inconsistently missing by position within the `group`.
#' # Use `.direction = "downup"` to fill missing values in both directions
#' # and `.by = group` to apply the fill per group.
#' squirrels %>%
#' fill(n_squirrels, .direction = "downup", .by = group)
#'
#' # If you want, you can also supply a data frame grouped with `group_by()`,
#' # but don't forget to `ungroup()`!
#' squirrels %>%
#' dplyr::group_by(group) %>%
#' fill(n_squirrels, .direction = "downup") %>%
#' dplyr::ungroup()
#'
#' # Using `.direction = "updown"` accomplishes the same goal in this example
fill <- function(data, ..., .direction = c("down", "up", "downup", "updown")) {
fill <- function(data,
...,
.by = NULL,
.direction = c("down", "up", "downup", "updown")) {
check_dots_unnamed()
UseMethod("fill")
}

#' @export
fill.data.frame <- function(data, ..., .direction = c("down", "up", "downup", "updown")) {
vars <- tidyselect::eval_select(expr(c(...)), data, allow_rename = FALSE)
fill.data.frame <- function(data,
...,
.by = NULL,
.direction = c("down", "up", "downup", "updown")) {
vars <- names(tidyselect::eval_select(
expr = expr(c(...)),
data = data,
allow_rename = FALSE
))

.direction <- arg_match0(
arg = .direction,
Expand All @@ -108,5 +126,9 @@ fill.data.frame <- function(data, ..., .direction = c("down", "up", "downup", "u
vec_fill_missing(col, direction = .direction)
}

dplyr::mutate_at(data, .vars = dplyr::vars(any_of(vars)), .funs = fn)
dplyr::mutate(
Comment on lines -111 to +129
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also switching away from mutate_at() here

data,
dplyr::across(any_of(vars), .fns = fn),
.by = {{ .by }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing straight through to dplyr means that if .by is wrong for some reason, then the error call shows dplyr::mutate() rather than fill(). I think that is worth it to not have to fiddle with .by in any way, because dplyr has some pretty unique handling for it. (See the tests for more info)

)
}
23 changes: 17 additions & 6 deletions man/fill.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 27 additions & 0 deletions tests/testthat/_snaps/fill.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
# errors on named `...` inputs

Code
fill(df, fooy = x)
Condition
Error in `fill()`:
! Arguments in `...` must be passed by position, not name.
x Problematic argument:
* fooy = x

# validates its inputs

Code
Expand All @@ -6,3 +16,20 @@
Error in `fill()`:
! `.direction` must be one of "down", "up", "downup", or "updown", not "foo".

# `.by` can't select columns that don't exist

Code
fill(df, y, .by = z)
Condition
Error in `dplyr::mutate()`:
! Can't select columns that don't exist.
x Column `z` doesn't exist.

# `.by` can't be used on a grouped data frame

Code
fill(df, y, .by = x)
Condition
Error in `dplyr::mutate()`:
! Can't supply `.by` when `.data` is a grouped data frame.

42 changes: 39 additions & 3 deletions tests/testthat/test-fill.R
Original file line number Diff line number Diff line change
Expand Up @@ -106,23 +106,59 @@ test_that("fill preserves attributes", {
expect_equal(attributes(out_u$x), attributes(df$x))
})

test_that("fill respects grouping", {
test_that("fill respects existing grouping and `.by`", {
df <- tibble(x = c(1, 1, 2), y = c(1, NA, NA))

out <- df %>%
dplyr::group_by(x) %>%
fill(y)
expect_equal(out$y, c(1, 1, NA))
expect_identical(out$y, c(1, 1, NA))
expect_identical(dplyr::group_vars(out), "x")

out <- df %>%
fill(y, .by = x)
expect_identical(out$y, c(1, 1, NA))
expect_identical(dplyr::group_vars(out), character())
})

test_that("works when there is a column named `.direction` in the data (#1319)", {
df <- tibble(x = c(1, NA, 2), .direction = 1:3)
expect_error(out <- fill(df, x), NA)
out <- fill(df, x)
expect_identical(out$x, c(1, 1, 2))
})

test_that("errors on named `...` inputs", {
df <- tibble(x = c(1, NA, 2))

expect_snapshot(error = TRUE, {
fill(df, fooy = x)
})
})
Comment on lines +130 to +136
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted we didn't have a test for this and since we are touching check_dots_unnamed() here we may as well add one


test_that("validates its inputs", {
df <- tibble(x = c(1, NA, 2))
expect_snapshot(error = TRUE, {
df %>% fill(x, .direction = "foo")
})
})

test_that("`.by` can't select columns that don't exist", {
df <- tibble(x = 1, y = 2)

# This shows `mutate()` in the error, but we accept that to not have to handle
# `.by` in any way.
expect_snapshot(error = TRUE, {
fill(df, y, .by = z)
})
})

test_that("`.by` can't be used on a grouped data frame", {
df <- tibble(x = 1, y = 2)
df <- dplyr::group_by(df, x)

# This shows `mutate()` in the error, but we accept that to not have to handle
# `.by` in any way.
expect_snapshot(error = TRUE, {
fill(df, y, .by = x)
})
})
Loading