
dplyr now has optional immediate error on multiple-match and unmatched-key joins #1246

Open
@bpbond

FYI, dplyr 1.1.0 adds join arguments that raise an error immediately when a row of x matches more than one row in y, or when a key has no match. From the release post:

Multiple matches in equality joins like this one are typically unexpected (even though they are baked in to SQL) so we’ve also added a new warning to alert you when this happens. If multiple matches are expected, you can explicitly set multiple = "all" to silence this warning. This also serves as a code “sign post” for future readers of your code to let them know that this is a join that is expected to increase the number of rows in the data. If multiple matches aren’t expected, you can also set multiple = "error" to immediately halt the analysis.

https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#inequality-joins
Update: https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/

multiple: Handling of rows in x with multiple matches in y. For each row of x:
"all", the default, returns every match detected in y. This is the same behavior as SQL.
"any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
"first" returns the first match detected in y.
"last" returns the last match detected in y.
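As a rough illustration of the `multiple` options listed above, here is a minimal sketch assuming dplyr 1.1.0 or later is installed. The tibbles `x` and `y` and their column names are made up for the example.

```r
library(dplyr)

# Toy data (hypothetical): key 1 appears twice in y.
x <- tibble(key = c(1, 2), x_val = c("a", "b"))
y <- tibble(key = c(1, 1, 2), y_val = c("p", "q", "r"))

# "all" keeps every match, so the row of x with key 1 is duplicated.
all_matches <- left_join(x, y, by = "key", multiple = "all")

# "first" keeps only the first match found in y for each row of x.
first_match <- left_join(x, y, by = "key", multiple = "first")

nrow(all_matches)   # 3 rows: the join grew the data
nrow(first_match)   # 2 rows: same as x
first_match$y_val   # "p" "r"
```

Writing `multiple = "all"` explicitly also documents that the row count is expected to grow.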

unmatched: How should unmatched keys that would result in dropped rows be handled?
"drop" drops unmatched keys from the result.
"error" throws an error if unmatched keys are detected.
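A small sketch of `unmatched`, again with made-up tibbles and assuming dplyr 1.1.0 or later. One detail worth checking against the dplyr join documentation: `unmatched` only guards against keys that would cause rows to be dropped, so for a left join it checks `y`, while for an inner join it checks both inputs.

```r
library(dplyr)

# Toy data (hypothetical): key 3 in x has no partner in y.
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 2), y_val = c("p", "q"))

# Default unmatched = "drop": the row with key 3 silently disappears.
dropped <- inner_join(x, y, by = "key")

# unmatched = "error": the same join now stops instead of dropping rows.
result <- tryCatch(
  inner_join(x, y, by = "key", unmatched = "error"),
  error = function(e) "join failed: unmatched keys"
)

# In a left join no row of x can be dropped, so an unmatched x key does not
# trigger the error; the row is kept and y_val is filled with NA.
kept <- left_join(x, y, by = "key", unmatched = "error")
```

So `left_join(..., unmatched = "error")` on its own does not catch rows of x that lack a match.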

When gcamdata is ready to move to dplyr 1.1, this should allow removing both left_join_keep_first_only and left_join_error_no_match, I think?
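One possible shape for such replacements, as a sketch only. The wrapper names `join_first_only` and `join_error_no_match` are hypothetical stand-ins, not gcamdata's actual helpers, and the sketch assumes the second helper is meant to fail when a row of x has no match or more than one match. It uses the `relationship` argument, which needs dplyr 1.1.1 or later, and it does not check for NA values inside matched rows of y.

```r
library(dplyr)

# Hypothetical stand-in for a "keep first match only" left join.
join_first_only <- function(x, y, by) {
  left_join(x, y, by = by, multiple = "first")
}

# Hypothetical stand-in for an "error if no match" join.
# unmatched = c("error", "drop"): error on unmatched keys in x, ignore extras in y.
# relationship = "many-to-one": error if a row of x matches several rows of y.
join_error_no_match <- function(x, y, by) {
  inner_join(x, y, by = by,
             unmatched = c("error", "drop"),
             relationship = "many-to-one")
}

# Toy data (hypothetical).
lookup <- tibble(key = c(1, 2), v = c("p", "q"))
ok <- join_error_no_match(tibble(key = c(1, 1, 2)), lookup, by = "key")

# A key missing from the lookup table makes the strict join fail.
missing_key <- tryCatch(
  join_error_no_match(tibble(key = c(1, 3)), lookup, by = "key"),
  error = function(e) "error"
)
```

An inner join is used in the second wrapper because, when every row of x is required to match, it returns the same rows a left join would.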
