Skip to content

Commit f6901a9

Browse files
authored
Merge pull request #248 from UCD-SERG/claude/happy-franklin-6itw7g
Document required relationship arg on dplyr joins
2 parents 89b70c6 + 0b1a713 commit f6901a9

5 files changed

Lines changed: 25 additions & 1 deletion

File tree

.github/copilot-instructions.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -556,6 +556,7 @@ expect_false(has_missing_values(complete_data))
556556
- **Use tidy-selection, not data-masking, in selection contexts**: In `select()`, `rename()`, `summarize(.by = )` / `group_by()`, `across(.cols = )`, `pivot_*(names_from = )`, and join `by =`, select columns named by a variable with `all_of()` / `any_of()` (e.g. `select(any_of(c("a", "b", "new_name" = var)))`), and use bare names or `all_of()` in `.by`. Do **not** reach for the `.data` pronoun (`.data[[var]]`, `.data$x`) there — that pronoun belongs in *data-masking* verbs (`mutate()`, `filter()`, `summarize()` expressions). Using `.data[[var]]` in a selection context is a *soft* deprecation and may not warn at runtime, so it is easy to miss — but it should be rewritten.
557557
- **Collapse branching that only varies columns**: An `if`/`else` whose only job is to change which columns are selected, renamed, or joined can almost always be replaced by a single `any_of()` / `all_of()` selection or a single tidyselect-keyed join. Prefer one parameterized pipeline over two near-identical stratified/unstratified branches.
558558
- **Prefer dplyr joins over base `merge()`**: Use `dplyr::left_join()` / `right_join()` with `by = join_by(...)` rather than `merge()` for readability and consistency.
559+
- **Always set `relationship` on dplyr joins**: Every `dplyr::*_join()` call (`left_join()`, `right_join()`, `inner_join()`, `full_join()`) must specify the `relationship` argument (e.g. `relationship = "many-to-one"`). Declaring the expected cardinality documents the join's intent and makes an unexpected many-to-many match fail loudly instead of silently duplicating rows. Filtering joins (`semi_join()` / `anti_join()`) are out of scope — they cannot duplicate rows. See [PR #240](https://github.com/UCD-SERG/serodynamics/pull/240/changes#diff-58597f8513171a9da41d8e6c89e4230df8879139a10dd2422aa659aa496dd29eR52-R58) for an example.
559560

560561
### Review focus: convoluted / non-idiomatic code
561562

.lintr.R

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,13 @@ snake_case_ACROs1 <- rex::rex(
4949
end
5050
)
5151

52+
# Note: dplyr `*_join()` calls must specify the `relationship` argument
53+
# (e.g. `relationship = "many-to-one"`) so that an unexpected
54+
# many-to-many match errors out instead of silently duplicating rows.
55+
# lintr has no built-in linter for "missing argument", so this rule is
56+
# enforced via code review rather than automatically here; see the
57+
# contributor guidance in CLAUDE.md and .github/copilot-instructions.md.
58+
5259
linters <- lintr::linters_with_defaults(
5360
return_linter = NULL,
5461
trailing_whitespace_linter = NULL,

CLAUDE.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,17 @@ form. Treat these as in-scope review findings, not optional nits:
4242
`right_join()` with `by = join_by(...)` for clarity and consistency
4343
with the rest of the codebase.
4444

45+
- **dplyr `*_join()` calls must set `relationship`.** Flag any
46+
`dplyr::left_join()` / `right_join()` / `inner_join()` / `full_join()`
47+
that omits the `relationship` argument. Stating the expected
48+
cardinality explicitly (e.g. `relationship = "many-to-one"`) makes the
49+
join's intent clear and turns an unexpected many-to-many match into an
50+
error instead of a silent row explosion. Filtering joins
51+
(`semi_join()` / `anti_join()`) are out of scope here — they return at
52+
most one row per left-hand row and so cannot duplicate rows. See
53+
[PR #240](https://github.com/UCD-SERG/serodynamics/pull/240/changes#diff-58597f8513171a9da41d8e6c89e4230df8879139a10dd2422aa659aa496dd29eR52-R58)
54+
for an example.
55+
4556
- **Near-duplicate stratified / unstratified paths.** Prefer one
4657
parameterized pipeline over two near-identical branches.
4758

DESCRIPTION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
Package: serodynamics
22
Title: What the Package Does (One Line, Title Case)
3-
Version: 0.0.0.9058
3+
Version: 0.0.0.9059
44
Authors@R: c(
55
person("Peter", "Teunis", , "p.teunis@emory.edu", role = c("aut", "cph"),
66
comment = "Author of the method and original code."),

NEWS.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,10 @@
11
# serodynamics (development version)
22

3+
* Documented in `CLAUDE.md`, `.github/copilot-instructions.md`, and a
4+
note in `.lintr.R` that `dplyr::*_join()` calls must specify the
5+
`relationship` argument (for example `relationship = "many-to-one"`),
6+
so an unexpected many-to-many match errors out instead of silently
7+
duplicating rows.
38
* The test suite now sets `options(lifecycle_verbosity = "error")` (via
49
`tests/testthat/setup.R`), so tidyverse lifecycle deprecations -
510
including soft deprecations such as using the `.data` pronoun in a

0 commit comments

Comments
 (0)