Skip to content

Document required relationship arg on dplyr joins#248

Merged
d-morrison merged 7 commits into
mainfrom
claude/happy-franklin-6itw7g
Jun 15, 2026
Merged

Document required relationship arg on dplyr joins#248
d-morrison merged 7 commits into
mainfrom
claude/happy-franklin-6itw7g

Conversation

@d-morrison

@d-morrison d-morrison commented Jun 14, 2026

Copy link
Copy Markdown
Member

Summary

Adds contributor guidance that dplyr *_join() calls must specify the
relationship argument (e.g. relationship = "many-to-one"). Declaring
the expected cardinality documents the join's intent and turns an
unexpected many-to-many match into an error instead of a silent row
explosion. Filtering joins (semi_join() / anti_join()) are out of
scope — they cannot duplicate rows.

Changes

  • CLAUDE.md — new bullet under Idiomatic-code review focus so the
    @claude review agent flags joins missing relationship.
  • .github/copilot-instructions.md — new code-style bullet alongside
    the existing "prefer dplyr joins over merge()" guidance.
  • .lintr.R — a documenting comment noting the requirement. lintr has
    no built-in "missing argument" linter, so this rule is enforced via
    review rather than automatically; an enforcing custom linter would
    immediately flag existing joins (e.g. in R/Run_Mod.R,
    R/reconcile_subject_ids.R, R/sim_case_data.R) and break CI.
  • NEWS.md / DESCRIPTION — dev changelog entry and version bump
    (0.0.0.90580.0.0.9059, after merging the latest main).

Reference: PR #240.

https://claude.ai/code/session_01GKHL3aov2221tUyUDSW5Jo

Add contributor guidance that dplyr *_join() calls must specify the
relationship argument so unexpected many-to-many matches error out
instead of silently duplicating rows. Covers CLAUDE.md, the copilot
instructions, and a documenting note in .lintr.R.

https://claude.ai/code/session_01GKHL3aov2221tUyUDSW5Jo
@claude

This comment has been minimized.

@claude

This comment has been minimized.

@claude

This comment has been minimized.

@codecov

codecov Bot commented Jun 14, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@claude

This comment has been minimized.

@claude

This comment has been minimized.

@claude

This comment has been minimized.

@d-morrison d-morrison marked this pull request as ready for review June 15, 2026 15:09
@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Claude finished @d-morrison's task in 2m 7s —— View job


Code Review: PR #248 — Document required relationship arg on dplyr joins

  • Gather context
  • Read changed files
  • Review CLAUDE.md changes
  • Review .github/copilot-instructions.md changes
  • Review .lintr.R changes
  • Review DESCRIPTION / NEWS.md changes
  • Post final review

Documentation-only PR. All five files are correct. Notes below.

Per-file assessment

CLAUDE.md (lines 45–56 in HEAD) — New bullet placed logically between the merge() rule and the Near-duplicate paths rule. The rule, rationale ("turns an unexpected many-to-many match into an error instead of a silent row explosion"), and scope exclusion for filtering joins are technically accurate. The clarification that filtering joins "return at most one row per left-hand row and so cannot duplicate rows" is precisely correct. ✓

.github/copilot-instructions.md (line 559) — Placement after the merge() rule, consistent with CLAUDE.md's ordering. The terser phrasing ("they cannot duplicate rows") vs CLAUDE.md's fuller explanation is appropriate for this file's style. Both capture the same semantics. ✓

.lintr.R (lines 52–57) — The comment honestly acknowledges the enforcement gap: lintr has no built-in argument-presence linter, and wiring one up now would immediately flag existing joins in R/Run_Mod.R, R/reconcile_subject_ids.R, and R/sim_case_data.R, breaking CI. The cross-references to both guidance files are accurate. Placement between the snake_case block and linters <- is slightly thematically disconnected from surrounding code (which is about custom regexes and undesirable functions), but it's a clear, standalone policy note and has no functional impact. ✓

NEWS.md (lines 3–7) — Entry is clear, backtick-wraps identifiers, and follows the style of surrounding entries. ✓

DESCRIPTION — Version bump 0.0.0.9058 → 0.0.0.9059 is correct. ✓

Minor observations (non-blocking)

  1. nest_join() not listed. The four listed joins (left_join(), right_join(), inner_join(), full_join()) are the ones that appear in this codebase and cover almost all real usage. nest_join() also accepts relationship but behaves structurally differently (nests rather than flattens) and is rarely used here, so the omission is fine in practice.

  2. Only "many-to-one" shown as an example. The valid values are "one-to-one", "one-to-many", "many-to-one", and "many-to-many" (the permissive default when omitted). Briefly listing them would help reviewers know what to suggest when flagging a missing argument — though dplyr's own error message enumerates them if the value is wrong, so it's not blocking.

Verdict

Approve. The rule is technically accurate, the scope exclusion for filtering joins is well-reasoned, and the three-file strategy (CLAUDE.md for the review agent, .github/copilot-instructions.md for contributors, .lintr.R for linter-config readers) is a sensible belt-and-suspenders approach for a rule that can't be automatically enforced yet.

@d-morrison d-morrison merged commit f6901a9 into main Jun 15, 2026
16 checks passed
@d-morrison d-morrison deleted the claude/happy-franklin-6itw7g branch June 15, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants