fix: --skip-local uses scan roots instead of parent directory#820
Conversation
The Rust --skip-local implementation compared immediate parent directories (Path::parent()), while the TypeScript SkipLocalValidator checks if both files are under the same scan root directory. This meant clones across different subdirectories of the same project were not being filtered. Changes: - cpd-core/detect.rs: Replace parent-dir comparison with should_skip_local() that checks if both files share a scan root, mirroring the TS validator. Thread scan_roots through detect_with_options, detect_prepared, detect_in_group, flush_clone, add_secondary_clones, flush_secondary_clone. - cpd-finder/orchestrate.rs: Canonicalize both scan roots and file IDs so prefix comparisons work across macOS symlink differences (/var vs /private/var). Pass scan_roots to detect_prepared. - cpd/cli.rs: Add --skipLocal as a visible alias for --skip-local for compatibility with the TypeScript CLI flag. - skip_local_integration.rs: Fix cross_directory test to use two separate scan roots (matching TS semantics). Add same_root_subdirectories test verifying clones within one root are skipped even across subdirs.
Walkthrough
Changesscan-root-relative skip_local overhaul
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rust/crates/cpd-core/src/detect.rs`:
- Around line 369-402: The is_relative_to helper function compares raw path
components without normalization, causing false negatives when paths use
different spellings like ./repo/src/a.js versus repo, or paths with ..
components. Normalize both the file_path and dir parameters before performing
any prefix checks and ancestor walking in the is_relative_to function. This
ensures that semantically equivalent paths with different spellings are
correctly identified as being under the same root, matching the behavior of the
TypeScript path.relative() logic referenced in the documentation comment.
In `@rust/crates/cpd-finder/src/orchestrate.rs`:
- Around line 183-188: The current code canonicalizes the file path and uses it
as the `id`, which changes the public output contract unexpectedly. Instead,
preserve the original discovered/display path from file.path for the exported
`id` to maintain the current contract, and create a separate
normalized/canonical path variable for internal comparisons like skip_local
checks. Update the logic to use the original path for the id assignment and the
canonicalized path only where root comparisons are needed.
In `@rust/crates/cpd/src/cli.rs`:
- Around line 180-182: Update the doc comment for the skip_local field to
accurately reflect the current behavior. The field's help text currently states
that it skips clones in the same directory, but the actual behavior now skips
any pair that shares a scan root (including different subdirectories under one
scanned path). Modify the comment for the skip_local boolean field to describe
this broader scope of skipping pairs that share a scan root rather than just
same-directory pairs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc795894-dd6d-4f1d-b822-a86ab92bc7fb
📒 Files selected for processing (4)
rust/crates/cpd-core/src/detect.rsrust/crates/cpd-finder/src/orchestrate.rsrust/crates/cpd-finder/tests/skip_local_integration.rsrust/crates/cpd/src/cli.rs
| fn should_skip_local(file_a: &str, file_b: &str, scan_roots: &[PathBuf]) -> bool { | ||
| scan_roots | ||
| .iter() | ||
| .any(|root| is_relative_to(file_a, root) && is_relative_to(file_b, root)) | ||
| } | ||
|
|
||
| /// Returns true if `file_path` is contained within `dir`. | ||
| /// Mirrors the TypeScript `SkipLocalValidator.isRelative`: | ||
| /// `const rel = relative(dir, file); return rel !== '' && !rel.startsWith('..') && !isAbsolute(rel);` | ||
| fn is_relative_to(file_path: &str, dir: &PathBuf) -> bool { | ||
| let file = Path::new(file_path); | ||
| // Fast path: file path starts with the dir prefix | ||
| if let Ok(rel) = file.strip_prefix(dir) { | ||
| return !rel.as_os_str().is_empty(); | ||
| } | ||
| // Mixed absolute/relative can never match via simple prefix | ||
| if file.is_absolute() != dir.is_absolute() { | ||
| return false; | ||
| } | ||
| // Walk up from the file path checking if any ancestor starts with dir | ||
| let mut ancestor = file; | ||
| loop { | ||
| if ancestor == dir.as_path() { | ||
| return false; | ||
| } | ||
| if ancestor.starts_with(dir) { | ||
| let rel = ancestor.strip_prefix(dir).unwrap_or(ancestor); | ||
| return !rel.as_os_str().is_empty(); | ||
| } | ||
| ancestor = match ancestor.parent() { | ||
| Some(p) => p, | ||
| None => return false, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Normalize both sides before is_relative_to compares them.
This helper only compares raw path components, so spellings like ./repo/src/a.js vs repo or repo/../repo/lib/b.js return false here even though the TypeScript path.relative() check referenced in the doc comment would still treat both files as under the same root. orchestrate.rs canonicalizes the CLI path, but detect_with_options and detect_prepared are public entry points, so skip_local still depends on caller-specific path spelling unless this helper normalizes lexically or the API explicitly requires normalized IDs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rust/crates/cpd-core/src/detect.rs` around lines 369 - 402, The
is_relative_to helper function compares raw path components without
normalization, causing false negatives when paths use different spellings like
./repo/src/a.js versus repo, or paths with .. components. Normalize both the
file_path and dir parameters before performing any prefix checks and ancestor
walking in the is_relative_to function. This ensures that semantically
equivalent paths with different spellings are correctly identified as being
under the same root, matching the behavior of the TypeScript path.relative()
logic referenced in the documentation comment.
| let id = file | ||
| .path | ||
| .canonicalize() | ||
| .unwrap_or_else(|_| file.path.clone()) | ||
| .to_string_lossy() | ||
| .into_owned(); |
There was a problem hiding this comment.
Keep normalization out of the exported id.
This id now feeds both RunResult.sources[*].id and the clone fragment source_ids produced later in cpd-core, and RunConfig has no flag here to opt callers into absolute/canonical paths. Canonicalizing the emitted identifier changes the public output contract just to make skip_local comparisons work. Preserve the discovered/display path for output and carry a separate normalized path for root comparisons.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rust/crates/cpd-finder/src/orchestrate.rs` around lines 183 - 188, The
current code canonicalizes the file path and uses it as the `id`, which changes
the public output contract unexpectedly. Instead, preserve the original
discovered/display path from file.path for the exported `id` to maintain the
current contract, and create a separate normalized/canonical path variable for
internal comparisons like skip_local checks. Update the logic to use the
original path for the id assignment and the canonicalized path only where root
comparisons are needed.
| /// Skip clones where both fragments are in the same directory | ||
| #[arg(long)] | ||
| #[arg(long, visible_alias = "skipLocal")] | ||
| pub skip_local: bool, |
There was a problem hiding this comment.
Update the help text to match the new skip_local semantics.
The detector no longer skips only same-directory pairs; it now drops any pair that shares a scan root, including different subdirectories under one scanned path. Leaving the old wording here makes --help describe the pre-change behavior.
Suggested text
- /// Skip clones where both fragments are in the same directory
+ /// Skip clones where both fragments are under the same scan root📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Skip clones where both fragments are in the same directory | |
| #[arg(long)] | |
| #[arg(long, visible_alias = "skipLocal")] | |
| pub skip_local: bool, | |
| /// Skip clones where both fragments are under the same scan root | |
| #[arg(long, visible_alias = "skipLocal")] | |
| pub skip_local: bool, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rust/crates/cpd/src/cli.rs` around lines 180 - 182, Update the doc comment
for the skip_local field to accurately reflect the current behavior. The field's
help text currently states that it skips clones in the same directory, but the
actual behavior now skips any pair that shares a scan root (including different
subdirectories under one scanned path). Modify the comment for the skip_local
boolean field to describe this broader scope of skipping pairs that share a scan
root rather than just same-directory pairs.
Bug
--skip-localwas not filtering clones correctly. The Rust implementation compared immediate parent directories (Path::parent()), while the TypeScriptSkipLocalValidatorchecks if both files are under the same scan root directory.This meant:
jscpd ./project --skip-localwould NOT filter clones across./project/src/and./project/lib/(different parent dirs)Fix
should_skip_local()that checks if both files share a scan root, mirroring the TS validator. Threadscan_roots: &[PathBuf]through the detection pipeline./varvs/private/var). Pass scan roots todetect_prepared.--skipLocalas a visible alias for--skip-localfor compatibility with the TypeScript CLI flag.cross_directory_clones_survive_skip_localtest to use two separate scan roots (matching TS semantics). Addsame_root_subdirectories_skipped_by_skip_localtest verifying clones within one root are skipped even across subdirs.Verification
All 167+ Rust tests pass. Manual testing confirms:
jscpd ./dir1 ./dir2 --skip-local— filters same-root clones, keeps cross-root clones--skip-localand--skipLocalCLI flags workThis change is
Summary by CodeRabbit
New Features
skipLocalas an alias for the--skip-localCLI flagImprovements
skip_localfunctionality to correctly skip clones within the same directory structure