
fix: --skip-local uses scan roots instead of parent directory #820

Merged
kucherenko merged 1 commit into master from fix/skip-local-scan-roots
Jun 14, 2026
kucherenko (Owner) commented Jun 14, 2026

Bug

--skip-local was not filtering clones correctly. The Rust implementation compared immediate parent directories (Path::parent()), while the TypeScript SkipLocalValidator checks if both files are under the same scan root directory.

This meant:

  • jscpd ./project --skip-local would NOT filter clones across ./project/src/ and ./project/lib/ (different parent dirs)
  • Only clones within the exact same leaf directory were filtered
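
To make the difference concrete, here is a minimal sketch contrasting the two checks. The names `same_parent` and `share_scan_root` are illustrative and are not the identifiers used in the crate, the paths are made up, and the containment test is simplified to a plain prefix check.

```rust
use std::path::{Path, PathBuf};

/// Old behaviour (illustrative): two files count as "local" only when they
/// sit in the exact same directory.
fn same_parent(a: &str, b: &str) -> bool {
    Path::new(a).parent() == Path::new(b).parent()
}

/// New behaviour (illustrative): two files count as "local" when at least
/// one scan root is a prefix of both paths.
fn share_scan_root(a: &str, b: &str, roots: &[PathBuf]) -> bool {
    roots
        .iter()
        .any(|root| Path::new(a).starts_with(root) && Path::new(b).starts_with(root))
}
```

With a single root `project`, the pair `project/src/a.js` and `project/lib/b.js` has different parents, so the old check keeps the clone, while the scan-root check treats it as local and filters it. With two roots `dir1` and `dir2`, a pair split across them shares no root and survives.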

Fix

  • cpd-core/detect.rs: Replace parent-dir comparison with should_skip_local() that checks if both files share a scan root, mirroring the TS validator. Thread scan_roots: &[PathBuf] through the detection pipeline.
  • cpd-finder/orchestrate.rs: Canonicalize both scan roots and file IDs so prefix comparisons work across macOS symlink differences (/var vs /private/var). Pass scan roots to detect_prepared.
  • cpd/cli.rs: Add --skipLocal as a visible alias for --skip-local for compatibility with the TypeScript CLI flag.
  • skip_local_integration.rs: Fix cross_directory_clones_survive_skip_local test to use two separate scan roots (matching TS semantics). Add same_root_subdirectories_skipped_by_skip_local test verifying clones within one root are skipped even across subdirs.
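
The canonicalization step can be sketched roughly as follows. This is an assumption-laden illustration, not the code in orchestrate.rs: the helper names `canonical_or_original` and `canonical_roots` are hypothetical, and the fallback mirrors the `canonicalize()` with `unwrap_or_else` pattern quoted in the review below.

```rust
use std::path::{Path, PathBuf};

/// Resolve symlinks and make the path absolute. If that fails (for example
/// because the path does not exist), keep the path exactly as given.
fn canonical_or_original(path: &Path) -> PathBuf {
    path.canonicalize().unwrap_or_else(|_| path.to_path_buf())
}

/// Canonicalize every scan root up front so that later prefix checks
/// compare paths that were resolved the same way.
fn canonical_roots(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots.iter().map(|r| canonical_or_original(r)).collect()
}
```

Resolving both sides the same way is what lets a file reported under /private/var match a root that was passed as /var on macOS.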

Verification

All 167+ Rust tests pass. Manual testing confirms:

  • jscpd ./dir1 ./dir2 --skip-local — filters same-root clones, keeps cross-root clones
  • Both --skip-local and --skipLocal CLI flags work


Summary by CodeRabbit

  • New Features

    • Added skipLocal as an alias for the --skip-local CLI flag
  • Improvements

    • Enhanced skip_local functionality to correctly skip clones within the same directory structure
    • Improved file path normalization for consistent detection behavior across different scan root configurations

The Rust --skip-local implementation compared immediate parent directories
(Path::parent()), while the TypeScript SkipLocalValidator checks if both
files are under the same scan root directory. This meant clones across
different subdirectories of the same project were not being filtered.

Changes:
- cpd-core/detect.rs: Replace parent-dir comparison with should_skip_local()
  that checks if both files share a scan root, mirroring the TS validator.
  Thread scan_roots through detect_with_options, detect_prepared,
  detect_in_group, flush_clone, add_secondary_clones, flush_secondary_clone.
- cpd-finder/orchestrate.rs: Canonicalize both scan roots and file IDs so
  prefix comparisons work across macOS symlink differences (/var vs /private/var).
  Pass scan_roots to detect_prepared.
- cpd/cli.rs: Add --skipLocal as a visible alias for --skip-local for
  compatibility with the TypeScript CLI flag.
- skip_local_integration.rs: Fix cross_directory test to use two separate
  scan roots (matching TS semantics). Add same_root_subdirectories test
  verifying clones within one root are skipped even across subdirs.

coderabbitai Bot commented Jun 14, 2026

Walkthrough

skip_local clone suppression is overhauled: new should_skip_local and is_relative_to helpers replace the old parent-directory equality check by testing whether both clone fragment paths share the same scan root. A scan_roots: &[PathBuf] parameter is threaded through detect_with_options, detect_prepared, and all internal detection helpers. The orchestrator canonicalizes both file ids and scan roots before calling detection. Integration tests cover the new subdirectory behavior, and --skip-local gains a skipLocal camelCase alias.

Changes

scan-root-relative skip_local overhaul

| Layer / File(s) | Summary |
| --- | --- |
| should_skip_local / is_relative_to helpers and flush-function usage (rust/crates/cpd-core/src/detect.rs) | Introduces should_skip_local and is_relative_to helpers that test whether both clone fragment paths share a scan root; replaces the old parent-directory equality check in flush_clone and flush_secondary_clone with calls to these helpers. |
| scan_roots threaded through detection pipeline signatures (rust/crates/cpd-core/src/detect.rs) | Extends detect_with_options, detect_prepared, detect_in_group, flush_clone, add_secondary_clones, and flush_secondary_clone with a scan_roots: &[PathBuf] parameter; adds PathBuf to imports; updates every internal call site. |
| Path canonicalization and scan_roots passing in orchestrator (rust/crates/cpd-finder/src/orchestrate.rs) | File ids are computed via canonicalize() with fallback; scan roots are pre-canonicalized before detection; detect_prepared is called with the resulting &scan_roots. |
| Integration tests and CLI alias (rust/crates/cpd-finder/tests/skip_local_integration.rs, rust/crates/cpd/src/cli.rs) | Adds same_root_subdirectories_skipped_by_skip_local test; updates cross_directory_clones_survive_skip_local to use two subdirectories as separate scan roots; adds skipLocal as a visible CLI alias for --skip-local. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Hopping through the code roots, sniffing every tree,
Two files in one burrow? No clones for me!
is_relative_to — my trusty nose,
Sniffs out same-root siblings, and away the duplicate goes.
--skipLocal or --skip-local, I answer to both,
One rabbit, two names — by the patch-clover oath! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly and clearly describes the main fix: replacing parent directory comparison with scan root-based logic for --skip-local behavior. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust/crates/cpd-core/src/detect.rs`:
- Around line 369-402: The is_relative_to helper function compares raw path
components without normalization, causing false negatives when paths use
different spellings like ./repo/src/a.js versus repo, or paths with ..
components. Normalize both the file_path and dir parameters before performing
any prefix checks and ancestor walking in the is_relative_to function. This
ensures that semantically equivalent paths with different spellings are
correctly identified as being under the same root, matching the behavior of the
TypeScript path.relative() logic referenced in the documentation comment.

In `@rust/crates/cpd-finder/src/orchestrate.rs`:
- Around line 183-188: The current code canonicalizes the file path and uses it
as the `id`, which changes the public output contract unexpectedly. Instead,
preserve the original discovered/display path from file.path for the exported
`id` to maintain the current contract, and create a separate
normalized/canonical path variable for internal comparisons like skip_local
checks. Update the logic to use the original path for the id assignment and the
canonicalized path only where root comparisons are needed.

In `@rust/crates/cpd/src/cli.rs`:
- Around line 180-182: Update the doc comment for the skip_local field to
accurately reflect the current behavior. The field's help text currently states
that it skips clones in the same directory, but the actual behavior now skips
any pair that shares a scan root (including different subdirectories under one
scanned path). Modify the comment for the skip_local boolean field to describe
this broader scope of skipping pairs that share a scan root rather than just
same-directory pairs.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc795894-dd6d-4f1d-b822-a86ab92bc7fb

📥 Commits

Reviewing files that changed from the base of the PR and between 26df7f1 and 74408be.

📒 Files selected for processing (4)
  • rust/crates/cpd-core/src/detect.rs
  • rust/crates/cpd-finder/src/orchestrate.rs
  • rust/crates/cpd-finder/tests/skip_local_integration.rs
  • rust/crates/cpd/src/cli.rs

Comment on lines +369 to +402
fn should_skip_local(file_a: &str, file_b: &str, scan_roots: &[PathBuf]) -> bool {
    scan_roots
        .iter()
        .any(|root| is_relative_to(file_a, root) && is_relative_to(file_b, root))
}

/// Returns true if `file_path` is contained within `dir`.
/// Mirrors the TypeScript `SkipLocalValidator.isRelative`:
/// `const rel = relative(dir, file); return rel !== '' && !rel.startsWith('..') && !isAbsolute(rel);`
fn is_relative_to(file_path: &str, dir: &PathBuf) -> bool {
    let file = Path::new(file_path);
    // Fast path: file path starts with the dir prefix
    if let Ok(rel) = file.strip_prefix(dir) {
        return !rel.as_os_str().is_empty();
    }
    // Mixed absolute/relative can never match via simple prefix
    if file.is_absolute() != dir.is_absolute() {
        return false;
    }
    // Walk up from the file path checking if any ancestor starts with dir
    let mut ancestor = file;
    loop {
        if ancestor == dir.as_path() {
            return false;
        }
        if ancestor.starts_with(dir) {
            let rel = ancestor.strip_prefix(dir).unwrap_or(ancestor);
            return !rel.as_os_str().is_empty();
        }
        ancestor = match ancestor.parent() {
            Some(p) => p,
            None => return false,
        };
    }
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize both sides before is_relative_to compares them.

This helper only compares raw path components, so spellings like ./repo/src/a.js vs repo or repo/../repo/lib/b.js return false here even though the TypeScript path.relative() check referenced in the doc comment would still treat both files as under the same root. orchestrate.rs canonicalizes the CLI path, but detect_with_options and detect_prepared are public entry points, so skip_local still depends on caller-specific path spelling unless this helper normalizes lexically or the API explicitly requires normalized IDs.
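
One way to read this suggestion is a purely lexical normalization pass applied to both sides before the prefix check. The sketch below is an assumption, not the fix that was or will be applied: `normalize_lexically` and `is_under` are hypothetical names, and the approach never touches the filesystem, so a `..` that follows a symlink may resolve differently than it would on disk.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path: drop `.` components and resolve `..` against
/// the preceding normal component, without consulting the filesystem.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => match out.components().next_back() {
                // `a/..` collapses to nothing.
                Some(Component::Normal(_)) => {
                    out.pop();
                }
                // `/..` stays at the root.
                Some(Component::RootDir) | Some(Component::Prefix(_)) => {}
                // Leading `..` on a relative path has to be kept.
                _ => out.push(".."),
            },
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// True if `file` lies strictly inside `dir` once both are normalized.
fn is_under(file: &str, dir: &Path) -> bool {
    let file = normalize_lexically(Path::new(file));
    let dir = normalize_lexically(dir);
    match file.strip_prefix(&dir) {
        Ok(rel) => !rel.as_os_str().is_empty(),
        Err(_) => false,
    }
}
```

Under this sketch, `./repo/src/a.js` and `repo/../repo/lib/b.js` are both reported as being under `repo`, which is the case the comment says the raw component comparison misses.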

Comment on lines +183 to +188
let id = file
    .path
    .canonicalize()
    .unwrap_or_else(|_| file.path.clone())
    .to_string_lossy()
    .into_owned();

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Keep normalization out of the exported id.

This id now feeds both RunResult.sources[*].id and the clone fragment source_ids produced later in cpd-core, and RunConfig has no flag here to opt callers into absolute/canonical paths. Canonicalizing the emitted identifier changes the public output contract just to make skip_local comparisons work. Preserve the discovered/display path for output and carry a separate normalized path for root comparisons.
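
A rough sketch of the separation being asked for is shown here. The `SourceEntry` type and its fields are hypothetical and do not correspond to the crate's real structs; the point is only that the emitted `id` stays as discovered while a second, canonicalized path is used for scan-root comparisons.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record: `id` is what gets reported to callers,
/// `compare_path` is used only for scan-root checks.
struct SourceEntry {
    id: String,
    compare_path: PathBuf,
}

impl SourceEntry {
    fn from_discovered(path: &Path) -> Self {
        SourceEntry {
            // Emit the path exactly as it was discovered.
            id: path.to_string_lossy().into_owned(),
            // Canonicalize for comparisons only, falling back to the
            // discovered path if the file cannot be resolved.
            compare_path: path.canonicalize().unwrap_or_else(|_| path.to_path_buf()),
        }
    }

    /// True if this entry lies strictly inside an already-canonicalized root.
    fn is_under(&self, canonical_root: &Path) -> bool {
        self.compare_path.as_path() != canonical_root
            && self.compare_path.starts_with(canonical_root)
    }
}
```

Carrying two fields costs a little memory per file, but it keeps the output contract independent of whatever normalization the skip_local check happens to need.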

Comment on lines 180 to 182
     /// Skip clones where both fragments are in the same directory
-    #[arg(long)]
+    #[arg(long, visible_alias = "skipLocal")]
     pub skip_local: bool,

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the help text to match the new skip_local semantics.

The detector no longer skips only same-directory pairs; it now drops any pair that shares a scan root, including different subdirectories under one scanned path. Leaving the old wording here makes --help describe the pre-change behavior.

Suggested text
-    /// Skip clones where both fragments are in the same directory
+    /// Skip clones where both fragments are under the same scan root
@kucherenko kucherenko merged commit abadd9f into master Jun 14, 2026
20 checks passed