This repo contains a zero-dependency Rust crate that canonicalizes paths when suffixes don’t exist. It must match std::fs::canonicalize exactly for fully-existing paths, while extending behavior to non-existing paths safely and predictably across Windows, macOS, and Linux.
Use this guide when proposing changes, refactors, tests, or docs with an automated agent.
AGENTS.md is read by stateless agents with no memory of prior sessions. Every rule must stand on its own without session context.
- General, not reactive. Do not add rules to address a single past mistake. Only codify patterns that could recur across sessions.
- Context-free. No references to specific conversations, resolved issues, commit hashes, or session artifacts. A future agent must understand the rule without knowing what prompted it.
- Principles over examples. Prefer abstract guidance. If an example is needed, make it generic — never name a specific module or function as the motivating case.
- No stale specifics. If a rule names a concrete item (file, function, feature), it must be because the item is structurally important (e.g. the repository layout table), not because it was the subject of a past debate.
- Compatibility: Results for fully-existing paths must equal `std::fs::canonicalize`.
- Minimal deps: Keep mandatory runtime dependencies at 0. Optional features may add well-justified lightweight deps (currently: `proc-canonicalize`, default-enabled, for Linux `/proc` magic-link handling; `dunce`, Windows-only and optional, for UNC simplification). Dev-only `tempfile` is allowed in tests.
- Security first: Preserve ADS validation, symlink cycle detection, null-byte checks, traversal clamping, and UNC/device semantics.
- MSRV: Keep Minimum Supported Rust Version at `1.70.0` (edition 2021; no unstable features).
- CI clean: `cargo fmt`, `clippy -D warnings`, tests, docs (rustdoc `-D warnings`), security audit, and MSRV all pass locally.
- `pub fn soft_canonicalize(path: impl AsRef<std::path::Path>) -> std::io::Result<std::path::PathBuf>`
- `pub const MAX_SYMLINK_DEPTH: usize`
- `pub struct SoftCanonicalizeError { path: PathBuf, detail: Cow<'static, str> }`
- `pub trait IoErrorPathExt { fn offending_path(&self) -> Option<&Path>; fn soft_canon_detail(&self) -> Option<&str>; }`

Do not change signatures or remove items without a clear migration plan and tests.
- Existing paths: Return exactly what `std::fs::canonicalize` returns (including Windows case/UNC long-path formatting).
- Non-existing suffixes: Canonicalize only the deepest existing ancestor; append the non-existing tail lexically.
- Traversal semantics: Resolve `.`/`..` lexically with symlink-first behavior for existing links; never pop beyond root/share/device floors.
- Symlinks: Resolve with bounded depth (`MAX_SYMLINK_DEPTH`) and cycle detection; preserve attachment semantics if the resolved target and parent don’t exist.
- Windows specifics:
  - Return extended-length paths for absolute results (`\\?\C:\...` or `\\?\UNC\server\share\...`).
  - Preserve 8.3 short names in non-existing components; expand only when the component exists and is probed.
  - Validate NTFS ADS placement and token rules (early and late checks) and reject malformed patterns.
- Input validation: Reject embedded NUL bytes consistently.
- Performance: Prefer single-pass logic, minimal syscalls, no unnecessary allocations.
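
The traversal rule above (resolve `.`/`..` lexically and never pop beyond the root floor) can be illustrated with a minimal sketch. The name `lexical_normalize` is hypothetical and is not the crate's implementation: it ignores symlink-first behavior and Windows share/device floors, and it silently drops a leading `..` on relative input, which the real pipeline avoids by making the path absolute first.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: purely lexical `.`/`..` resolution with a root floor.
fn lexical_normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    // Number of normal components currently held in `out`; this is the floor guard.
    let mut depth = 0usize;
    for component in path.components() {
        match component {
            // Keep prefixes and the root as-is; they define the floor.
            Component::Prefix(_) | Component::RootDir => out.push(component.as_os_str()),
            Component::CurDir => {}
            Component::ParentDir => {
                // Only pop what was pushed below the floor; otherwise clamp.
                if depth > 0 {
                    out.pop();
                    depth -= 1;
                }
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    out
}
```

With this sketch, an input such as `/a/b/../../../c` collapses to `/c` instead of escaping above the root.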
1) Input checks → 2) Relative-to-absolute → 3) Fast-path `fs::canonicalize` (original) → 4) Lexical normalize (streaming) → 5) Fast-path `fs::canonicalize` (normalized, if changed) → 6) Deepest-existing-prefix discovery with inline symlink handling → 7) Optional re-canonicalize anchor when symlink seen (or Windows short-name expansion) → 8) Append non-existing suffix → 9) Windows extended-length normalization.
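
Steps 6 and 8 of the pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's algorithm: `canonicalize_existing_prefix` is a hypothetical name, the input is assumed to be absolute and already lexically normalized, and symlink handling, cycle detection, and all Windows specifics are left out.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: canonicalize the deepest existing ancestor, then
/// append the non-existing tail unchanged.
fn canonicalize_existing_prefix(path: &Path) -> io::Result<PathBuf> {
    // `ancestors()` yields the path itself first, then each parent up to the root.
    for ancestor in path.ancestors() {
        match std::fs::canonicalize(ancestor) {
            Ok(mut base) => {
                // The tail is whatever follows the ancestor in the original input.
                let tail = path
                    .strip_prefix(ancestor)
                    .map_err(|err| io::Error::new(io::ErrorKind::InvalidInput, err))?;
                if !tail.as_os_str().is_empty() {
                    base.push(tail);
                }
                return Ok(base);
            }
            // This ancestor does not exist; try the next one up.
            Err(err) if err.kind() == io::ErrorKind::NotFound => continue,
            Err(err) => return Err(err),
        }
    }
    Err(io::Error::new(io::ErrorKind::NotFound, "no existing ancestor found"))
}
```

Probing from the full path upward keeps the fully-existing case to a single `canonicalize` call, which matches the fast-path intent described above.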
- `src/lib.rs`: Core algorithm, Windows/Unix branches, helpers, internal tests modules.
- `src/anchored.rs`: `anchored_canonicalize` implementation (compiled only with `--features anchored`).
- `src/tests/`: Unit tests grouped by area (std-compat, traversal, symlink, platform, security, etc.).
- `tests/`: Integration and blackbox security tests, including Windows ADS and UNC coverage.
- `examples/`: Runnable examples and demos, including security demonstration.
- `benches/`: Benchmarks and Python baseline harness (`benches/python`).
- `docs/`: Deep-dives (e.g., Windows UNC research).
- CI helpers: `ci-local.sh` and `ci-local.ps1` replicate GitHub Actions locally.

- Bash (Linux/macOS/WSL): `bash ci-local.sh`
- PowerShell (Windows): `.\ci-local.ps1`

These scripts:
- Check UTF-8 encodings and BOM for critical files.
- Run `cargo fmt --check`, `clippy -D warnings`, `cargo test --verbose` (includes doctests), and `cargo doc` with `RUSTDOCFLAGS='-D warnings'`.
- Test feature combinations explicitly: `--features anchored` and `--features anchored,dunce` (NOT `--all-features`).
- Run `cargo audit` (install if missing).
- Verify MSRV by building and linting on Rust 1.70.0 (regenerates `Cargo.lock` as needed).

Feature combinations to test explicitly:
- `cargo test --features anchored` - Primary use case (anchored canonicalization only)
- `cargo test --features anchored,dunce` - Full feature set (anchored + Windows path simplification)

`--all-features` can now be used for testing on all platforms: the `dunce` feature is guarded with `#[cfg(all(feature = "dunce", windows))]` in all code that directly uses `dunce::` functions.
Important: The `dunce` feature changes output format on Windows (extended-length `\\?\C:\...` → simplified `C:\...`), so tests MUST use feature-conditional assertions with `#[cfg(feature = "dunce")]` blocks when comparing with `std::fs::canonicalize` (which always returns the extended-length `\\?\` format on Windows).
Critical rule for code: Any code that calls `dunce::` functions directly MUST use `#[cfg(all(feature = "dunce", windows))]`, not just `#[cfg(feature = "dunce")]`.
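
When a test needs the simplified form of what `std::fs::canonicalize` returned, the comparison string can be derived without calling `dunce::` at all. The sketch below assumes a hypothetical helper name, `strip_verbatim_prefix`, and only covers the drive-letter form; a `\\?\UNC\server\share` path needs a different mapping and is not handled here.

```rust
/// Hypothetical helper: drops one leading `\\?\` from a path string, if present.
/// Only meaningful for drive-letter paths such as `\\?\C:\dir`.
fn strip_verbatim_prefix(path_str: &str) -> &str {
    // `strip_prefix` returns None when the prefix is absent, so fall back to the input.
    path_str.strip_prefix(r"\\?\").unwrap_or(path_str)
}
```

Because the helper works on plain strings, it compiles identically on every platform and needs no `cfg` guard of its own.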
Platform-specific feature testing:
- dunce on Linux/Unix: The dunce feature is a Windows-only dependency (target-conditional in Cargo.toml). On non-Windows platforms, it adds no dependencies and has no effect. Therefore, testing `--features anchored,dunce` on Linux is redundant: it behaves identically to `--features anchored`. The CI pipeline does NOT need to test dunce on Linux.
- Cross-platform path handling: While dunce doesn't need Linux testing, we DO test how the crate handles Windows-style paths (UNC, backslashes, drive letters) on Unix to ensure graceful behavior in cross-platform scenarios (see `tests/cross_platform_paths.rs`). These tests verify that:
  - Windows UNC paths fail gracefully on Unix (no panics)
  - Windows drive letters are handled predictably on Unix
  - Unix-style forward slashes work on Windows (Windows accepts `/` as separator)
  - Relative paths and dot-dot resolution work consistently across platforms

Rationale: Explicit feature flags make CI intentions clear and catch feature-specific issues. The ambiguous --all-features hides which combinations are being tested and makes failures harder to debug.
Problem: On Windows, creating symlinks requires elevated privileges or Developer Mode. This creates a dangerous testing blind spot:
- Locally: Tests that create symlinks skip gracefully with error 1314 (`ERROR_PRIVILEGE_NOT_HELD`)
- CI (GitHub Actions): Windows runners have symlink privileges enabled, so these tests RUN

Why This Matters:
- Tests that skip locally will execute on GitHub Actions
- If these tests compare with `std::fs::canonicalize` without feature-conditional assertions, they will FAIL in CI when dunce feature is enabled
- You won't catch these failures until after you push to GitHub

A test that merely skips on error 1314 is a CI-only test: the developer never sees it pass on their own machine and cannot reproduce a CI failure locally. When the behavior under test does not require true symlink semantics (reparse-point resolution is sufficient), use an NTFS junction as a fallback so the test runs on non-admin Windows sessions too.
Rules:
1. Use the existing helper, don't inline: a `create_symlink_or_junction` helper already lives in `tests/test_helpers/`. Integration tests under `tests/` must call that helper rather than re-implementing the try-symlink-then-junction pattern. If you find yourself writing `match symlink_dir(...) { Err(e) if ... raw_os_error() == Some(1314) => junction_verbatim::create(...) }` by hand in an integration test, stop and use the helper.
2. `src/tests/` unit tests can't import the helper (different crate boundary). For those, either: (a) move the test to `tests/` and use the helper, or (b) inline the two-arm fallback with the `junction-verbatim` dev-dependency. Do NOT add a second module that re-exports a parallel helper; one canonical helper per repo.
3. Junction semantic gotcha: junctions require an ABSOLUTE target and can only point at directories on the same volume. Symlinks accept both relative and absolute targets. When designing a test that must work via either mechanism, structure it with an absolute target so the same setup works for both; don't write two parallel tests for the "symlink only" and "junction only" cases.
4. When junction is not an acceptable fallback: tests that specifically exercise relative-symlink target resolution, or absolute symlink targets outside the anchor's volume, must remain symlink-only and skip on 1314. Document why junction won't satisfy the test in a short comment.
5. Do not replace or alter an existing regression test with a junction variant. Junction covers a different code path (absolute-target reparse point) than a relative symlink. If both paths matter, ADD a sibling test; never rename or rewrite the original. Regression tests are append-only: the name and assertions at release time must survive intact so future bisects can pinpoint behaviour changes.
6. Verify the test actually ran locally. A test that prints "skipping: symlink creation not permitted" and returns `Ok(())` is not evidence of correctness. When reporting a fix, confirm via the test runner's output (`... ok` with execution time) that the test body executed. If every relevant test printed a skip line, you have no local proof: add a junction fallback (as a SIBLING test per rule 5) or run in an elevated / Developer-Mode session before claiming the fix works.

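
One way to make "skipped" visibly different from "ran" is to classify the link-creation result explicitly instead of returning early. The sketch below is illustrative only: `LinkSetup` and `classify_link_result` are hypothetical names, and it does not replace the `create_symlink_or_junction` helper that integration tests must use.

```rust
use std::io;

/// Windows error code for ERROR_PRIVILEGE_NOT_HELD.
const ERROR_PRIVILEGE_NOT_HELD: i32 = 1314;

/// Hypothetical outcome type: tells the caller whether the test body can run.
#[derive(Debug, PartialEq, Eq)]
enum LinkSetup {
    Created,
    SkippedNoPrivilege,
}

/// Maps a link-creation result to an explicit outcome; other errors propagate.
fn classify_link_result(result: io::Result<()>) -> io::Result<LinkSetup> {
    match result {
        Ok(()) => Ok(LinkSetup::Created),
        Err(err) if err.raw_os_error() == Some(ERROR_PRIVILEGE_NOT_HELD) => {
            Ok(LinkSetup::SkippedNoPrivilege)
        }
        Err(err) => Err(err),
    }
}
```

A test can then print the skip message only on `SkippedNoPrivilege`, so the runner output shows which tests really executed.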
How to Identify These Tests: Search for these patterns in test files:

```bash
# Tests that skip on permission errors
grep -r "PermissionDenied\|raw_os_error.*1314\|Skipping.*permission" tests/
grep -r "got_symlink_permission" tests/
```

Mandatory Pattern for These Tests: ALL tests that:
- Check `PermissionDenied` or `raw_os_error() == Some(1314)`
- Call `got_symlink_permission()` helper
- Would skip locally but run on GitHub Actions

MUST have feature-conditional assertions when comparing with `std::fs::canonicalize`:

```rust
#[test]
fn test_with_symlinks() -> std::io::Result<()> {
    let tmpdir = tmpdir();

    // Permission check - test skips locally but runs on CI
    if !got_symlink_permission(&tmpdir) {
        return Ok(());
    }

    // ... create symlinks ...

    let result = soft_canonicalize(&path)?;

    // CRITICAL: Must use feature-conditional assertion!
    #[cfg(not(feature = "dunce"))]
    {
        assert_eq!(result, std::fs::canonicalize(&path)?);
    }
    #[cfg(feature = "dunce")]
    {
        // Bind the PathBuf first: `to_string_lossy` borrows from it, so calling
        // it on the temporary would not compile.
        let std_path = std::fs::canonicalize(&path)?;
        let result_str = result.to_string_lossy();
        let std_str = std_path.to_string_lossy();
        assert!(!result_str.starts_with(r"\\?\"), "dunce should simplify");
        assert!(std_str.starts_with(r"\\?\"), "std returns UNC");
        assert_eq!(result_str.as_ref(), std_str.trim_start_matches(r"\\?\"));
    }

    Ok(())
}
```

Files with Symlink-Skipping Tests (all must have feature guards):
- `tests/compat_symlinks.rs` - Tests with `got_symlink_permission()`
- `tests/blackbox_toctou_attacks.rs` - TOCTOU race condition tests
- `tests/blackbox_complex_attacks.rs` - Complex attack vectors
- `tests/issue_53_symlink_dotdot_lexical_collapse.rs` - Regression test with symlink permission skip
- `tests/windows_anchored_verbatim_drive_bug_symlink.rs` - Windows anchored symlink tests
- `src/tests/symlink_dotdot_symlink_first.rs` - Symlink-first resolution tests
- `src/tests/anchored_symlink_clamping.rs` - Anchored symlink clamping tests
- `src/tests/anchored_security/windows_symlink.rs` - Windows anchored security symlink tests

Use the grep commands in "How to Identify These Tests" above to find the authoritative current list — it is more reliable than this enumeration.
Before Committing:
- Search for ALL tests with symlink permission checks
- Verify each has feature-conditional assertions when comparing with `std::fs::canonicalize`
- Look for `assert_eq!`, `assert!(...starts_with(...))`, and direct path comparisons
- If unsure, add feature guards - they're safe even if not strictly needed
Remember: Tests that pass locally might fail on GitHub Actions if they lack proper feature guards!
Agents are only permitted to run read-only git commands. Never run any git command that modifies the working tree, index, or history. This includes, but is not limited to:
Banned (write) operations:
- `git add`, `git stage`
- `git commit`, `git commit --amend`
- `git restore`, `git checkout -- <file>`
- `git reset` (any form)
- `git stash`, `git stash pop`
- `git merge`, `git rebase`
- `git push`, `git pull`, `git fetch`
- `git rm`, `git mv`
- `git tag`, `git branch -d`

Allowed (read) operations:
- `git status`, `git diff`, `git diff --staged`
- `git log`, `git show`, `git blame`
- `git ls-files`, `git stash list`

If you need to stage, commit, or modify git state, ask the user to do it or wait for an explicit instruction. Never take git write actions on your own initiative, even to "clean up" or "fix" something you changed.
ALWAYS check staged files before committing. Before running git commit, you MUST:
- Run `git status` to see what files are staged vs unstaged
- Run `git diff --staged --stat` to see exactly what will be committed
- Review the staged changes - ensure they match the intended commit scope
- If unrelated files are staged, either:
  - Unstage them with `git reset HEAD <file>` before committing, OR
  - Ask the user if they should be included

Never blindly run git add <file>; git commit without checking what was already staged. The user may have staged files for a different purpose.
Commit message must match staged content. If the staged diff contains files unrelated to your commit message, STOP and clarify with the user.
Example workflow:

```bash
# WRONG - dangerous, ignores existing staged files
git add myfile.rs
git commit -m "fix: something"

# CORRECT - always check first
git status
git diff --staged --stat
# Review output, then if appropriate:
git add myfile.rs
git diff --staged --stat  # Check again after adding
git commit -m "fix: something"
```

- Style: Follow `rustfmt` defaults; keep code clear and small; avoid over-abstraction.
- Error handling: Use `error_with_path` to attach offending path context; ensure `SoftCanonicalizeError::detail` is human-readable.
- Allocation: Avoid temporary `String`s; prefer `PathBuf`, `OsString`, and component streaming.
- Syscalls: Minimize `metadata`/`canonicalize` calls; keep fast-paths and early exits intact.
- Platform cfg: Keep Windows/Unix branches correct and side-effect free; don’t introduce behavioral drift between platforms.
- Dependencies: Do not add new runtime dependencies. The existing `proc-canonicalize` (default feature) and `dunce` (Windows-only optional feature) are the approved optional deps. If you believe a new one is strictly necessary, open an issue first.
- No `.unwrap()` in production code: Production code must never call `.unwrap()`, `.expect()`, or any method that panics on `None`/`Err`. Use `?`, `.ok_or()`, `.map_err()`, or `.unwrap_or()` instead. Test code may use `.unwrap()` freely.
- No dead code: Do not use `#[allow(dead_code)]` or similar lint-suppression attributes. If the compiler says it's unused, either use it or remove it. Fix the root cause instead of silencing the warning.

Production code must not use direct indexing (data[i], parts[1],
slice[start..end]) on slices, Vec, or str. Direct indexing panics on
out-of-bounds access, which is a denial-of-service vector.
Required replacements:
| Banned | Replacement |
|---|---|
| `parts[i]` | `parts.get(i).ok_or(…)?` or `parts.get(i).map(…)` |
| `data[start..end]` | `data.get(start..end).ok_or(…)?` |
| `slice[i..]` | `slice.get(i..).unwrap_or_default()` |

For sequential processing, prefer iterators (.iter(), .enumerate(),
.windows(), .chunks(), .split()) over index-based loops.
Test code (#[cfg(test)] blocks, tests/) may use direct indexing when
the test controls the input and panic-on-bug is acceptable.
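
A minimal sketch of the replacements from the table above. The function names `second_segment` and `tail_from` are hypothetical and exist only for illustration.

```rust
use std::io;

/// Checked access instead of `segments[1]`: a missing element becomes an error.
fn second_segment<'input>(segments: &[&'input str]) -> io::Result<&'input str> {
    segments
        .get(1)
        .copied()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "missing second segment"))
}

/// Checked slicing instead of `data[start..]`: out-of-range yields an empty slice.
fn tail_from(data: &[u8], start: usize) -> &[u8] {
    data.get(start..).unwrap_or_default()
}
```

Both functions return a value or an error for any input length, so malformed input cannot trigger a panic.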
Hot-path functions (path component iteration, validation checks, normalization
helpers) must not heap-allocate. Use stack buffers, iterators, and streaming
operations instead of intermediate Vec, String, or Box.
For necessary allocations (variable-length output):
- Use `Vec::with_capacity(known_size)` to avoid reallocation.
- Prefer `Vec::extend_from_slice` over N × `push` for bulk copies.

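
As a small illustration of the two allocation rules above, the sketch below sizes the output once and copies each segment in bulk. `concat_segments` is a hypothetical name, not a function from the crate.

```rust
/// Joins byte segments with a separator using a single up-front allocation.
fn concat_segments(segments: &[&[u8]], separator: u8) -> Vec<u8> {
    // Known output size: all segment bytes plus one separator between neighbours.
    let total = segments.iter().map(|segment| segment.len()).sum::<usize>()
        + segments.len().saturating_sub(1);
    let mut out = Vec::with_capacity(total);
    for (index, segment) in segments.iter().enumerate() {
        if index > 0 {
            out.push(separator);
        }
        // Bulk copy instead of pushing byte by byte.
        out.extend_from_slice(segment);
    }
    out
}
```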
- Prefer `Option`/`Result` over sentinel values. Never use empty strings, `-1`, or null-equivalent magic values to signal absence.
- Prefer `match` over `if let` when handling enums so that adding a new variant produces a compile error at every call site, rather than silently falling through.
- Keep struct fields private when invariants must be enforced. Expose transition methods that enforce them.

Every named lifetime parameter must have a descriptive name that explains
whose lifetime it represents. Single-letter lifetimes ('a, 'b, 'c,
...) are banned — no exceptions, no "simple signature" carve-out.
Name lifetimes after the data they bind to: `'path`, `'input`, `'src`, `'buf`, `'anchor`, `'cfg`, `'err`. When a function takes two references, give each one a name that identifies its source (e.g. `fn f<'input, 'buf>(src: &'input str, dst: &'buf mut String)`).
Exceptions (these are not "single-letter" names, they are language built-ins):
- `'static`: Rust's built-in lifetime for program-long data. Use it when the borrow must remain valid for the rest of the program.
- `'_`: the elided / anonymous lifetime. Use it only where the compiler already infers the lifetime and naming it would add no information (e.g. `fmt::Formatter<'_>`). Prefer a real name whenever the lifetime appears in a function or type signature you author.

Why: lifetimes are a contract between the caller and the function about
who owns what for how long. A name like 'a forces every reader to
reverse-engineer that contract from the signature. A name like 'anchor
tells them instantly — the same way a well-named parameter does. Stale
single-letter names rot fastest: add a second lifetime and now 'a and
'b are a puzzle. Descriptive names never rot.
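
A short sketch of the naming rule, assuming a hypothetical `SegmentCursor` type that is not part of the crate. The lifetime `'path` says that every returned segment borrows from the original path string rather than from the cursor.

```rust
/// Hypothetical cursor over the `/`-separated segments of a path string.
struct SegmentCursor<'path> {
    remaining: &'path str,
}

impl<'path> SegmentCursor<'path> {
    fn new(path: &'path str) -> Self {
        Self { remaining: path }
    }

    /// Returns the next non-empty segment, borrowed from the original path.
    fn next_segment(&mut self) -> Option<&'path str> {
        // Skip any run of separators before the next segment.
        let trimmed = self.remaining.trim_start_matches('/');
        if trimmed.is_empty() {
            self.remaining = trimmed;
            return None;
        }
        match trimmed.split_once('/') {
            Some((head, rest)) => {
                self.remaining = rest;
                Some(head)
            }
            None => {
                self.remaining = "";
                Some(trimmed)
            }
        }
    }
}
```

Because the return type names `'path`, a segment can outlive the cursor itself, which a reader can see from the signature alone.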
Comments must answer why: the reasoning, invariant, security property, non-obvious constraint, or history behind a workaround. Never comment what the code already says — well-named identifiers are the canonical "what".
- Good: `// SECURITY: clamp before join; raw target may contain ".." that escape the anchor when the OS resolves the returned path.`
- Good: `// Fast-path: skip fs::canonicalize when lexical form is unchanged; avoids a syscall in the hot case.`
- Bad: `// increment counter` above `counter += 1;`
- Bad: `// call the helper` above a function call.

When in doubt, add a short comment stating the invariant/reason. A future reader (human or agent) who asks "why is this here?" must find the answer in the code — not in a commit message, issue tracker, or vanished conversation. Delete comments that only restate identifiers.
Doc comments (///, //!) must never hide executable content from the test
harness. Forbidden fence styles in Rust doc comments (they are all
treated as test-skip or test-bypass mechanisms):
- `` ```text ``: blocks code from compiling. Use plain prose (no fence) or a bulleted list instead. For pseudocode illustrations, write them as prose.
- `` ```ignore ``, `` ```no_run ``, `` ```should_panic ``, `` ```compile_fail ``: block or redirect execution. Rewrite as a real runnable `` ```rust `` block that `cargo test --doc` compiles and runs, or move the illustration into a regular `#[test]` and reference it from the doc comment.

Rules of thumb:
- Pseudocode → write it as prose (no fence).
- Runnable Rust → use the default `` ``` `` or `` ```rust `` fence and make it actually compile and run under `cargo test --doc`.
- Private / `pub(crate)` items: doctests are compiled as a separate crate that can only reach the public API, so a `` ```rust `` block there cannot exercise the item it documents. Use prose.

This discipline applies to every Rust source file. Plain Markdown files
(README.md, CONTRIBUTING.md, CHANGELOG.md) may use ```text freely —
they are not processed by rustdoc.
Keep source files under ~600 lines (production or test) to fit within a single LLM context window and improve RAG retrieval precision.
- When a production file grows past ~600 lines, split into focused submodules (e.g. `foo.rs` → `foo/mod.rs` + `foo/helpers.rs`).
- When a test file grows past ~600 lines, split into thematic files (e.g. `tests_validation.rs`, `tests_security.rs`).
- Favour a stable top-to-bottom layout so any reader knows where to look: module docs → imports → constants → types → impl blocks → functions → tests.

- For every non-trivial behavior change, bug fix, or regression fix: write or update the tests first so the expected behavior is explicit before implementation changes begin.
- The intended workflow is red → green → refactor:
  1. Encode the requirement in a test.
  2. Observe the old implementation fail or lack the behavior.
  3. Implement the change.
  4. Rerun the tests to prove the new behavior.
- If a task is purely structural (rename, move, formatting) and has no behavioral delta, a new failing test is not required.
- Every problem or bug fixed must include a regression test as part of the same change set.
Do not claim a feature or fix is complete without evidence:
- Tests (unit, integration, or doctests) proving the behavior.
- CI output showing clean build + test pass.
- Manual verification notes (if no automation exists yet).
"Implemented" or "fixed" without proof is not acceptable.
- Run `cargo test` on all platforms you can. Many tests are platform-conditional (`#[cfg(windows)]`, etc.).
- Coverage areas include: std-compat, traversal, symlinks, Unicode/encoding, Windows UNC and 8.3, ADS validation, TOCTOU race robustness, null bytes, boundary conditions.
- When changing behavior, add focused tests alongside the changed logic:
  - Unit tests under `src/tests/…`
  - Integration tests under `tests/…`
  - Example-based docs (doctests) when clarifying public behavior
- Keep tests deterministic and filesystem-safe; avoid relying on external shares or network state.

- Prefer exact equality over hints:
  - Do not use `starts_with`/`ends_with` to “approximate” expected paths. Compute or state the full expected path and `assert_eq!`.
  - Windows: when asserting final absolute results, use extended-length expectations (e.g., `\\?\C:\...`) if applicable.
  - Positive-only: Main test assertions must validate the correct expected output, not enumerate incorrect forms. Avoid patterns like `assert!(!path.starts_with(r"\\?\"))` for final correctness; instead assert equality with the properly transformed path. Negative checks may appear only in input precondition validation.
- Build expected paths simply and readably:
  - For inputs (what a user would type), use raw strings (e.g., `r"hello\dir\..\world"`).
  - For expected results, either:
    - Use a single raw-string tail with one `join` (e.g., `base.join(r"etc\passwd")`), or
    - Compare against a full literal built via `format!` and `PathBuf::from` when you want to assert the entire string.
  - Avoid long chains of `join("segment")` unless necessary; keep tests human-readable.
- Anchored semantics:
  - `anchored_canonicalize` soft-canonicalizes the anchor internally. Do not pre-canonicalize anchors in examples unless demonstrating manual behavior.
  - Virtual filesystem semantics (v0.4.0+): The anchor acts as a virtual root. All symlinks (both absolute and relative) that resolve outside the anchor are clamped back into the virtual filesystem.
    - Absolute symlinks: Reinterpreted relative to the anchor (e.g., `/etc/passwd` → `anchor/etc/passwd`)
    - Relative symlinks that escape: Clamped using common ancestor logic (e.g., `../../opt/file` → `anchor/opt/file`)
    - Lexical `..` traversal: Always clamps to the anchor boundary
  - Write tests that affirm these clamping rules explicitly using exact path assertions.
- Symlinks in tests:
  - Unix: symlink creation is reliable; create real symlinks and assert exact resolved results.
  - Windows: symlink creation requires privileges (error 1314 = `ERROR_PRIVILEGE_NOT_HELD`).
    - Local testing policy: Tests should skip gracefully on symlink privilege errors. Local developers and `ci-local` scripts run without elevated privileges, and these tests will be validated by GitHub Actions runners, which have symlink privileges enabled.
    - Test implementation patterns:
      - Regression/behavior tests: Always skip gracefully on error 1314 with a clear message (e.g., "skipping: symlink creation not permitted"). These are the majority of tests and must not fail locally.
      - Diagnostic tests: May panic on error 1314 to inform developers that the diagnostic requires privileges to run locally. These are debugging tools, not regular tests.
    - CI environment: GitHub Actions Windows runners have symlink privileges enabled, so all symlink tests (both regression and diagnostic) will execute fully in CI.
- Feature-conditional assertions (dunce feature):
  - IMPORTANT: The dunce feature is Windows-only (target-conditional dependency in Cargo.toml). On non-Windows platforms, the feature is effectively disabled and adds no dependencies.
  - Testing pattern: Tests comparing results with `std::fs::canonicalize` MUST use `#[cfg(feature = "dunce")]` guards for UNC-specific assertions.
  - CRITICAL: If test code directly calls `dunce::` functions (not just our library functions), use `#[cfg(all(feature = "dunce", windows))]` to prevent compilation errors on non-Windows platforms.
  - Without dunce: `assert_eq!(result, std::fs::canonicalize(&path)?)` - exact match expected (UNC on Windows, normal on Unix).
  - With dunce on Windows: Compare simplified result (no `\\?\` prefix) with stripped version of std's UNC output.
  - Recommended pattern (positive-only final assertion):
    ```rust
    #[cfg(not(feature = "dunce"))]
    {
        assert_eq!(result, std::fs::canonicalize(&path)?);
    }
    #[cfg(feature = "dunce")]
    {
        // Bind the PathBuf first; `to_string_lossy` borrows from it.
        let std_path = std::fs::canonicalize(&path)?;
        let result_str = result.to_string_lossy();
        let std_str = std_path.to_string_lossy();
        // Positive-only: equality with simplified expected (std UNC minus verbatim prefix)
        assert_eq!(result_str.as_ref(), std_str.trim_start_matches(r"\\?\"));
    }
    ```
  - **Cleaner alternative using macro** (optional, for tests with many repetitive comparisons; positive-only):
    ```rust
    // Define at top of test file (use sparingly - explicit patterns are more debuggable)
    macro_rules! assert_std_compat {
        ($result:expr, $path:expr) => {
            #[cfg(not(feature = "dunce"))]
            {
                assert_eq!($result, std::fs::canonicalize(&$path)?);
            }
            #[cfg(feature = "dunce")]
            {
                let std_path = std::fs::canonicalize(&$path)?;
                let result_str = $result.to_string_lossy();
                let std_str = std_path.to_string_lossy();
                assert_eq!(result_str.as_ref(), std_str.trim_start_matches(r"\\?\"));
            }
        };
    }

    // Usage in test
    let result = soft_canonicalize(&path)?;
    assert_std_compat!(result, path);
    ```
  - If calling dunce directly in tests (e.g., for building expected paths):
    ```rust
    // CORRECT: Platform guard when calling dunce:: directly
    #[cfg(all(feature = "dunce", windows))]
    let expected = dunce::canonicalize(&path)?;
    // Negate the full condition so `expected` is also defined on non-Windows
    // builds that enable the feature.
    #[cfg(not(all(feature = "dunce", windows)))]
    let expected = std::fs::canonicalize(&path)?;

    // WRONG: Will fail on Linux with --all-features
    #[cfg(feature = "dunce")]
    let expected = dunce::canonicalize(&path)?; // ❌ dunce not available on Linux
    ```
- Environment assumptions:
  - Do not depend on global machine directories (e.g., `C:\Users`) unless you defend with a skip or your assertion is valid for non-existing paths as well.
  - Prefer `TempDir`-based fixtures; avoid network paths and external shares.
- Examples of good assertions:
  - Full equality under an anchored base:
    ```rust
    let base = soft_canonicalize(&anchor)?;
    let out = anchored_canonicalize(&base, r"c\d\e.txt")?;
    assert_eq!(out, base.join(r"c\d\e.txt"));
    ```
  - Literal Windows expectation (non-existing is OK):
    ```rust
    let anchor = r"C:\Users\non-existing\dir1\dir2\..\..\folder";
    let out = anchored_canonicalize(anchor, r"hello\world")?;
    assert_eq!(out, std::path::PathBuf::from(r"\\?\C:\Users\non-existing\folder\hello\world"));
    ```
  - Relative symlink keeps clamp (Windows example fails on local machines without privileges but runs on GitHub Actions): ensure equality with exact expected path, not hints.
- Virtual vs system paths (for downstream crates):
  - If a downstream crate exposes a “virtual” display that’s lexical, assert lexical results there.
  - For symlink-resolved system paths, use our `anchored_canonicalize` with a canonicalized anchor and assert the fully-resolved `PathBuf`.

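
The anchored clamping rules listed above can be pictured with a purely lexical sketch. This is an assumption-laden illustration, not `anchored_canonicalize`: the name `clamp_into_anchor` is hypothetical, no filesystem access or symlink resolution happens, and a relative target is treated as relative to the anchor root, whereas the crate applies common-ancestor logic based on where the link lives.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: reinterpret `target` inside `anchor`, never ascending above it.
fn clamp_into_anchor(anchor: &Path, target: &Path) -> PathBuf {
    let mut out = anchor.to_path_buf();
    // Number of components appended below the anchor; the anchor itself is the floor.
    let mut depth = 0usize;
    for component in target.components() {
        match component {
            // Dropping the root/prefix reinterprets an absolute target relative to the anchor.
            Component::Prefix(_) | Component::RootDir | Component::CurDir => {}
            Component::ParentDir => {
                // Pop only what was appended; at the floor, `..` is clamped.
                if depth > 0 {
                    out.pop();
                    depth -= 1;
                }
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    out
}
```

Tests for the real function should still assert the full expected `PathBuf` with `assert_eq!`, as required above.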
- Run subset benches locally via `cargo bench`.
- Python baseline lives in `benches/python/python_fair_comparison.py` and requires a system Python; the harness prefers `python3.13` when available to match latest Python on Linux/WSL, then falls back to `python`, `python3`, or `py`.
- Bench numbers are environment-dependent; only use them as trend indicators.
- Do not regress performance by adding extra syscalls or full-path canonicalizations—justify any changes with comments and tests.
When asked to “run benches,” use this exact 5-run protocol and report medians from the mixed-workload benchmark (benches/performance_comparison.rs). The benchmark itself invokes the Python baseline automatically when Python is available.
Requirements:
- Windows: PowerShell, Rust toolchain installed and on PATH.
- Linux/WSL/macOS: Bash with Rust toolchain installed.
- Python available as `python`, `python3`, or `py` for the baseline (optional but recommended).

Windows (PowerShell):
- From repo root, run this 5-run loop and capture logs:
  - `for ($i=1; $i -le 5; $i++) { cargo bench | Tee-Object -FilePath "target\bench-windows-$i.txt" }`
- From each run, extract the value printed by performance_comparison as:
  - "Rust soft_canonicalize : paths/s"
- Sort the five numbers and take the middle one (median). Report that as “Windows median (paths/s)”.

Linux/WSL (Bash):
- If running from Windows, prefer WSL for Linux benches. From the repo root on the Linux side, run:
  - `for i in {1..5}; do cargo bench | tee "target/bench-linux-$i.txt"; done`
- Extract the same "Rust soft_canonicalize : paths/s" line from each run, sort the five numbers, and take the median. Report as “Linux median (paths/s)”.
- For latest Python comparison, ensure `python3.13` is installed/available. The harness auto-tries `python3.13` first on Linux.

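
The "sort the five numbers and take the middle one" step can be written down as a small function. This is a sketch with a hypothetical name, `median`; for an even number of samples it returns the upper of the two middle values, which does not matter for the 5-run protocol.

```rust
/// Returns the middle value of the samples after sorting, or None if empty.
fn median(samples: &[f64]) -> Option<f64> {
    let mut sorted = samples.to_vec();
    // `total_cmp` gives a total order over f64, so sorting cannot panic on NaN.
    sorted.sort_by(|left, right| left.total_cmp(right));
    sorted.get(sorted.len() / 2).copied()
}
```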
Notes and tips:
- Ignore Criterion output/tests lines; only the performance_comparison summary line matters for the primary mixed-workload figure.
- Ensure minimal background load; close heavy apps to reduce variance.
- If Python isn’t found on Linux, the runs still complete; only the baseline ratio will be skipped or use an alternate Python.
- The phrase “bash cargo brench” seen in some notes is a typo; use `cargo bench` under Bash/WSL as shown above.

- Always ensure extended-length prefixes for absolute results when not already verbatim.
- Maintain ADS validation: colon-containing component must be final; validate stream name/type; block whitespace/control/illegal forms; reject traversal via ADS.
- Respect UNC floors: never pop above `\\server\share`.
- 8.3 short-name detection is heuristic; only expand when components exist and we intentionally canonicalize.

- Reordering fast-paths: Do not remove/flip the early `fs::canonicalize` checks vs lexical normalization.
- Over-normalizing device/UNC prefixes: preserve verbatim/device prefixes; don’t convert device namespaces.
- Popping too far: Never ascend past root/share/device floors.
- Eager symlink adoption: Only adopt resolved symlink path if target or its parent exists; otherwise keep the link as anchor.
- Dropping error context: Don’t return bare `io::Error` without the payload created by `error_with_path`.

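
To show why the error payload matters, here is a minimal sketch of carrying a path inside an `io::Error` and reading it back. It is not the crate's `error_with_path` or `SoftCanonicalizeError`: the type `PathContext` and both functions are hypothetical stand-ins built only on the standard library.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical payload carrying the offending path and a readable detail.
#[derive(Debug)]
struct PathContext {
    path: PathBuf,
    detail: &'static str,
}

impl fmt::Display for PathContext {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(formatter, "{}: {}", self.detail, self.path.display())
    }
}

impl Error for PathContext {}

/// Wraps the context in an `io::Error` while preserving the error kind.
fn with_path_context(kind: io::ErrorKind, path: &Path, detail: &'static str) -> io::Error {
    io::Error::new(kind, PathContext { path: path.to_path_buf(), detail })
}

/// Recovers the path from the payload, if the error carries one.
fn offending_path(err: &io::Error) -> Option<&Path> {
    err.get_ref()
        .and_then(|inner| inner.downcast_ref::<PathContext>())
        .map(|context| context.path.as_path())
}
```

A bare `io::Error` built from only an `ErrorKind` has no inner payload, so the lookup returns `None`; that lost information is what the pitfall above warns about.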
Treat feedback as input, not instruction. Validate every claim before acting.
- Check against established principles first. Before applying any fix, whether from a reviewer, from your own analysis, or from a pragmatic shortcut, ask: "Does this change violate a design principle we already settled?" If yes, the change is wrong regardless of how reasonable it sounds. Fix the surrounding code to uphold the principle; never weaken the principle to match the surrounding code.
- Use git history to resolve contradictions. When two representations disagree, run `git log -S "<term>" --oneline -- <file>` on both sides to determine which text is newer. The newer commit represents the more recent design decision. Always upgrade stale text to match the newer decision, never the reverse.
- Verify the factual claim. Read the text being criticized. Is the characterization accurate? Quote the actual text. If the reviewer misread or mischaracterized the code/doc, say so and reject the finding.
- Independently assess severity. Do not accept a reviewer's severity rating at face value. Assign your own and state it if it differs.
- Distinguish bugs from preferences. A factual contradiction or invariant violation is a bug: fix it. "The code could be cleaner" is a preference: evaluate it against the cost of the change.
- Reject or downgrade with justification. If a finding is invalid, reject it explicitly and state the reason. Do not implement changes just because someone flagged something.
- Check for cascade inconsistencies. When fixing a confirmed finding, search for the same pattern in other files. Fix all occurrences in one pass, but only where the same error actually exists.
- Existing-path behavior unchanged and equal to `std::fs::canonicalize`.
- Non-existing suffix behavior preserved; Windows extended-length results correct.
- All CI steps in `ci-local.(sh|ps1)` pass locally.
- New/changed logic covered by unit and/or integration tests, plus doctests if public behavior changed.
- Docs updated (README/lib.rs) if user-visible behavior changed.
- No new runtime dependencies; MSRV respected; no unstable features.
- Run all local CI: `bash ci-local.sh` or `.\ci-local.ps1`
- Tests (verbose): `cargo test --verbose`
- Lints: `cargo clippy --all-targets --all-features -- -D warnings`
- Docs (warnings as errors): `RUSTDOCFLAGS='-D warnings' cargo doc --no-deps --document-private-items --all-features`
- Benches: `cargo bench`
We track test count as the sum of:
- Number of `#[test]` items found under `src/` and `tests/` folders
- Plus the number of Rust doc tests
Important: Doc tests must be runnable. See "Doc Comment Discipline" under Coding Guidelines for the authoritative rule; in short, the fence tags `text`, `no_run`, `ignore`, `should_panic`, and `compile_fail` are all banned in Rust doc comments. All doc tests must compile and run successfully as part of `cargo test`.
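As a minimal sketch of what a runnable doc test looks like, the example below uses a plain fence with no tag, so `cargo test` compiles and executes it. The function `component_count` and the crate name `my_crate` are hypothetical and exist only for illustration; they are not part of this crate's API.

```rust
use std::path::Path;

/// Returns the number of components in `path`.
///
/// The fence below carries no tag, so the example is compiled and run:
///
/// ```
/// use std::path::Path;
/// // `my_crate` is a placeholder for the real crate name.
/// assert_eq!(my_crate::component_count(Path::new("a/b/c")), 3);
/// ```
pub fn component_count(path: &Path) -> usize {
    path.components().count()
}
```

Because the example asserts on a real return value, it also counts toward the doc-test total described above.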
Commands to count tests:
PowerShell (Windows):
```powershell
# Count #[test] in src/ and tests/
$unit = (Get-ChildItem -Recurse -Path src, tests -Include *.rs | Select-String -Pattern '#\s*\[\s*test\s*\]')
$unit.Count
# Count doc tests
(cargo test --doc -- --list | Select-String -Pattern '^test ').Count
```
Bash (Linux/macOS/WSL):
```bash
# Count #[test] in src/ and tests/
grep -REo '#[[:space:]]*\[[[:space:]]*test[[:space:]]*\]' src tests | wc -l
# Count doc tests
cargo test --doc -- --list | grep '^test ' | wc -l
```
When documenting test count, use the sum of both numbers. Update README.md and other docs with dynamic counts rather than hardcoded numbers.
Use when spinning up an automated change:
"""
Work on soft-canonicalize. Constraints: no new runtime deps (approved deps: proc-canonicalize default feature, dunce Windows-only optional); preserve exact parity with std::fs::canonicalize for fully-existing paths; extend behavior to non-existing suffixes only; keep MSRV 1.70; pass clippy -D warnings and rustdoc -D warnings; run bash ci-local.sh or .\ci-local.ps1 before proposing changes. Never remove tests or weaken ADS/symlink/UNC/.. protections. Add focused tests for any behavior you touch.
"""
- Tag as `vX.Y.Z` to trigger publish and GitHub Release via workflows.
- Update `CHANGELOG.md` with clear, user-facing notes and security/perf impact.

---
If anything in this guide appears to conflict with the existing tests, treat the tests as the source of truth and open an issue to correct the guide.