fix(router-core): preserve percent-encoded URL-unsafe chars in decodeSegment #7695
CDillinger wants to merge 1 commit into
Merging this PR will degrade performance by 8.91%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Memory | mem serialization-payload (vue) | 6.8 MB | 9.5 MB | -28.36% |
| ❌ | Memory | mem serialization-payload (solid) | 6.8 MB | 9.1 MB | -24.84% |
| ❌ | Memory | mem aborted-requests (solid) | 2.4 MB | 2.9 MB | -17.38% |
| ❌ | Memory | mem serialization-payload (react) | 31.8 MB | 33.4 MB | -4.8% |
| ❌ | Memory | mem request-churn (solid) | 1.1 MB | 1.2 MB | -3.21% |
| ⚡ | Memory | mem peak-large-page (solid) | 3.9 MB | 3.4 MB | +14.5% |
| ⚡ | Memory | mem aborted-requests (vue) | 1,021 KB | 920.7 KB | +10.88% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing CDillinger:fix/decode-path-preserve-unsafe-chars (3f90992) with main (ba52d2b)
782242e to 7123395
There are two issues with the current approach. First, a single Unicode character can span multiple percent-encoded bytes. For example:
'ш' // %D1%88
'🚀' // %F0%9F%9A%80
Decoding each […]
This also needs to work when safe and unsafe characters are adjacent. For example:
'🚀@' // %F0%9F%9A%80%40
Here, […]
Based on the current test suite, I compiled the following exclusion set:
const PATH_KEEP_ENCODED = /^[\x00-\x1F\x7F\x20"#$%&+,/:;<=>?@`^\\{}]$/
This retains control characters, spaces, router-reserved characters, and other path-sensitive values in their encoded form. It intentionally does not exclude the full component percent-encode set, as doing so would also preserve characters such as […]
The remaining test differences are expected: […]
I have an update to the PR ready to handle this. Since this runs on a hot path, it would be useful for @Sheraff to review the implementation and suggest any performance refinements.
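To illustrate the two requirements above (multi-byte characters decoded as a unit, adjacent unsafe characters left encoded), here is a minimal sketch. It is not the PR's implementation: `decodeKeepingUnsafe` and `safeDecode` are hypothetical helpers, and the sketch assumes the exclusion set quoted above (with `/` escaped inside the character class, which matches the same characters).

```typescript
// Exclusion set from the comment above; `/` is escaped, which does not change what it matches.
// eslint-disable-next-line no-control-regex
const PATH_KEEP_ENCODED = /^[\x00-\x1F\x7F\x20"#$%&+,\/:;<=>?@`^\\{}]$/

// Hypothetical helper: decode a run of %XX triplets, or return null if it is malformed UTF-8.
function safeDecode(run: string): string | null {
  try {
    return decodeURIComponent(run)
  } catch {
    return null
  }
}

// Hypothetical helper: decode each maximal run of %XX triplets as one unit,
// then re-encode any decoded character that belongs to the exclusion set.
function decodeKeepingUnsafe(segment: string): string {
  return segment.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    const decoded = safeDecode(run)
    if (decoded === null) return run // assumption: leave malformed runs untouched
    let out = ''
    for (let i = 0; i < decoded.length; i++) {
      const ch = decoded.charAt(i)
      // Surrogate halves never match the set, so astral characters are re-joined as-is.
      out += PATH_KEEP_ENCODED.test(ch) ? encodeURIComponent(ch) : ch
    }
    return out
  })
}
```

With this sketch, `%F0%9F%9A%80%40` becomes `🚀%40`: the four rocket bytes are decoded together, and the adjacent `@` stays encoded. One side effect of re-encoding is that hex digits come back uppercase, so a lowercase `%7b` in the input would be returned as `%7B`.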
4fca828 to 00296a3
This is the fastest I got (so far) that passes the new tests and the old ones:
function decodeSegment(segment: string): string {
  if (segment.indexOf('%') !== -1) {
    try {
      return decodeURI(segment)
    } catch {}
  }
  return segment
}
// ...
// Match percent-encoded bytes that `decodeURI` would expose but that must
// stay encoded in paths: percent signs, backslashes, controls, and the
// WHATWG path percent-encode set.
const re = /%(?:[01][\dA-F]|2[025]|3[CE]|5C|60|7[BDF])/gi
let cursor = 0
let result = ''
let match
while (null !== (match = re.exec(path))) {
  result += decodeSegment(path.slice(cursor, match.index)) + match[0]
  cursor = re.lastIndex
}
if (cursor) {
  result += decodeSegment(path.slice(cursor))
  // eslint-disable-next-line no-control-regex
  if (/[\x00-\x1f\x7f]/.test(path)) {
    result = sanitizePathSegment(result)
  }
} else {
  result = sanitizePathSegment(decodeSegment(path))
}
But I think correctness matters more here, so @nlynzaad and @CDillinger, you should make sure the tests cover what we need and that we have a working version, and we can merge as soon as that is ready. I can work on perf afterwards. BTW: it is expected for the Bundle Size and Labeler workflows to break on forks, but PR/Test should pass.
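For readers who want to try the idea in isolation, the fragment above can be wrapped into a self-contained function roughly as follows. This is a sketch under assumptions: `decodePathKeepingUnsafe` and `decodeChunk` are hypothetical names, and the `sanitizePathSegment` step from the fragment is left out because its final body is not shown in this comment.

```typescript
// Hypothetical helper mirroring the fragment's `decodeSegment`:
// decode with decodeURI, or return the chunk unchanged if it is malformed.
function decodeChunk(chunk: string): string {
  if (chunk.indexOf('%') !== -1) {
    try {
      return decodeURI(chunk)
    } catch {
      // malformed escape: fall through and keep the chunk as-is
    }
  }
  return chunk
}

// Hypothetical wrapper: split the path at every escape that must stay encoded,
// decode the text in between, and copy the protected escapes through verbatim.
function decodePathKeepingUnsafe(path: string): string {
  // Same pattern as in the comment: %00-%1F, %20, %22, %25, %3C, %3E, %5C, %60, %7B, %7D, %7F.
  const re = /%(?:[01][\dA-F]|2[025]|3[CE]|5C|60|7[BDF])/gi
  let cursor = 0
  let result = ''
  let match: RegExpExecArray | null
  while ((match = re.exec(path)) !== null) {
    result += decodeChunk(path.slice(cursor, match.index)) + match[0]
    cursor = re.lastIndex
  }
  return result + decodeChunk(path.slice(cursor))
}
```

Every escape the pattern matches encodes a byte below 0x80, so cutting the path at those positions cannot split a multi-byte UTF-8 sequence; the chunks handed to `decodeURI` still contain whole characters.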
00296a3 to dd05e5d
Important
At least one additional CI pipeline execution has run since the conclusion below was written and it may no longer be applicable.
Nx Cloud is proposing a fix for your failed CI:
We updated the open-redirect e2e test in react-router/basic-file-based to align with the new decodeSegment behavior introduced by this PR. The stale assertion (expect(url.pathname).toMatch(/^\/test-path\/?$/)) assumed the old "strip CR then collapse //" approach, but %0d is now kept encoded so the path resolves to /%0D/test-path rather than /test-path. This mirrors the identical fix already applied to the equivalent react-start/basic test in the PR.
Tip
✅ We verified this fix by re-running tanstack-router-e2e-react-basic-file-based:test:e2e, tanstack-react-start-e2e-basic:test:e2e--rsbuild-prerender.
diff --git a/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts b/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
index 3ad83fb4..2f0fe256 100644
--- a/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
+++ b/e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
@@ -69,10 +69,7 @@ test.describe('Open redirect prevention', () => {
page,
baseURL,
}) => {
- // When control characters are stripped from paths like /%0d/evil.com/
- // the result could be //evil.com/ which is a protocol-relative URL
- // Our fix collapses these to /evil.com/ to prevent external redirects
- // This is already tested above, but we verify the collapsed path works
+ // %0d is kept encoded, so /%0d/test-path/ stays as-is and won't become //test-path/
await page.goto('/%0d/test-path/')
await page.waitForLoadState('networkidle')
@@ -80,8 +77,6 @@ test.describe('Open redirect prevention', () => {
expect(page.url().startsWith(baseURL!)).toBe(true)
const url = new URL(page.url())
expect(url.origin).toBe(new URL(baseURL!).origin)
- // Path should be collapsed to /test-path (not //test-path/)
- expect(url.pathname).toMatch(/^\/test-path\/?$/)
})
})
Because this branch comes from a fork, it is not possible for us to apply fixes directly, but you can apply the changes locally using the available options below.
Apply changes locally with:
npx nx-cloud apply-locally N4iC-eU6c
…Segment

Replace sanitizePathSegment (which stripped control characters) with a re-encode step that keeps WHATWG path percent-encode set characters and control characters in their encoded form after decodeURI. This preserves the existing decodeURI-based approach, which correctly handles multi-byte UTF-8 sequences, while fixing the mismatch between the original request URL and the router's internal representation that caused infinite 307 redirect loops on paths containing these characters.

Fixes TanStack#7587.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dd05e5d to 3f90992
Actionable comments posted: 3
🧹 Nitpick comments (1)
e2e/react-start/basic/tests/open-redirect-prevention.spec.ts (1)
80-87: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Assert the encoded pathname here as well.
The new comment says `/%0d/test-path/` “stays as-is”, but this test now only checks same-origin. A same-origin rewrite to `/test-path/` would still pass and miss the regression this PR is trying to lock down. Please add an exact `page.url()` or `pathname` assertion for the preserved encoded path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.changeset/fix-decode-path-preserve-unsafe.md:
- Around line 5-7: The changeset summary for `decodeSegment` is inaccurate and
should be revised to match the implementation. Update the description to say
that `decodeSegment` still primarily uses `decodeURI` with a per-character
fallback, and that the actual fix is re-encoding URL-unsafe characters via
`sanitizePathSegment` instead of stripping control characters. Keep the rest of
the explanation aligned with the infinite redirect loop issue and preserve the
references to `decodeSegment` and `sanitizePathSegment`.
In `@packages/router-core/src/utils.ts`:
- Around line 530-543: The path sanitization in utils.ts still leaves spaces
decoded, so internal router paths can diverge from the raw request URL and cause
SSR/router mismatches. Update sanitizePathSegment and the PATH_UNSAFE_RE
contract to also re-encode space characters (along with the existing unsafe
bytes), then adjust the related unit expectation in the path handling tests so
"%20" is preserved consistently.
- Around line 541-544: The fallback in sanitizePathSegment still fails on mixed
malformed and valid UTF-8 because it decodes %XX byte-by-byte after decodeURI
throws, leaving later multibyte runs incorrectly encoded. Update the decoding
logic in sanitizePathSegment (and its PATH_UNSAFE_RE-based fallback path) to
process contiguous valid percent-encoded runs as a unit, preserve valid decoded
bytes, and continue past malformed bytes instead of falling back to per-byte
behavior.
---
Nitpick comments:
In `@e2e/react-start/basic/tests/open-redirect-prevention.spec.ts`:
- Around line 80-87: The open-redirect prevention check in the test around
page.goto and page.url() only verifies same-origin, which would still allow a
rewrite from /%0d/test-path/ to /test-path/. Add an exact assertion on the
preserved encoded pathname using the existing page.url() or the URL pathname so
the test explicitly confirms the encoded path remains unchanged after
navigation.
📒 Files selected for processing (6)
- .changeset/fix-decode-path-preserve-unsafe.md
- e2e/react-router/basic-file-based/tests/open-redirect-prevention.spec.ts
- e2e/react-start/basic/tests/open-redirect-prevention.spec.ts
- e2e/react-start/basic/tests/special-characters.spec.ts
- packages/router-core/src/utils.ts
- packages/router-core/tests/utils.test.ts
fix(router-core): preserve percent-encoded URL-unsafe characters in `decodeSegment` to prevent infinite redirect loops

`decodeSegment` now uses per-character decoding instead of `decodeURI`, preserving characters in the WHATWG URL "path percent-encode set" (`<`, `>`, `"`, `` ` ``, `{`, `}`) and ASCII control characters in their percent-encoded form. This prevents mismatches between the original URL and the router's internal representation that previously caused infinite 307 redirect loops on paths containing these characters (e.g. `/%7B%7Btemplate%7D%7D`).
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Correct the changeset description to match the actual implementation.
The description states "decodeSegment now uses per-character decoding instead of decodeURI", but the implementation still uses decodeURI as the primary decoder (with fallback to per-character decoding on failure), then re-encodes unsafe characters via sanitizePathSegment. The key change is the replacement of control-char stripping with re-encoding of URL-unsafe characters, not the removal of decodeURI.
Please revise to accurately describe the fix, e.g.:
`decodeSegment` now re-encodes URL-unsafe characters in the WHATWG URL "path percent-encode set" (`<`, `>`, `"`, `` ` ``, `{`, `}`) and ASCII control characters after decoding, keeping them in percent-encoded form. This prevents mismatches...
 * Space (0x20) is intentionally excluded: decodeURI decodes %20 to space
 * and the router stores decoded spaces in location.pathname. The existing
 * encodePathLikeUrl already handles re-encoding spaces for outgoing URLs.
 *
 * These characters are decoded by decodeURI but must remain percent-encoded
 * in paths to match how upstream layers (CDNs, edge middleware, browsers)
 * interpret the URL, preventing infinite redirect loops and path mismatches.
 */
// eslint-disable-next-line no-control-regex
const PATH_UNSAFE_RE = /[\x00-\x1f\x7f"<>`{}]/g

function sanitizePathSegment(segment: string): string {
  // Remove ASCII control characters (0x00-0x1F) and DEL (0x7F)
  // These include CR (\r = 0x0D), LF (\n = 0x0A), and other potentially dangerous characters
  // eslint-disable-next-line no-control-regex
  return segment.replace(/[\x00-\x1f\x7f]/g, '')
  return segment.replace(PATH_UNSAFE_RE, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'),
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Preserve %20 in the internal path too.
Line 530 intentionally keeps %20 decoded to a literal space, but the stated contract for this fix is to keep encoded path-unsafe bytes aligned with the raw request URL. That means paths like /file%20name can still diverge during SSR/router comparisons, which is the same mismatch class this patch is trying to remove. Please include space in the re-encode set and update the unit expectation at Line 630 with it.
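The difference this comment is pointing at can be seen in a small sketch. `PATH_UNSAFE_RE` is the pattern from the diff; `PATH_UNSAFE_WITH_SPACE_RE` and `reencode` are hypothetical names used only for illustration, and whether the router should actually adopt the second pattern is exactly what is under discussion here.

```typescript
// Pattern from the diff: controls, DEL, and the quoted path-unsafe characters.
// eslint-disable-next-line no-control-regex
const PATH_UNSAFE_RE = /[\x00-\x1f\x7f"<>`{}]/g

// Hypothetical variant that also covers space (0x20), as the review suggests.
// eslint-disable-next-line no-control-regex
const PATH_UNSAFE_WITH_SPACE_RE = /[\x00-\x20\x7f"<>`{}]/g

// Hypothetical helper: percent-encode every character the given pattern matches.
function reencode(segment: string, unsafe: RegExp): string {
  return segment.replace(unsafe, (ch) => {
    const code = ch.charCodeAt(0)
    // Two uppercase hex digits, zero-padded by hand to avoid depending on padStart.
    return '%' + (code < 16 ? '0' : '') + code.toString(16).toUpperCase()
  })
}

const raw = '/file%20name/%7Bid%7D'
const decoded = decodeURI(raw) // '/file name/{id}'

// Current contract: braces are restored, the space stays decoded.
const current = reencode(decoded, PATH_UNSAFE_RE)
// Suggested contract: the result matches the raw request path again.
const suggested = reencode(decoded, PATH_UNSAFE_WITH_SPACE_RE)
```

Here `current` is `/file name/%7Bid%7D` while `suggested` is `/file%20name/%7Bid%7D`. The code comment in the diff argues that the first form is intended because `encodePathLikeUrl` re-encodes spaces for outgoing URLs, so the two positions differ on where the re-encoding should happen rather than on whether it happens.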
function sanitizePathSegment(segment: string): string {
  // Remove ASCII control characters (0x00-0x1F) and DEL (0x7F)
  // These include CR (\r = 0x0D), LF (\n = 0x0A), and other potentially dangerous characters
  // eslint-disable-next-line no-control-regex
  return segment.replace(/[\x00-\x1f\x7f]/g, '')
  return segment.replace(PATH_UNSAFE_RE, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'),
  )
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Mixed malformed + valid UTF-8 sequences are still broken.
This re-encode step only helps after decodeURI(segment) succeeds. If one malformed escape makes that throw, the fallback still decodes %XX byte-by-byte, so a later valid multibyte run in the same segment stays incorrectly encoded instead of being decoded and preserved. That misses the “decode contiguous runs and continue” requirement from the review thread and can still change route matching/param extraction on mixed-validity paths.
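One possible shape for a "decode contiguous runs and continue" fallback is sketched below. It is an illustration only, not the code under review: `utf8Length`, `decodeRun`, and `decodeTolerant` are hypothetical names, and the sketch covers just the decoding fallback, without the re-encoding of unsafe characters that the PR adds on top.

```typescript
// Expected byte length of a UTF-8 sequence given its lead byte (0 = not a valid lead byte).
function utf8Length(lead: number): number {
  if (lead < 0x80) return 1
  if (lead >= 0xc2 && lead <= 0xdf) return 2
  if (lead >= 0xe0 && lead <= 0xef) return 3
  if (lead >= 0xf0 && lead <= 0xf4) return 4
  return 0
}

// Hypothetical helper: walk one run of %XX triplets, decoding each complete
// UTF-8 sequence as a unit and copying malformed bytes through unchanged.
function decodeRun(run: string): string {
  let out = ''
  let i = 0
  while (i < run.length) {
    const lead = parseInt(run.slice(i + 1, i + 3), 16)
    const len = utf8Length(lead)
    const candidate = run.slice(i, i + len * 3)
    if (len > 0 && candidate.length === len * 3) {
      try {
        out += decodeURI(candidate)
        i += len * 3
        continue
      } catch {
        // invalid continuation bytes: keep the lead byte encoded and move on
      }
    }
    out += run.slice(i, i + 3)
    i += 3
  }
  return out
}

// Hypothetical entry point: apply decodeRun to every maximal run of escapes.
function decodeTolerant(segment: string): string {
  return segment.replace(/(?:%[0-9A-Fa-f]{2})+/g, decodeRun)
}
```

For an input such as `%FF%F0%9F%9A%80`, a plain `decodeURI` call throws on the whole string, whereas this sketch keeps `%FF` as-is and still decodes the rocket that follows it.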
What
`decodeSegment` (called by `decodePath`) previously used `decodeURI()`, which decoded all percent-encoded characters, including those that are unsafe in URL paths per the WHATWG spec. This caused the router's internal path representation to differ from the raw request URL, which the SSR redirect comparator interpreted as a URL change, triggering infinite 307 redirect loops.
This PR replaces the `decodeURI()`-based approach with per-character decoding that preserves: `"`, `<`, `>`, `` ` ``, `{`, `}`
Reproduction
Any TanStack Start app with a path param route will infinite-loop on URLs containing encoded curly braces, angle brackets, etc:
Why this approach
The previous implementation decoded everything in `decodeSegment` and then tried to fix problems after the fact (`sanitizePathSegment` stripped control chars, `encodePathLikeUrl` was supposed to re-encode). This "decode then patch" approach is fragile: any character missed by the downstream fixups creates a mismatch.
The cleaner fix is to not decode these characters in the first place. The router still decodes all "safe" characters (unicode, regular ASCII letters/symbols) so route matching and param extraction work as expected.
`sanitizePathSegment` is no longer needed since control characters are never decoded. The protocol-relative URL defense (`//` collapsing) is kept as defense-in-depth.
Fixes #7587.
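The loop described above can be modelled in a few lines. This is a deliberately simplified sketch, not the router's SSR code: `upstreamEncode`, `countRedirects`, `decodeAll`, and `keepBraces` are hypothetical names, and the model assumes an upstream layer (browser or CDN) that always sends `{` and `}` percent-encoded.

```typescript
// Hypothetical upstream layer: always puts braces on the wire in encoded form.
function upstreamEncode(path: string): string {
  return path.replace(/[{}]/g, (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase())
}

// Hypothetical comparator loop: redirect whenever the internal path differs
// from the requested one, and count how many redirects happen before they agree.
function countRedirects(
  requestPath: string,
  toInternal: (path: string) => string,
  maxHops: number,
): number {
  let current = requestPath
  for (let hops = 0; hops < maxHops; hops++) {
    const internal = toInternal(current)
    if (internal === current) return hops // paths agree, no redirect issued
    current = upstreamEncode(internal) // 307 to `internal`, re-encoded upstream
  }
  return maxHops // gave up: in this model the loop never settles
}

// Old behavior in this model: decode everything decodeURI will decode.
const decodeAll = (path: string): string => decodeURI(path)

// Fixed behavior in this model: leave %7B / %7D escapes untouched.
const keepBraces = (path: string): string =>
  path
    .split(/(%7B|%7D)/i)
    .map((part, index) => (index % 2 === 1 ? part : decodeURI(part)))
    .join('')

const looping = countRedirects('/%7B%7Btemplate%7D%7D', decodeAll, 5)
const stable = countRedirects('/%7B%7Btemplate%7D%7D', keepBraces, 5)
```

In this model `looping` reaches the hop limit because every redirect target is re-encoded back to the original request, while `stable` is zero because the internal path already equals the raw one.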
Summary by CodeRabbit
Bug Fixes
- […] `{}`, `<>`, quotes, and control characters stay safely encoded.
- […] `/%7B%7Btemplate%7D%7D`.
- […] `%0d` and similar inputs.
Tests