Test: code point escape#145

Open
jitsedesmet wants to merge 6 commits into main from tests/codepoint-escape

@jitsedesmet
Member

@jitsedesmet jitsedesmet commented Jun 4, 2026

Adds the tests introduced in w3c/rdf-tests#346, following the updated spec in w3c/sparql-query#383 and w3c/sparql-query#384.

jitsedesmet and others added 5 commits June 2, 2026 09:30
- Use codePointAt() instead of charCodeAt() per unicorn/prefer-code-point
- Move inline comments to separate lines per line-comment-position rule
- Shorten long test line per max-len rule
- Add unterminated short-string test to maintain 100% branch coverage
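
For context on the unicorn/prefer-code-point change in the first bullet, here is a small illustration of how the two methods differ for characters outside the Basic Multilingual Plane. It is not code from this PR; the variable names are made up for the example.

```typescript
// Illustrative only: not taken from the PR diff.
// U+1F600 is one code point but two UTF-16 code units (a surrogate pair).
const grinning = '\uD83D\uDE00';

// charCodeAt() returns the UTF-16 code unit at the index: the high surrogate.
const codeUnit: number = grinning.charCodeAt(0); // 0xD83D

// codePointAt() combines a valid surrogate pair into the full code point.
const codePoint: number | undefined = grinning.codePointAt(0); // 0x1F600
```

For ASCII-only input the two calls return the same value, so the lint fix is behaviour-preserving there.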

Note: The 5 failing W3C live tests (codepoint-esc-01/02/06/07/08) are
old positive tests that PR #346 explicitly removes from the manifest.
Implementing PR #346's restriction on codepoint escape placement is
inherently incompatible with those old tests; they will be dropped when
the PR is merged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per W3C PR #383/#384, UCHAR escapes (\uXXXX / \UXXXXXXXX) are no
longer processed by a global query pre-processor. Instead they are
handled at the grammar level, inside string literal and IRI reference
tokens only.

Changes:
- Add UCHAR-aware lexer tokens (iriRef, stringLiteral1/2, long1/2)
  to sparql12LexerBuilder that replace the 1.1 variants
- Add codepointEscape() to SparqlContext; default implementation
  (sparql12CodepointEscape) rejects all surrogate code points
  (U+D800–U+DFFF), including surrogate pairs
- Override string grammar rule with two-pass decode: UCHAR first, then
  ECHAR; this correctly rejects \\u0041 (two backslashes → \A →
  invalid ECHAR) as required by codepoint-esc-bad-03
- Override iriFull grammar rule to apply codepointEscape to IRI content
- Remove queryPreProcessor from Parser.ts; patch string and iriFull rules
- Add lexResult.errors check in parserBuilder so queries with bare
  backslashes outside strings/IRIs throw rather than silently recover
- Fix comment token pattern (no required trailing newline) so queries
  ending with a comment but no newline are accepted
- Skip 5 dawgt:Proposed W3C tests that encode pre-PR-#383 behaviour and
  contradict the new grammar-level restriction
- Regenerate source-tracked AST snapshots for codepoint-esc-05/06/07
  (UCHAR sequences now remain unexpanded in lexed tokens)
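
A minimal sketch of the surrogate-rejecting codepoint escape described in the list above. `decodeUchar` is a hypothetical name chosen for this example; the actual `sparql12CodepointEscape` in the PR may be structured differently.

```typescript
// Hypothetical helper; the PR's sparql12CodepointEscape may differ in detail.
// Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
const UCHAR_PATTERN = /\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/g;

function decodeUchar(raw: string): string {
  return raw.replace(UCHAR_PATTERN, (match: string) => {
    // Drop the leading "\u" or "\U" and parse the hex digits.
    const value = parseInt(match.slice(2), 16);
    // Reject every surrogate, even when two escapes would form a valid pair.
    if (value >= 0xD800 && value <= 0xDFFF) {
      throw new Error(`Surrogate code point not allowed: ${match}`);
    }
    if (value > 0x10FFFF) {
      throw new Error(`Code point out of range: ${match}`);
    }
    return String.fromCodePoint(value);
  });
}
```

Under this sketch, the escaped text `\U0001F600` decodes to a single emoji, while `\uD83D\uDE00` throws on the first escape because each half is checked on its own.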

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add two negative test cases that exercise the error paths introduced by
the two-pass UCHAR+ECHAR string decode in the SPARQL 1.2 grammar:

- codepoint-esc-05-bad: \u005C decodes to \ (backslash), leaving a
  trailing unpaired \ at the end of the string literal → error
- codepoint-esc-06-bad: \u005Cx decodes to \x, where x is not a
  valid ECHAR character → error

Both paths were previously uncovered; coverage is back to 100%.
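
The two error paths can be reproduced with a simplified two-pass decoder. This is a sketch under assumptions, not the grammar rule from the PR: `decodeStringBody` and `ECHAR_MAP` are hypothetical names, the input is the string content without its quotes, and the surrogate checks are left out for brevity.

```typescript
// Hypothetical sketch of the two-pass decode; names are not from the PR.
// ECHAR in SPARQL is a backslash followed by one of: t b n r f \ " '
const ECHAR_MAP = new Map<string, string>([
  ['t', '\t'], ['b', '\b'], ['n', '\n'], ['r', '\r'], ['f', '\f'],
  ['"', '"'], ["'", "'"], ['\\', '\\'],
]);

function decodeStringBody(raw: string): string {
  // Pass 1: expand UCHAR escapes (surrogate checks omitted in this sketch).
  const afterUchar = raw.replace(
    /\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/g,
    (match: string) => String.fromCodePoint(parseInt(match.slice(2), 16)),
  );

  // Pass 2: expand ECHAR escapes; any other use of a backslash is an error.
  let out = '';
  for (let i = 0; i < afterUchar.length; i++) {
    const ch = afterUchar.charAt(i);
    if (ch !== '\\') {
      out += ch;
      continue;
    }
    const next = afterUchar.charAt(i + 1);
    if (next === '') {
      throw new Error('Trailing unpaired backslash');
    }
    const decoded = ECHAR_MAP.get(next);
    if (decoded === undefined) {
      throw new Error(`Invalid escape: \\${next}`);
    }
    out += decoded;
    // Skip the character consumed by the escape.
    i++;
  }
  return out;
}
```

With this sketch, the content `\u005C` fails with the trailing-backslash error and `\u005Cx` fails with the invalid-escape error, mirroring codepoint-esc-05-bad and codepoint-esc-06-bad. The content `\\u0041` also fails, because pass 1 turns it into `\A`.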

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jitsedesmet jitsedesmet marked this pull request as ready for review June 5, 2026 09:43
@jitsedesmet
Member Author

@rubensworks Would it make sense to release this as a new minor version? It might 'break' some systems on SPARQL 1.2, in the sense that the escaping has changed. I am hesitant to make it a major version because our API did not really break: we promise to implement a SPARQL 1.2 parser, but that parser is not 'stable' since the spec is not final...

@rubensworks
Member

I wouldn't worry too much about it. I'd just patch it. It's a relatively small change in any case. And only for 1.2. (If it would change something in 1.1, that would be something else IMO)

I just skimmed the diff, and some tests are skipped in the package.json. Intentional?

@jitsedesmet
Member Author

That should be removed; it was only needed before the spec test PR was merged.


2 participants