test: add unicode pattern and patternProperties tests for draft2020-12 by Shristibot · Pull Request #837 · json-schema-org/JSON-Schema-Test-Suite

Shristibot · 2026-02-11T09:14:11Z

Adds minimal Unicode literal tests for pattern and patternProperties in draft‑2020‑12.
These tests use a simple non‑ASCII character (π) to confirm literal Unicode matching in the Unicode‑aware regex mode required by draft‑2020‑12, avoiding advanced regex features and focusing on basic literal behavior.

jdesrosiers

The pattern used here works the same with or without unicode mode enabled. You need to find something where /some-pattern/ works differently than /some-pattern/u (where the latter produces the correct behavior).
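To make the review point concrete, here is a minimal sketch comparing the same pattern source with and without the "u" flag. The helper name `matchesBothWays` is hypothetical, and the non-Unicode result for `\p{Letter}` assumes an engine that follows the Annex B legacy syntax, as browsers and Node.js do.

```typescript
// Hypothetical helper: run one pattern source in both modes.
function matchesBothWays(
  source: string,
  input: string
): { plain: boolean; unicode: boolean } {
  return {
    plain: new RegExp(source).test(input),
    unicode: new RegExp(source, "u").test(input),
  };
}

// A literal non-ASCII character in the BMP behaves identically in both
// modes, so it cannot detect whether Unicode mode is enabled.
const literal = matchesBothWays("^π$", "π");

// A Unicode property escape is only recognized in Unicode mode.
// Without "u", "\p" is read as a plain "p" followed by literal text.
const property = matchesBothWays("^\\p{Letter}$", "π");

console.log(literal, property);
```

A test built on the second pattern distinguishes the two modes, while one built on the first does not.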

karenetheridge · 2026-02-17T19:34:24Z

What are you trying to test for that we don't already have coverage for? I see tests for pattern ^\d+$ against the string "৪২" (BENGALI DIGIT FOUR, BENGALI DIGIT TWO) in tests/*/optional/ecmascript-regex.json .

jdesrosiers · 2026-02-17T20:03:39Z

What are you trying to test for that we don't already have coverage for?

In 2020-12, we added the requirement that implementations should use regex in unicode mode. We currently don't have any required tests that cover that requirement.

I see tests for pattern ^\d+$ against the string "৪২" (BENGALI DIGIT FOUR, BENGALI DIGIT TWO) in tests/*/optional/ecmascript-regex.json .

Ah, good catch. Maybe we don't need new tests. Maybe we just need to move some optional tests to required?

karenetheridge · 2026-02-18T04:19:26Z

I see tests for pattern ^\d+$ against the string "৪২" (BENGALI DIGIT FOUR, BENGALI DIGIT TWO) in tests/*/optional/ecmascript-regex.json .

Ah, good catch. Maybe we don't need new tests. Maybe we just need to move some optional tests to required?

These tests test the inverse -- that \d only matches the ascii digits, not other digits in unicode.

In 2020-12, we added the requirement that implementations should use regex in unicode mode. We currently don't have any required tests that cover that requirement.

Where is that coming from? draft2020-12 says we use ECMA-262's regular expression semantics, and at https://tc39.es/ecma262/#sec-compiletocharset it seems to be saying the opposite, that \d only matches ascii [0-9].

I'd love it to be the other way though -- I use unicode semantics in my implementation by default and these optional tests are marked as "expected failure", but I think we'd have to make that change in the next version.
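For reference, the behavior in question can be checked directly on an ECMAScript engine. This is a sketch, not a statement about what the test suite should require: it only shows how `\d` and the property escape `\p{Nd}` treat the Bengali digits from the existing optional test when the "u" flag is set.

```typescript
// BENGALI DIGIT FOUR (U+09EA) followed by BENGALI DIGIT TWO (U+09E8).
const bengaliFortyTwo = "\u09EA\u09E8";

// In ECMA-262, \d denotes the ASCII digits 0-9; the "u" flag does not widen it.
const digitEscape = new RegExp("^\\d+$", "u").test(bengaliFortyTwo);

// \p{Nd} (General_Category=Decimal_Number) does cover non-ASCII digits,
// and is only available in Unicode mode.
const decimalNumber = new RegExp("^\\p{Nd}+$", "u").test(bengaliFortyTwo);

console.log({ digitEscape, decimalNumber });
```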

jdesrosiers · 2026-02-19T19:37:55Z

Ha, yeah, sorry, I've never looked closely at the optional tests, so I'm not really familiar with them.

Regular expressions SHOULD be built with the "u" flag (or equivalent) to provide Unicode support, or processed in such a way which provides Unicode support as defined by ECMA-262.

https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-01#section-6.4

That's the part of the spec this PR should be testing. If there are optional tests that cover that, let's move them to required. If not, let's add new ones. If there are optional tests that need to be removed, let's do that too. I'm generally in favor of deleting the whole optional directory, so I have no problem with removing anything there that we don't think makes sense anymore.
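As an illustration of the quoted requirement, an implementation might compile every schema pattern with the "u" flag. The sketch below uses a hypothetical helper, `compilePattern`, and is not taken from any particular validator. It also shows a second case where the flag changes the result: a single character outside the Basic Multilingual Plane.

```typescript
// Hypothetical helper: compile a JSON Schema "pattern" value in Unicode mode.
function compilePattern(source: string): RegExp {
  return new RegExp(source, "u");
}

// U+1F432 written as a surrogate pair; it is one code point but two
// UTF-16 code units.
const dragon = "\uD83D\uDC32";

// Without "u", "." consumes one code unit, so "^.$" cannot span the pair.
const withoutFlag = new RegExp("^.$").test(dragon);

// With "u", "." consumes one whole code point.
const withFlag = compilePattern("^.$").test(dragon);

console.log({ withoutFlag, withFlag });
```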

karenetheridge · 2026-02-19T19:41:40Z

Regular expressions SHOULD be built with the "u" flag (or equivalent)

Ah super, so the current optional tests are wrong then :) We should definitely fix that!

karenetheridge · 2026-02-19T20:12:24Z

So, there's some interesting history here: #505 and #498. I've read these both again thoroughly and I don't see how #505 can be correct given what the spec says. The clear intent is to say: regexes should use ECMA-262 semantics, with the unicode flag. So all the tests that assert that non-ascii characters shouldn't match \w, or non-ascii digits should not match \d (edit: not \s), are wrong.

Also, there is a test that asserts that 0xFEFF should match \s, which is also wrong, because despite the character being named "ZERO WIDTH NO-BREAK SPACE", it's not actually in the Space character class. This can be verified by going directly to the Unicode properties database, and I found several threads in different languages and forums (e.g. golang, Stack Overflow) discussing this peculiarity as well.
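The two notions of "space" in play here can be compared on an ECMAScript engine. The sketch below is only a way to observe the difference: ECMA-262 defines `\s` through its own WhiteSpace and LineTerminator productions, and the WhiteSpace production lists U+FEFF explicitly, whereas the Unicode White_Space property does not include it. Which of the two a test should follow is the open question in this thread.

```typescript
// ZERO WIDTH NO-BREAK SPACE (also used as the byte order mark).
const zwnbsp = "\uFEFF";

// \s as defined by ECMA-262 (WhiteSpace plus LineTerminator).
const ecmaWhitespace = new RegExp("^\\s$", "u").test(zwnbsp);

// The Unicode binary property White_Space from the Unicode database.
const unicodeWhiteSpace = new RegExp("^\\p{White_Space}$", "u").test(zwnbsp);

// The character's General_Category is Cf (Format), not a space separator.
const formatCategory = new RegExp("^\\p{Cf}$", "u").test(zwnbsp);

console.log({ ecmaWhitespace, unicodeWhiteSpace, formatCategory });
```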

@Shristibot since the scale of this issue has just grown a lot more from the original, feel free to punt it and it can be reassigned.

Shristibot · 2026-02-21T07:33:07Z

Thanks for all the extra context and the pointers to the spec and existing regex tests.

It sounds like this PR has uncovered a larger cleanup around Unicode regex semantics and the optional/ecmascript-regex tests. I’d like to keep working on this rather than dropping it, but I’m not sure what the best next step is.

Would you prefer that I:
• Narrow this PR to a minimal change (for example, just adding/moving clearly correct Unicode-mode tests into the required draft-2020-12 tests), and leave the broader cleanup for follow-up PRs, or
• Expand this PR to also update/remove the incorrect optional tests now?

I can start reviewing the optional tests against the draft-2020-12 requirements as well—happy to adjust the scope based on what would be most useful.

jdesrosiers · 2026-02-23T03:13:22Z

Would you prefer that I:
• Narrow this PR to a minimal change (for example, just adding/moving clearly correct Unicode-mode tests into the required draft-2020-12 tests), and leave the broader cleanup for follow-up PRs, or
• Expand this PR to also update/remove the incorrect optional tests now?

I'm fine with whatever you want to do. What's important for this PR is that we test that regex evaluation uses the "u" flag. If you want to ignore optional tests and leave that to be cleaned up in another issue, I'm ok with that.

Add tests using \p{Letter} pattern which requires the ECMA-262 u flag:
- pattern.json: verify Unicode-mode regex with ASCII letters, non-ASCII letters, and digits
- patternProperties.json: verify Unicode-mode pattern matching for property names
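The merged test files are not reproduced in this thread, so the following is only a sketch of what tests along these lines could look like, in the suite's description / schema / tests shape. The exact pattern (`^\p{Letter}+$`), the test data, and the toy `validate` function are all assumptions made for illustration; `validate` handles only the two keywords used here and is not a real validator.

```typescript
// Shape of a test group in the suite.
interface TestCase { description: string; data: unknown; valid: boolean }
interface TestGroup {
  description: string;
  schema: Record<string, unknown>;
  tests: TestCase[];
}

// Hypothetical groups; not the tests that were merged.
const groups: TestGroup[] = [
  {
    description: "pattern with Unicode property escape",
    schema: { pattern: "^\\p{Letter}+$" },
    tests: [
      { description: "ASCII letters match", data: "abc", valid: true },
      { description: "non-ASCII letters match", data: "π", valid: true },
      { description: "digits do not match", data: "123", valid: false },
    ],
  },
  {
    description: "patternProperties with Unicode property escape",
    schema: { patternProperties: { "^\\p{Letter}+$": { type: "integer" } } },
    tests: [
      { description: "matching name, valid value", data: { "π": 1 }, valid: true },
      { description: "matching name, invalid value", data: { "π": "x" }, valid: false },
      { description: "non-matching name is ignored", data: { "123": "x" }, valid: true },
    ],
  },
];

// Toy evaluator: only "pattern" and "patternProperties" with {type: "integer"}.
function validate(schema: Record<string, unknown>, data: unknown): boolean {
  const pattern = schema["pattern"];
  if (typeof pattern === "string" && typeof data === "string") {
    if (!new RegExp(pattern, "u").test(data)) return false;
  }
  const pp = schema["patternProperties"] as
    | Record<string, { type?: string }>
    | undefined;
  if (pp && typeof data === "object" && data !== null && !Array.isArray(data)) {
    const instance = data as Record<string, unknown>;
    for (const source of Object.keys(pp)) {
      const re = new RegExp(source, "u");
      for (const name of Object.keys(instance)) {
        if (!re.test(name) || pp[source].type !== "integer") continue;
        const value = instance[name];
        if (typeof value !== "number" || Math.floor(value) !== value) return false;
      }
    }
  }
  return true;
}

// Run every case and collect mismatches.
const failures: string[] = [];
for (const group of groups) {
  for (const test of group.tests) {
    if (validate(group.schema, test.data) !== test.valid) {
      failures.push(group.description + ": " + test.description);
    }
  }
}
console.log(failures.length === 0 ? "all sketch tests pass" : failures);
```

An evaluator that compiled the same patterns without the "u" flag would get the "π" cases wrong, which is what makes `\p{Letter}` a useful probe for the requirement.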

jdesrosiers

We're on the right track now.

Please add these tests to the v1 tests as well.

Please revert the whitespace changes in patternProperties.json. The only changes should be the test that you added.

tests/draft2020-12/pattern.json

tests/draft2020-12/patternProperties.json

Shristibot · 2026-03-06T08:35:11Z

We're on the right track now.

Please add these tests to the v1 tests as well.

Please revert the whitespace changes in patternProperties.json. The only changes should be the test that you added.

Hi @jdesrosiers,
I have added tests to v1.

jdesrosiers · 2026-03-06T19:44:48Z

@Shristibot There's one more thing remaining from my last review.

Please revert the whitespace changes in patternProperties.json. The only changes should be the test that you added.

Please do that. Then we're ready to merge.

test: add unicode pattern and patternProperties tests for draft2020-12

86f4229

Shristibot requested a review from a team as a code owner February 11, 2026 09:14

Remove unintended snyk instructions file

42987fa

jdesrosiers requested changes Feb 17, 2026

Shristibot mentioned this pull request Feb 22, 2026

Add Unicode-focused tests for string length and pattern handling #829

Open

abhi-03-kh mentioned this pull request Feb 23, 2026

test(draft2020-12): add literal Unicode coverage for pattern and patternProperties #852

Closed

jdesrosiers requested changes Mar 5, 2026

kumari-Shristi added 2 commits March 6, 2026 13:01

Added tests for v1

c436a07

Shorten test description to satisfy sanity check

70ca854

kumari-Shristi and others added 2 commits March 7, 2026 11:01

Revert whitespace changes in patternProperties.json

8bda45d

Cleanup whitespace

d4b6601

jdesrosiers approved these changes Mar 9, 2026

jdesrosiers merged commit 06481b1 into json-schema-org:main Mar 9, 2026
3 checks passed

4 participants
