test: add unicode pattern and patternProperties tests for draft2020-12#837
Conversation
jdesrosiers
left a comment
There was a problem hiding this comment.
The pattern used here works the same with or without unicode mode enabled. You need to find something where /some-pattern/ works differently than /some-pattern/u. (Where the latter produces the correct behavior.
|
What are you trying to test for that we don't already have coverage for? I see tests for pattern |
In 2020-12, we added the requirement that implementations should use regex in unicode mode. We currently don't have any required tests that cover that requirement.
Ah, good catch. Maybe we don't need new tests. Maybe we just need to move some optional tests to required? |
These tests test the inverse -- that
Where is that coming from? draft2020-12 says we use ECMA-262's regular expression semantics, and at https://tc39.es/ecma262/#sec-compiletocharset it seems to be saying the opposite, that I'd love it to be the other way though -- I use unicode semantics in my implementation by default and these optional tests are marked as "expected failure", but I think we'd have to make that change in the next version. |
|
Ha, yeah, sorry, I've never looked closely at the optional tests, so I'm not really familiar with them.
That's the part of the spec this PR should be testing. If there are optional tests that cover that, let's move them to required. If not, let's add new ones. If there are optional tests that need to be removed, let's do that too. I'm generally in favor of deleting the whole optional directory, so I have no problem with removing anything there that we don't think makes sense anymore. |
Ah super, so the current optional tests are wrong then :) We should definitely fix that! |
|
So, there's some interesting history here: #505 and #498. I've read these both again thoroughly and I don't see how #505 can be correct given what the spec says. The clear intent is to say: regexes should use ECMA-262 semantics, with the unicode flag. So all the tests that assert that non-ascii characters shouldn't match Also, there is a test that asserts that 0xFEFF should match @Shristibot since the scale of this issue has just grown a lot more from the original, feel free to punt it and it can be reassigned. |
|
Thanks for all the extra context and the pointers to the spec and existing regex tests. It sounds like this PR has uncovered a larger cleanup around Unicode regex semantics and the optional/ecmascript-regex tests. I’d like to keep working on this rather than dropping it, but I’m not sure what the best next step is. Would you prefer that I: I can start reviewing the optional tests against the draft-2020-12 requirements as well—happy to adjust the scope based on what would be most useful. |
I'm fine with whatever you want to do. What's important for this PR is to test that we are testing that regex evaluation uses the "u" flag. If you want to ignore optional tests and leave that to be cleaned up in another issue, I'm ok with that. |
Add tests using \p{Letter} pattern which requires the ECMA-262 u flag:
- pattern.json: verify Unicode-mode regex with ASCII letters, non-ASCII letters, and digits
- patternProperties.json: verify Unicode-mode pattern matching for property names
jdesrosiers
left a comment
There was a problem hiding this comment.
We're on the right track now.
Please add these tests to the v1 tests as well.
Please revert the whitespace changes in patternProperties.json. The only changes should be the test that you added.
|
|
@Shristibot There's one more thing remaining from my last review.
Please do that. Then we're ready to merge. |
Adds minimal Unicode literal tests for pattern and patternProperties in draft‑2020‑12.
These tests use a simple non‑ASCII character (π) to confirm literal Unicode matching in the Unicode‑aware regex mode required by draft‑2020‑12, avoiding advanced regex features and focusing on basic literal behavior.