Regexp: support for case-insensitive unicode matching #2130

balajirrao · 2025-10-17T16:06:00Z

No description provided.


rbri · 2025-11-21T06:30:48Z

@balajirrao any plans for finishing this? Waiting for that to make the separate engine PR...

balajirrao · 2025-11-21T09:13:47Z

@rbri I thought I was going to finish it and then I hit a wall. I believe that in order to do this in the general case, we'd need icu4j. I'm considering creating a module outside of rhino, say, rhino-icu4j, that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound?

andreabergia · 2025-11-28T10:13:36Z

> @rbri I thought I was going to finish it and then I hit a wall. I believe that in order to do this in the general case, we'd need icu4j. I'm considering creating a module outside of rhino, say, rhino-icu4j that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound ?

IMHO that's the right approach. An opt-in module that, if present, adds the capability. If not, we can error out with "not supported". It would be a good improvement on what we do now.
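One way the opt-in idea could look in Java is a `ServiceLoader` lookup: the core declares an extension point, an add-on jar registers an implementation, and the core reports "not supported" when nothing is registered. This is only a sketch under assumptions. `UnicodeCaseFolding`, its nested `Provider` interface, and the `simpleFold` method are hypothetical names invented for illustration; nothing here is taken from Rhino or from this PR.

```java
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical sketch of an opt-in capability lookup; not Rhino's actual API. */
public final class UnicodeCaseFolding {

    /** Hypothetical extension point that an add-on such as rhino-icu4j might implement. */
    public interface Provider {
        int simpleFold(int codePoint);
    }

    private UnicodeCaseFolding() {}

    /** Looks for a registered provider; empty when no add-on jar is on the classpath. */
    static Optional<Provider> lookup() {
        return ServiceLoader.load(Provider.class).findFirst();
    }

    /** True when some add-on has registered a provider. */
    public static boolean isAvailable() {
        return lookup().isPresent();
    }

    /** Returns the provider, or fails with a "not supported" error when none is present. */
    public static Provider require() {
        return lookup().orElseThrow(() ->
                new UnsupportedOperationException(
                        "Full Unicode case-insensitive matching is not supported "
                                + "without the optional add-on"));
    }
}
```

With this shape the add-on would ship a `META-INF/services` entry naming its implementation, and the regexp compiler would call `require()` only when a pattern actually needs the full case-folding tables.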

aardvark179 · 2025-12-01T14:07:59Z

I'm not sure the complement classes present an insurmountable wall. icu4j would certainly offer a route to a complete implementation, but it would also be entirely reasonable to calculate classes, and their complements, when needed. Looping from 0 to MAX_CODE_POINT and building a range structure doesn't actually take much time, and most Unicode classes have ranges that can be represented pretty compactly.
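A minimal sketch of that idea, assuming the class membership test is available as an `IntPredicate`: scan every code point once, merge consecutive members into inclusive ranges, and derive the complement from the gaps between ranges rather than rescanning. `CodePointRanges` and its methods are hypothetical names for illustration, not code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch: a sorted, compact range list for a set of code points. */
public final class CodePointRanges {
    // Flattened inclusive ranges: [start0, end0, start1, end1, ...], sorted ascending.
    private final int[] bounds;

    private CodePointRanges(List<Integer> flat) {
        bounds = new int[flat.size()];
        for (int i = 0; i < bounds.length; i++) {
            bounds[i] = flat.get(i);
        }
    }

    /** Scans 0..MAX_CODE_POINT once and merges consecutive members into ranges. */
    public static CodePointRanges of(IntPredicate member) {
        List<Integer> flat = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean in = member.test(cp);
            if (in && start < 0) {
                start = cp;
            } else if (!in && start >= 0) {
                flat.add(start);
                flat.add(cp - 1);
                start = -1;
            }
        }
        if (start >= 0) {
            flat.add(start);
            flat.add(Character.MAX_CODE_POINT);
        }
        return new CodePointRanges(flat);
    }

    /** Builds the complement from the gaps between ranges, without a second scan. */
    public CodePointRanges complement() {
        List<Integer> flat = new ArrayList<>();
        int next = 0; // first code point not yet covered
        for (int i = 0; i < bounds.length; i += 2) {
            if (bounds[i] > next) {
                flat.add(next);
                flat.add(bounds[i] - 1);
            }
            next = bounds[i + 1] + 1;
        }
        if (next <= Character.MAX_CODE_POINT) {
            flat.add(next);
            flat.add(Character.MAX_CODE_POINT);
        }
        return new CodePointRanges(flat);
    }

    /** Binary search over the ranges. */
    public boolean contains(int cp) {
        int lo = 0;
        int hi = bounds.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (cp < bounds[2 * mid]) {
                hi = mid - 1;
            } else if (cp > bounds[2 * mid + 1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    public int rangeCount() {
        return bounds.length / 2;
    }
}
```

For example, `CodePointRanges.of(Character::isUpperCase)` could be computed lazily the first time a class is needed and cached, with `complement()` covering the negated form. The predicate here relies on the JDK's own Unicode tables, so results would follow the Unicode version of the running JVM.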

balajirrao added 10 commits October 9, 2025 16:50

Allow 'u' and 'i' flags to be used together

7c55078

Add approximate unicode case-folding

3c408de

Change isWord to handle case-insensitive Unicode mode

ff0fcf1

Introduce opcode REOP_UCSPFLAT1i

a72d953

For case-insensitive matching of Unicode surrogate pairs

Case-insensitive matching with anchor

d9157a6

case-insensitive unicode support for flatNIMatcher and flatNIBackward…

ad15376

… matchers

case-insensitive matching support for classes

58d8799

Property escapes

783f568

Backref matcher

f250ed6

Update test262.properties

647c882

balajirrao force-pushed the regexp-unicode-caseinsensitive branch from 6a24f28 to 647c882 Compare October 17, 2025 16:06
