Conversation

@balajirrao
Contributor

No description provided.

@balajirrao force-pushed the regexp-unicode-caseinsensitive branch from 6a24f28 to 647c882 on October 17, 2025, 16:06
@rbri
Collaborator

rbri commented Nov 21, 2025

@balajirrao any plans for finishing this? Waiting for that to make the separate engine PR...

@balajirrao
Contributor Author

@rbri I thought I was going to finish it, but then I hit a wall. I believe that to do this in the general case, we'd need icu4j. I'm considering creating a module outside of Rhino, say rhino-icu4j, that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound?

@andreabergia
Contributor

andreabergia commented Nov 28, 2025

> @rbri I thought I was going to finish it, but then I hit a wall. I believe that to do this in the general case, we'd need icu4j. I'm considering creating a module outside of Rhino, say rhino-icu4j, that when included would offer complete Unicode support in regexps and possibly in other cases too. How does that sound?

IMHO that's the right approach. An opt-in module that, if present, adds the capability. If not, we can error out with "not supported". It would be a good improvement on what we do now.
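To make the opt-in idea concrete, here is a minimal sketch of how the core could probe for an optional module at runtime and fail with "not supported" otherwise. Everything named here is an assumption for illustration: the `UnicodeSupport` class, the `requireFullUnicode` helper, and the choice of ICU4J's `com.ibm.icu.text.UnicodeSet` as the marker class are hypothetical, and the exception type is a placeholder for whatever error Rhino would actually raise.

```java
// Hypothetical helper, not part of Rhino: detects an optional dependency by class name.
final class UnicodeSupport {
    // Assumption: seeing ICU4J's UnicodeSet on the classpath means the opt-in module is present.
    static final String ICU_MARKER = "com.ibm.icu.text.UnicodeSet";

    /** Returns true if the named class can be located without initializing it. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, UnicodeSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Throws if the optional module is missing; the exception type is only a stand-in. */
    static void requireFullUnicode(String feature) {
        if (!isAvailable(ICU_MARKER)) {
            throw new UnsupportedOperationException(
                    feature + " is not supported without the optional ICU4J-backed module");
        }
    }
}
```

A caller would invoke `UnicodeSupport.requireFullUnicode("case-insensitive Unicode regexp")` before taking the ICU-backed path, so users without the extra jar get a clear error rather than silently wrong matches.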

@aardvark179
Contributor

I'm not sure the complement classes present an insurmountable wall. icu4j would certainly offer a route to a complete implementation, but it would also be entirely reasonable to calculate classes, and their complements, when needed. Looping from 0 to MAX_CODE_POINT and building a range structure doesn't actually take much time, and most Unicode classes have ranges that can be represented pretty compactly.
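As a rough sketch of the loop-and-build-ranges idea, the following scans every code point once, collapses matches into inclusive ranges, and derives the complement from those ranges. The `CodePointRanges` class and its method names are hypothetical and are not taken from this PR; the predicate uses only `java.lang.Character`, so the result reflects whatever Unicode version the running JDK ships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration, not code from this PR.
final class CodePointRanges {
    /** Scans all code points and returns sorted, non-overlapping inclusive [start, end] ranges. */
    static List<int[]> build(IntPredicate inClass) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means no range is currently open
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (inClass.test(cp)) {
                if (start < 0) start = cp;
            } else if (start >= 0) {
                ranges.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) ranges.add(new int[] {start, Character.MAX_CODE_POINT});
        return ranges;
    }

    /** Returns the gaps between the given ranges, i.e. the complement class. */
    static List<int[]> complement(List<int[]> ranges) {
        List<int[]> out = new ArrayList<>();
        int next = Character.MIN_CODE_POINT;
        for (int[] r : ranges) {
            if (r[0] > next) out.add(new int[] {next, r[0] - 1});
            next = r[1] + 1;
        }
        if (next <= Character.MAX_CODE_POINT) {
            out.add(new int[] {next, Character.MAX_CODE_POINT});
        }
        return out;
    }

    /** Binary search over the sorted ranges. */
    static boolean contains(List<int[]> ranges, int cp) {
        int lo = 0, hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int[] r = ranges.get(mid);
            if (cp < r[0]) hi = mid - 1;
            else if (cp > r[1]) lo = mid + 1;
            else return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // General category Lu (uppercase letter), as classified by the JDK.
        List<int[]> lu = build(cp -> Character.getType(cp) == Character.UPPERCASE_LETTER);
        System.out.println("Lu ranges: " + lu.size()
                + ", complement ranges: " + complement(lu).size());
    }
}
```

Each class costs one pass over roughly 1.1 million code points, so caching the computed ranges per class would keep repeated regexp compilation cheap.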
