@@ -1697,12 +1697,21 @@ <h1>pcre2api man page</h1>
 changed within a pattern by a (?i) option setting. If either PCRE2_UTF or
 PCRE2_UCP is set, Unicode properties are used for all characters with more than
 one other case, and for all characters whose code points are greater than
-U+007F. Note that there are two ASCII characters, K and S, that, in addition to
+U+007F.
+</P>
+<P>
+Note that there are two ASCII characters, K and S, that, in addition to
 their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin
 sign) and U+017F (long S) respectively. If you do not want this case
 equivalence, you can suppress it by setting PCRE2_EXTRA_CASELESS_RESTRICT.
 </P>
 <P>
+One language family, Turkish and Azeri, has its own case-insensitivity rules,
+which can be selected by setting PCRE2_EXTRA_TURKISH_CASING. This alters the
+behaviour of the 'i', 'I', U+0130 (capital I with dot above), and U+0131
+(small dotless i) characters.
+</P>
+<P>
 For lower valued characters with only one other case, a lookup table is used
 for speed. When neither PCRE2_UTF nor PCRE2_UCP is set, a lookup table is used
 for all code points less than 256, and higher code points (available only in
@@ -2037,9 +2046,16 @@ <h1>pcre2api man page</h1>
 upper/lower casing operations, even when PCRE2_UTF is not set. This makes it
 possible to process strings in the 16-bit UCS-2 code. This option is available
 only if PCRE2 has been compiled with Unicode support (which is the default).
-The PCRE2_EXTRA_CASELESS_RESTRICT option (see below) restricts caseless
+</P>
+<P>
+The PCRE2_EXTRA_CASELESS_RESTRICT option (see above) restricts caseless
 matching such that ASCII characters match only ASCII characters and non-ASCII
-characters match only non-ASCII characters.
+characters match only non-ASCII characters. The PCRE2_EXTRA_TURKISH_CASING option
+(see above) alters the matching of the 'i' characters to follow their behaviour
+in Turkish and Azeri languages. For further details on
+PCRE2_EXTRA_CASELESS_RESTRICT and PCRE2_EXTRA_TURKISH_CASING, see the
+<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
+page.
 <pre>
  PCRE2_UNGREEDY
 </pre>
@@ -2176,7 +2192,8 @@ <h1>pcre2api man page</h1>
 ASCII letter K is case-equivalent to U+212a (Kelvin sign). This option disables
 recognition of case-equivalences that cross the ASCII/non-ASCII boundary. In a
 caseless match, both characters must either be ASCII or non-ASCII. The option
-can be changed with a pattern by the (?r) option setting.
+can be changed within a pattern by the (*CASELESS_RESTRICT) or (?r) option
+settings.
 <pre>
  PCRE2_EXTRA_ESCAPED_CR_IS_LF
 </pre>
@@ -2223,6 +2240,14 @@ <h1>pcre2api man page</h1>
 returning PCRE2_ERROR_CALLOUT_CALLER_DISABLED. This is useful if the application
 knows that a callout will not be provided to <b>pcre2_match()</b>, so that
 callouts in the pattern are not silently ignored.
+<pre>
+ PCRE2_EXTRA_TURKISH_CASING
+</pre>
+This option alters case-equivalence of the 'i' letters to follow the
+alphabet used by Turkish and Azeri languages. The option can be changed within
+a pattern by the (*TURKISH_CASING) start-of-pattern setting. Either the UTF or
+UCP options must be set. In the 8-bit library, UTF must be set. This option
+cannot be combined with PCRE2_EXTRA_CASELESS_RESTRICT.
 <a name="jitcompiling"></a></P>
 <br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
 <P>
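The case-equivalence rules described in this change can be illustrated with a small Python model. This is a sketch of the rules as stated in the text, not PCRE2 code. The function `caseless_equal` and its `turkish` and `restrict` parameters are hypothetical names that stand in for PCRE2_EXTRA_TURKISH_CASING and PCRE2_EXTRA_CASELESS_RESTRICT. The Turkish pairing (dotted 'i' with U+0130, 'I' with dotless U+0131) is assumed from the usual Turkish and Azeri alphabet rules, and the default treatment of U+0130 and U+0131 is deliberately not modelled.

```python
# Illustrative model only; none of these names come from the PCRE2 API.
KELVIN = "\u212A"            # Kelvin sign
LONG_S = "\u017F"            # long S
DOTTED_CAPITAL_I = "\u0130"  # capital I with dot above
DOTLESS_SMALL_I = "\u0131"   # small dotless i

# Equivalence sets for the characters named in the text.
# How U+0130 and U+0131 behave WITHOUT Turkish casing is not modelled here.
DEFAULT_SETS = [{"K", "k", KELVIN}, {"S", "s", LONG_S}, {"I", "i"}]

# Assumption: Turkish casing pairs dotted with dotted and dotless with dotless.
TURKISH_SETS = [
    {"K", "k", KELVIN},
    {"S", "s", LONG_S},
    {"i", DOTTED_CAPITAL_I},
    {"I", DOTLESS_SMALL_I},
]


def caseless_equal(a, b, turkish=False, restrict=False):
    """Return True if single characters a and b are case-equivalent in this model."""
    if turkish and restrict:
        # The text states that the two options cannot be combined.
        raise ValueError("turkish and restrict cannot be combined")
    if a == b:
        return True
    for group in (TURKISH_SETS if turkish else DEFAULT_SETS):
        if a in group and b in group:
            # restrict: reject equivalences that cross the ASCII boundary.
            if restrict and (ord(a) < 128) != (ord(b) < 128):
                return False
            return True
    return False
```

In this model, `caseless_equal("K", KELVIN)` holds by default and fails once `restrict=True`, while `caseless_equal("i", "I")` holds by default and fails once `turkish=True`.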