Commit aadef0c

File tidies for 10.43-RC1 release
1 parent 2bba84b commit aadef0c

18 files changed, +450 -379 lines changed

AUTHORS

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ Email domain: gmail.com
88
Retired from University of Cambridge Computing Service,
99
Cambridge, England.
1010

11-
Copyright (c) 1997-2022 University of Cambridge
11+
Copyright (c) 1997-2023 University of Cambridge
1212
All rights reserved
1313

1414

@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
1919
Email local part: hzmester
2020
Email domain: freemail.hu
2121

22-
Copyright(c) 2010-2022 Zoltan Herczeg
22+
Copyright(c) 2010-2023 Zoltan Herczeg
2323
All rights reserved.
2424

2525

@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
3030
Email local part: hzmester
3131
Email domain: freemail.hu
3232

33-
Copyright(c) 2009-2022 Zoltan Herczeg
33+
Copyright(c) 2009-2023 Zoltan Herczeg
3434
All rights reserved.
3535

3636
####

ChangeLog

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ Before the move to GitHub, this was the only record of changes to PCRE2. Now
55
there is often more detail in the pull requests.
66

77

8-
Version 10.43 xx-xxx-202x
9-
-------------------------
8+
Version 10.43 27-December-2023
9+
------------------------------
1010

1111
1. The test program added by change 2 of 10.42 didn't work when the default
1212
newline setting didn't include \n as a newline. One test needed (*LF) to ensure

LICENCE

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ Email domain: gmail.com
2626
Retired from University of Cambridge Computing Service,
2727
Cambridge, England.
2828

29-
Copyright (c) 1997-2022 University of Cambridge
29+
Copyright (c) 1997-2023 University of Cambridge
3030
All rights reserved.
3131

3232

@@ -37,7 +37,7 @@ Written by: Zoltan Herczeg
3737
Email local part: hzmester
3838
Email domain: freemail.hu
3939

40-
Copyright(c) 2010-2022 Zoltan Herczeg
40+
Copyright(c) 2010-2023 Zoltan Herczeg
4141
All rights reserved.
4242

4343

@@ -48,7 +48,7 @@ Written by: Zoltan Herczeg
4848
Email local part: hzmester
4949
Email domain: freemail.hu
5050

51-
Copyright(c) 2009-2022 Zoltan Herczeg
51+
Copyright(c) 2009-2023 Zoltan Herczeg
5252
All rights reserved.
5353

5454

NEWS

Lines changed: 46 additions & 0 deletions
@@ -2,6 +2,52 @@ News about PCRE2 releases
22
-------------------------
33

44

5+
Version 10.43 27-December-2023
6+
------------------------------
7+
8+
There are quite a lot of changes in this release (see ChangeLog and git log for
9+
a list). Those that are not bugfixes or code tidies are:
10+
11+
* A new function pcre2_get_match_data_heapframes_size() for finer heap control.
12+
13+
* New option flags to restrict the interaction between ASCII and non-ASCII
14+
characters for caseless matching and \d and friends. There are also new
15+
pattern constructs to control these flags from within a pattern.
16+
17+
* Upgrade to Unicode 15.0.0.
18+
19+
* Treat a NULL pattern with zero length as an empty string.
20+
21+
* Added support for limited-length variable-length lookbehind assertions, with
22+
a default maximum length of 255 characters (same as Perl) but with a function
23+
to adjust the limit.
24+
25+
* Support for LoongArch to JIT.
26+
27+
* Perl changed the meaning of (for example) {,3}, which was not previously
28+
recognized as a quantifier. Now it means {0,3} and PCRE2 has also changed.
29+
Note that {,} is still not a quantifier.
30+
31+
* Following Perl, allow spaces and tabs after { and before } in all Perl-
32+
compatible items that use braces, and also around commas in quantifiers. The
33+
one exception in PCRE2 is \u{...}, which is from ECMAScript, not Perl, and
34+
PCRE2 follows ECMAScript usage.
35+
36+
* Changed the meaning of \w and its synonyms and derivatives (\b and \B) in UCP
37+
mode to follow Perl. It now matches characters whose general categories are L
38+
or N or whose particular categories are Mn (non-spacing mark) or Pc
39+
(connector punctuation).
40+
41+
* Changed the default meaning of [:xdigit:] in UCP mode to follow Perl. It now
42+
matches the "fullwidth" versions of hex digits. PCRE2_EXTRA_ASCII_DIGIT can
43+
be used to keep it ASCII only.
44+
45+
* Make PCRE2_UCP the default in UTF mode in pcre2grep and add -no_ucp,
46+
--case-restrict and --posix-digit.
47+
48+
* Add --group-separator and --no-group-separator to pcre2grep.
49+
50+
551
Version 10.42 11-December-2022
652
------------------------------
753
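The brace-quantifier rules in the 10.43 NEWS entry above ({,3} now means {0,3}, {,} is still not a quantifier, and spaces and tabs are ignored after {, before }, and around the comma) can be sketched as a small classifier. This is an illustration only, not PCRE2's parser: the function name `parse_quantifier` and its return convention are assumptions made for this example, and it does not check that the minimum is no larger than the maximum.

```python
import re

# Optional blanks after "{", before "}", and around the comma.
# Only ASCII digits are accepted.
_QUANTIFIER = re.compile(r"\{[ \t]*([0-9]*)[ \t]*(,[ \t]*([0-9]*)[ \t]*)?\}")


def parse_quantifier(text):
    """Return (minimum, maximum) if text is a brace quantifier, else None.

    maximum is None when the quantifier has no upper bound, as in {3,}.
    Hypothetical helper that follows the rules stated in the NEWS entry.
    """
    match = _QUANTIFIER.fullmatch(text)
    if match is None:
        return None
    low, comma, high = match.group(1), match.group(2), match.group(3)
    if comma is None:
        # {n}: an exact count, so a number is required.
        if not low:
            return None
        return (int(low), int(low))
    if not low and not high:
        # {,} is still not a quantifier.
        return None
    # A missing minimum now defaults to zero: {,3} is {0,3}.
    return (int(low) if low else 0, int(high) if high else None)


print(parse_quantifier("{,3}"))       # (0, 3)
print(parse_quantifier("{ 1 , 3 }"))  # (1, 3)
print(parse_quantifier("{,}"))        # None
```

An item that returns None here would be treated as literal text rather than as a quantifier.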

configure.ac

Lines changed: 6 additions & 6 deletions
@@ -10,14 +10,14 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
1010

1111
m4_define(pcre2_major, [10])
1212
m4_define(pcre2_minor, [43])
13-
m4_define(pcre2_prerelease, [-DEV])
14-
m4_define(pcre2_date, [2023-04-14])
13+
m4_define(pcre2_prerelease, [-RC1])
14+
m4_define(pcre2_date, [2023-12-27])
1515

1616
# Libtool shared library interface versions (current:revision:age)
17-
m4_define(libpcre2_8_version, [11:2:11])
18-
m4_define(libpcre2_16_version, [11:2:11])
19-
m4_define(libpcre2_32_version, [11:2:11])
20-
m4_define(libpcre2_posix_version, [3:4:0])
17+
m4_define(libpcre2_8_version, [12:0:12])
18+
m4_define(libpcre2_16_version, [12:0:12])
19+
m4_define(libpcre2_32_version, [12:0:12])
20+
m4_define(libpcre2_posix_version, [3:5:0])
2121

2222
# NOTE: The CMakeLists.txt file searches for the above variables in the first
2323
# 50 lines of this file. Please update that if the variables above are moved.
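The libtool triples changed above are written as current:revision:age. As a rough guide to what the bump means for installed file names, the sketch below applies libtool's usual Linux/ELF convention, where the soname major number is current minus age and the full suffix is major.age.revision. Other platforms map the triple differently, and the helper name `linux_library_suffixes` is an assumption made for this example.

```python
def linux_library_suffixes(version_info):
    """Map a libtool current:revision:age triple to Linux file name suffixes.

    Returns (full_suffix, soname_suffix). Assumes libtool's Linux/ELF
    convention; this is a hypothetical helper, not part of the build.
    """
    current, revision, age = (int(part) for part in version_info.split(":"))
    major = current - age  # oldest interface number still supported
    return (f".so.{major}.{age}.{revision}", f".so.{major}")


# Old and new values from the configure.ac hunk above.
for triple in ("11:2:11", "12:0:12", "3:4:0", "3:5:0"):
    print(triple, linux_library_suffixes(triple))
```

Under that convention 11:2:11 and 12:0:12 give `.so.0.11.2` and `.so.0.12.0`: current and age rise together, so the soname stays at `.so.0`, which is how libtool records interfaces being added without any being removed. For the POSIX wrapper only the revision changes, giving `.so.3.0.4` and then `.so.3.0.5`.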

doc/html/pcre2grep.html

Lines changed: 29 additions & 20 deletions
@@ -71,15 +71,16 @@ <h1>pcre2grep man page</h1>
7171
<pre>
7272
pcre2grep some-pattern file1 - file3
7373
</pre>
74-
By default, input files are searched line by line. Each line that matches a
75-
pattern is copied to the standard output, and if there is more than one file,
76-
the file name is output at the start of each line, followed by a colon.
77-
However, there are options that can change how <b>pcre2grep</b> behaves. For
78-
example, the <b>-M</b> option makes it possible to search for strings that span
79-
line boundaries. What defines a line boundary is controlled by the <b>-N</b>
80-
(<b>--newline</b>) option. The <b>-h</b> and <b>-H</b> options control whether or
81-
not file names are shown, and the <b>-Z</b> option changes the file name
82-
terminator to a zero byte.
74+
By default, input files are searched line by line, so pattern assertions about
75+
the beginning and end of a subject string (^, $, \A, \Z, and \z) match at
76+
the beginning and end of each line. When a line matches a pattern, it is copied
77+
to the standard output, and if there is more than one file, the file name is
78+
output at the start of each line, followed by a colon. However, there are
79+
options that can change how <b>pcre2grep</b> behaves. For example, the <b>-M</b>
80+
option makes it possible to search for strings that span line boundaries. What
81+
defines a line boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
82+
The <b>-h</b> and <b>-H</b> options control whether or not file names are shown,
83+
and the <b>-Z</b> option changes the file name terminator to a zero byte.
8384
</P>
8485
<P>
8586
The amount of memory used for buffering files that are being scanned is
@@ -563,16 +564,24 @@ <h1>pcre2grep man page</h1>
563564
<P>
564565
<b>-M</b>, <b>--multiline</b>
565566
Allow patterns to match more than one line. When this option is set, the PCRE2
566-
library is called in "multiline" mode. This allows a matched string to extend
567-
past the end of a line and continue on one or more subsequent lines. Patterns
568-
used with <b>-M</b> may usefully contain literal newline characters and internal
569-
occurrences of ^ and $ characters. The output for a successful match may
570-
consist of more than one line. The first line is the line in which the match
571-
started, and the last line is the line in which the match ended. If the matched
572-
string ends with a newline sequence, the output ends at the end of that line.
573-
If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
574-
match has been handled, scanning restarts at the beginning of the line after
575-
the one in which the match ended.
567+
library is called in "multiline" mode, and a match is allowed to continue past
568+
the end of the initial line and onto one or more subsequent lines.
569+
<br>
570+
<br>
571+
Patterns used with <b>-M</b> may usefully contain literal newline characters and
572+
internal occurrences of ^ and $ characters, because in multiline mode these can
573+
match at internal newlines. Because <b>pcre2grep</b> is scanning multiple lines,
574+
the \Z and \z assertions match only at the end of the last line in the file.
575+
The \A assertion matches at the start of the first line of a match. This can
576+
be any line in the file; it is not anchored to the first line.
577+
<br>
578+
<br>
579+
The output for a successful match may consist of more than one line. The first
580+
line is the line in which the match started, and the last line is the line in
581+
which the match ended. If the matched string ends with a newline sequence, the
582+
output ends at the end of that line. If <b>-v</b> is set, none of the lines in a
583+
multi-line match are output. Once a match has been handled, scanning restarts
584+
at the beginning of the line after the one in which the match ended.
576585
<br>
577586
<br>
578587
The newline sequence that separates multiple lines must be matched as part of
@@ -1107,7 +1116,7 @@ <h1>pcre2grep man page</h1>
11071116
</P>
11081117
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
11091118
<P>
1110-
Last updated: 20 November 2023
1119+
Last updated: 22 December 2023
11111120
<br>
11121121
Copyright &copy; 1997-2023 University of Cambridge.
11131122
<br>

doc/html/pcre2pattern.html

Lines changed: 17 additions & 17 deletions
@@ -328,10 +328,10 @@ <h1>pcre2pattern man page</h1>
328328
Brace characters { and } are also used to enclose data for constructions such
329329
as \g{2} or \k{name}. In almost all uses of braces, space and/or horizontal
330330
tab characters that follow { or precede } are allowed and are ignored. In the
331-
case of quantifiers, they may also appear before or after the comma. The
331+
case of quantifiers, they may also appear before or after the comma. The
332332
exception to this is \u{...} which is an ECMAScript compatibility feature
333-
that is recognized only when the PCRE2_EXTRA_ALT_BSUX option is set. ECMAScript
334-
does not ignore such white space; it causes the item to be interpreted as
333+
that is recognized only when the PCRE2_EXTRA_ALT_BSUX option is set. ECMAScript
334+
does not ignore such white space; it causes the item to be interpreted as
335335
literal.
336336
</P>
337337
<P>
@@ -472,7 +472,7 @@ <h1>pcre2pattern man page</h1>
472472
(carriage return) character.
473473
</P>
474474
<P>
475-
An error occurs if \c is not followed by a character whose ASCII code point
475+
An error occurs if \c is not followed by a character whose ASCII code point
476476
is in the range 32 to 126. The precise effect of \cx is as follows: if x is a
477477
lower case letter, it is converted to upper case. Then bit 6 of the character
478478
(hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is
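The \cx rule quoted in this hunk (reject characters outside code points 32 to 126, fold lower case letters to upper case, then invert bit 6) is small enough to restate as code. The following is a sketch of that description only, not PCRE2's implementation, and the function name `control_escape` is an assumption made for this example.

```python
def control_escape(x):
    """Return the code point that \\cx stands for, per the rule quoted above."""
    code = ord(x)
    if not 32 <= code <= 126:
        raise ValueError("\\c must be followed by a printable ASCII character")
    if "a" <= x <= "z":
        code = ord(x.upper())  # lower case letters are folded to upper case
    return code ^ 0x40  # invert bit 6 (hex 40)


print(hex(control_escape("A")))  # 0x1
print(hex(control_escape("z")))  # 0x1a
print(hex(control_escape("?")))  # 0x7f
```

Because the last step is an exclusive OR, the mapping is its own inverse on characters that are not lower case letters: `\c[` gives hex 1B, and `\c;` gives hex 7B.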
@@ -694,8 +694,8 @@ <h1>pcre2pattern man page</h1>
694694
\s any character that matches \p{Z} or \h or \v
695695
\w any character that matches \p{L}, \p{N}, \p{Mn}, or \p{Pc}
696696
</pre>
697-
The addition of \p{Mn} (non-spacing mark) and the replacement of an explicit
698-
test for underscore with a test for \p{Pc} (connector punctuation) happened in
697+
The addition of \p{Mn} (non-spacing mark) and the replacement of an explicit
698+
test for underscore with a test for \p{Pc} (connector punctuation) happened in
699699
PCRE2 release 10.43. This brings PCRE2 into line with Perl.
700700
</P>
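The \w definition in the table above (general category L or N, or the particular categories Mn or Pc) can be checked character by character with Python's standard `unicodedata` module. This is a sketch of the stated rule rather than of PCRE2's own tables: the function name `is_ucp_word_char` is an assumption made for this example, and the Unicode version bundled with a given Python build may differ from the Unicode 15.0.0 data used by PCRE2 10.43.

```python
import unicodedata


def is_ucp_word_char(ch):
    """True if ch falls in a category the text above lists for \\w in UCP mode."""
    category = unicodedata.category(ch)
    # Any letter (L*) or number (N*), plus non-spacing marks and
    # connector punctuation.
    return category[0] in ("L", "N") or category in ("Mn", "Pc")


for ch in ("a", "_", "\u0301", "\u203f", "-", " "):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_ucp_word_char(ch))
```

Underscore still qualifies because it is in Pc, but so do other connectors such as U+203F (undertie), which the earlier explicit underscore test did not cover.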
701701
<P>
@@ -1074,7 +1074,7 @@ <h1>pcre2pattern man page</h1>
10741074
carriage return, and any other character that has the Z (separator) property.
10751075
Xsp is the same as Xps; in PCRE1 it used to exclude vertical tab, for Perl
10761076
compatibility, but Perl changed. Xwd matches the same characters as Xan, plus
1077-
those that match Mn (non-spacing mark) or Pc (connector punctuation, which
1077+
those that match Mn (non-spacing mark) or Pc (connector punctuation, which
10781078
includes underscore).
10791079
</P>
10801080
<P>
@@ -1586,7 +1586,7 @@ <h1>pcre2pattern man page</h1>
15861586
</P>
15871587
<P>
15881588
The other POSIX classes are unchanged by PCRE2_UCP, and match only characters
1589-
with code points less than 256.
1589+
with code points less than 256.
15901590
</P>
15911591
<P>
15921592
There are two options that can be used to restrict the POSIX classes to ASCII
@@ -1613,8 +1613,8 @@ <h1>pcre2pattern man page</h1>
16131613
<a href="#smallassertions">"Simple assertions"</a>
16141614
above), and in a Perl-style pattern the preceding or following character
16151615
normally shows which is wanted, without the need for the assertions that are
1616-
used above in order to give exactly the POSIX behaviour. Note also that the
1617-
PCRE2_UCP option changes the meaning of \w (and therefore \b) by default, so
1616+
used above in order to give exactly the POSIX behaviour. Note also that the
1617+
PCRE2_UCP option changes the meaning of \w (and therefore \b) by default, so
16181618
it also affects these POSIX sequences.
16191619
</P>
16201620
<br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br>
@@ -1682,8 +1682,8 @@ <h1>pcre2pattern man page</h1>
16821682
above, it sets (or unsets) all the ASCII options.
16831683
</P>
16841684
<P>
1685-
PCRE2_EXTRA_ASCII_DIGIT has no additional effect when PCRE2_EXTRA_ASCII_POSIX
1686-
is set, but including it in (?aP) means that (?-aP) suppresses all ASCII
1685+
PCRE2_EXTRA_ASCII_DIGIT has no additional effect when PCRE2_EXTRA_ASCII_POSIX
1686+
is set, but including it in (?aP) means that (?-aP) suppresses all ASCII
16871687
restrictions for POSIX classes.
16881688
</P>
16891689
<P>
@@ -1993,7 +1993,7 @@ <h1>pcre2pattern man page</h1>
19931993
X{,4} is interpreted as X{0,4}
19941994
</pre>
19951995
This is a change in behaviour that happened in Perl 5.34.0 and PCRE2 10.43. In
1996-
earlier versions such a sequence was not interpreted as a quantifier. Other
1996+
earlier versions such a sequence was not interpreted as a quantifier. Other
19971997
regular expression engines may behave either way.
19981998
</P>
19991999
<P>
@@ -2287,7 +2287,7 @@ <h1>pcre2pattern man page</h1>
22872287
The sequence \g{-1} is a reference to the capture group whose number is one
22882288
less than the number of the next group to be started, so in this example (where
22892289
the next group would be numbered 3) it is equivalent to \2, and \g{-2} would
2290-
be equivalent to \1. Note that if this construct is inside a capture group,
2290+
be equivalent to \1. Note that if this construct is inside a capture group,
22912291
that group is included in the count, so in this example \g{-2} also refers to
22922292
group 1:
22932293
<pre>
@@ -2323,8 +2323,8 @@ <h1>pcre2pattern man page</h1>
23232323
</P>
23242324
<P>
23252325
There are several different ways of writing backreferences to named capture
2326-
groups. The .NET syntax is \k{name}, the Python syntax is (?P=name), and the
2327-
original Perl syntax is \k&#60;name&#62; or \k'name'. All of these are now supported
2326+
groups. The .NET syntax is \k{name}, the Python syntax is (?P=name), and the
2327+
original Perl syntax is \k&#60;name&#62; or \k'name'. All of these are now supported
23282328
by both Perl and PCRE2. Perl 5.10's unified backreference syntax, in which \g
23292329
can be used for both numeric and named references, is also supported by PCRE2.
23302330
We could rewrite the above example in any of the following ways:
@@ -2778,7 +2778,7 @@ <h1>pcre2pattern man page</h1>
27782778
condition is true if a capture group of that number has previously matched. If
27792779
there is more than one capture group with the same number (see the earlier
27802780
<a href="#recursion">section about duplicate group numbers),</a>
2781-
the condition is true if any of them have matched. An alternative notation,
2781+
the condition is true if any of them have matched. An alternative notation,
27822782
which is a PCRE2 extension, not supported by Perl, is to precede the digits
27832783
with a plus or minus sign. In this case, the group number is relative rather
27842784
than absolute. The most recently opened capture group (which could be enclosing

doc/html/pcre2syntax.html

Lines changed: 2 additions & 2 deletions
@@ -408,8 +408,8 @@ <h1>pcre2syntax man page</h1>
408408
(?-...) unset the given option(s)
409409
(?^) unset imnrsx options
410410
</pre>
411-
(?aP) implies (?aT) as well, though this has no additional effect. However, it
412-
means that (?-aP) is really (?-PT) which disables all ASCII restrictions for
411+
(?aP) implies (?aT) as well, though this has no additional effect. However, it
412+
means that (?-aP) is really (?-PT) which disables all ASCII restrictions for
413413
POSIX classes.
414414
</P>
415415
<P>
