Commit 5a6b32c

fixes for 10.47 (#819)
* update required autoconf version

  Since abc2ae6 (Update AX_PTHREAD (#694), 2025-02-12), using autoconf 2.60
  with autogen shows:

    error: possibly undefined macro: AS_ECHO

  Use 2.62 instead, which includes that macro, as the new required version.

* jit: enable x86 SIMD helper if SSE2 is available

  The current implementation needs SSE2 support (which is part of the base
  amd64 specification and available in most CPUs from this century).
  Without this change, SIMD is only enabled if SSE4.1 is detected, which is
  not the case for older AMD CPUs, even 64-bit capable ones.

While at it, update related documentation and make other minor tweaks.
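The SSE2 change follows a common gating pattern: decide at compile time where the instruction set guarantees a feature, and probe at run time elsewhere. In the patch this is the new `#if defined(SLJIT_CONFIG_X86_64)` branch (a constant `1`) versus `sljit_has_cpu_feature(SLJIT_HAS_FPU)` in the `#else` branch. The plain-C sketch below has the same shape; `sse2_scanner_available()` is a hypothetical name, and the GCC/Clang builtin `__builtin_cpu_supports("sse2")` stands in for SLJIT's own check.

```c
/* Hypothetical helper, not part of PCRE2 or SLJIT: decides whether an
   SSE2-only scanning routine may be used on this machine. */
static int sse2_scanner_available(void)
{
#if defined(__x86_64__) || defined(_M_X64)
  /* SSE2 is part of the baseline x86-64 ISA: no run-time check needed. */
  return 1;
#elif defined(__i386__) && defined(__GNUC__)
  /* 32-bit x86 CPUs may predate SSE2, so query the CPU at run time
     (GCC/Clang builtin). */
  return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
  /* Other targets are outside the scope of this sketch. */
  return 0;
#endif
}
```

Returning a constant on x86-64 lets the compiler fold the test away entirely, which is presumably why the patch uses `1` there rather than a run-time query.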
Parent: e89a922

9 files changed: 52 additions, 31 deletions

NON-AUTOTOOLS-BUILD

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ example.
 src/config.h) . Compile src/pcre2test.c; don't forget -DHAVE_CONFIG_H if
 necessary, but do NOT define PCRE2_CODE_UNIT_WIDTH. Then link with the
 appropriate library/ies. If you compiled an 8-bit library, pcre2test also
-needs the pcre2posix wrapper library.
+needs the pcre2posix wrapper library when linking.

 (9) Run pcre2test on the testinput files in the testdata directory, and check
 that the output matches the corresponding testoutput files. There are

configure.ac

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ m4_define(libpcre2_posix_version, [3:6:0])
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

-AC_PREREQ([2.60])
+AC_PREREQ([2.62])
 AC_INIT([PCRE2],pcre2_major.pcre2_minor[]pcre2_prerelease,[],[pcre2])
 AC_CONFIG_SRCDIR([src/pcre2.h.in])
 AM_INIT_AUTOMAKE([dist-bzip2 dist-zip foreign])

doc/pcre2api.3

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "19 July 2025" "PCRE2 10.47-DEV"
+.TH PCRE2API 3 "05 October 2025" "PCRE2 10.47-DEV"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1296,7 +1296,7 @@ documentation for more details.
 .sp
 PCRE2_CONFIG_JITTARGET
 .sp
-The \fIwhere\fP argument should point to a buffer that is at least 48 code
+The \fIwhere\fP argument should point to a buffer that is at least 64 code
 units long. (The exact length required can be found by calling
 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with a
 string that contains the name of the architecture for which the JIT compiler is
@@ -4598,6 +4598,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 19 July 2025
+Last updated: 05 October 2025
 Copyright (c) 1997-2024 University of Cambridge.
 .fi

doc/pcre2build.3

Lines changed: 5 additions & 5 deletions
@@ -134,8 +134,8 @@ UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \eP, \ep,
 and \eX. Only the general category properties such as \fILu\fP and \fINd\fP,
-script names, and some bi-directional properties are supported. Details are
-given in the
+script names, and some bi-directional and binary properties are supported.
+Details are given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -152,8 +152,8 @@ request this by starting with (*UCP).
 .sp
 The \eC escape sequence, which matches a single code unit, even in a UTF mode,
 can cause unpredictable behaviour because it may leave the current matching
-point in the middle of a multi-code-unit character. The application can lock it
-out by setting the PCRE2_NEVER_BACKSLASH_C option when calling
+point in the middle of a multi-code-unit character. The application can lock
+it out by setting the PCRE2_NEVER_BACKSLASH_C option when calling
 \fBpcre2_compile()\fP. There is also a build-time option
 .sp
 --enable-never-backslash-C
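To see why `\C` is hazardous in UTF-8 mode: U+00E9 (é) is encoded as the two code units 0xC3 0xA9, so a `\C` that consumes one code unit leaves the match position at offset 1, on a continuation byte. The sketch below checks this with the UTF-8 rule that continuation bytes have the bit pattern `10xxxxxx`; both helper names are hypothetical and not part of PCRE2.

```c
#include <stddef.h>

/* In UTF-8, bytes of the form 10xxxxxx are continuation bytes: they never
   start a character. */
static int is_utf8_continuation(unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: returns 1 if offset off in s[0..len) is a character
   boundary (the start of a character, or the end of the string). */
static int at_char_boundary(const unsigned char *s, size_t len, size_t off)
{
  if (off >= len) return off == len;
  return !is_utf8_continuation(s[off]);
}
```

For the two-byte string `"\xC3\xA9"`, offsets 0 and 2 are boundaries, while offset 1 (where a single `\C` would stop) is not.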
@@ -517,7 +517,7 @@ use), some extra configuration may be necessary. The INSTALL file for
 If your environment has not been set up so that an appropriate library is
 automatically included, you may need to add something like
 .sp
-LIBS="-ncurses"
+LIBS="-lncurses"
 .sp
 immediately before the \fBconfigure\fP command.
 .

src/pcre2_jit_compile.c

Lines changed: 1 addition & 1 deletion
@@ -6888,7 +6888,7 @@ if (JIT_HAS_FAST_FORWARD_CHAR_SIMD && (common->nltype == NLTYPE_FIXED || common-
 OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1));
 quit = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_CR);
 }
-else
+else
 {
 fast_forward_char_simd(common, common->newline, common->newline, 0);

src/pcre2_jit_simd_inc.h

Lines changed: 38 additions & 17 deletions
@@ -101,7 +101,7 @@ return CMP(SLJIT_NOT_EQUAL, reg, 0, SLJIT_IMM, 0xdc00);
 }
 #endif

-#endif /* SLJIT_CONFIG_X86 || SLJIT_CONFIG_S390X */
+#endif /* SLJIT_CONFIG_X86 || SLJIT_CONFIG_ARM_64 || SLJIT_CONFIG_S390X || SLJIT_CONFIG_LOONGARCH_64 */

 #if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86)

@@ -229,7 +229,14 @@ switch (step)
 }
 }

+/* The AVX2 code path is currently disabled.
 #define JIT_HAS_FAST_FORWARD_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SIMD))
+*/
+#if defined(SLJIT_CONFIG_X86_64) && SLJIT_CONFIG_X86_64
+#define JIT_HAS_FAST_FORWARD_CHAR_SIMD 1
+#else
+#define JIT_HAS_FAST_FORWARD_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_FPU))
+#endif

 static void fast_forward_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2, sljit_s32 offset)
 {
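A plain-C sketch with SSE2 intrinsics suggests why SSE2 can suffice for this kind of scan: finding the first of two characters needs only a byte-wise compare (`_mm_cmpeq_epi8`), an OR (`_mm_or_si128`) and a byte-mask extraction (`_mm_movemask_epi8`), all SSE2 operations. This is not PCRE2's code: the real `fast_forward_char_simd()` emits machine code through SLJIT and also deals with partial matching and UTF, while `find_first_of2()` is a hypothetical name that only scans bytes.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

/* Hypothetical illustration, not PCRE2 code: return the index of the first
   byte in s[0..len) equal to c1 or c2, or len if there is none. */
static size_t find_first_of2(const unsigned char *s, size_t len,
                             unsigned char c1, unsigned char c2)
{
  size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
  const __m128i v1 = _mm_set1_epi8((char)c1);
  const __m128i v2 = _mm_set1_epi8((char)c2);
  /* Process 16 bytes per iteration using only SSE2 instructions. */
  for (; i + 16 <= len; i += 16)
    {
    __m128i data = _mm_loadu_si128((const __m128i *)(s + i));
    __m128i eq = _mm_or_si128(_mm_cmpeq_epi8(data, v1),
                              _mm_cmpeq_epi8(data, v2));
    unsigned int mask = (unsigned int)_mm_movemask_epi8(eq);
    if (mask != 0)
      {
      /* The lowest set bit marks the first matching byte in the block. */
      size_t bit = 0;
      while ((mask & 1u) == 0) { mask >>= 1; bit++; }
      return i + bit;
      }
    }
#endif
  /* Scalar loop for the tail, or for the whole buffer without SSE2. */
  for (; i < len; i++)
    if (s[i] == c1 || s[i] == c2) return i;
  return len;
}
```

The unaligned load never reads past `s + len` because the vector loop stops once fewer than 16 bytes remain; the scalar loop handles the rest.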
@@ -247,10 +254,10 @@ struct sljit_jump *quit;
 struct sljit_jump *partial_quit[2];
 vector_compare_type compare_type = vector_compare_match1;
 sljit_s32 tmp1_reg_ind = sljit_get_register_index(SLJIT_GP_REGISTER, TMP1);
-sljit_s32 data_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR0);
-sljit_s32 cmp1_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR1);
-sljit_s32 cmp2_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR2);
-sljit_s32 tmp_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR3);
+sljit_s32 data_ind = sljit_get_register_index(reg_type, SLJIT_VR0);
+sljit_s32 cmp1_ind = sljit_get_register_index(reg_type, SLJIT_VR1);
+sljit_s32 cmp2_ind = sljit_get_register_index(reg_type, SLJIT_VR2);
+sljit_s32 tmp_ind = sljit_get_register_index(reg_type, SLJIT_VR3);
 sljit_u32 bit = 0;
 int i;

@@ -366,7 +373,14 @@ if (common->utf && offset > 0)
 #endif
 }

+/* The AVX2 code path is currently disabled.
 #define JIT_HAS_FAST_REQUESTED_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SIMD))
+*/
+#if defined(SLJIT_CONFIG_X86_64) && SLJIT_CONFIG_X86_64
+#define JIT_HAS_FAST_REQUESTED_CHAR_SIMD 1
+#else
+#define JIT_HAS_FAST_REQUESTED_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_FPU))
+#endif

 static jump_list *fast_requested_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2)
 {
@@ -381,10 +395,10 @@ struct sljit_jump *quit;
 jump_list *not_found = NULL;
 vector_compare_type compare_type = vector_compare_match1;
 sljit_s32 tmp1_reg_ind = sljit_get_register_index(SLJIT_GP_REGISTER, TMP1);
-sljit_s32 data_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR0);
-sljit_s32 cmp1_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR1);
-sljit_s32 cmp2_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR2);
-sljit_s32 tmp_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR3);
+sljit_s32 data_ind = sljit_get_register_index(reg_type, SLJIT_VR0);
+sljit_s32 cmp1_ind = sljit_get_register_index(reg_type, SLJIT_VR1);
+sljit_s32 cmp2_ind = sljit_get_register_index(reg_type, SLJIT_VR2);
+sljit_s32 tmp_ind = sljit_get_register_index(reg_type, SLJIT_VR3);
 sljit_u32 bit = 0;
 int i;

@@ -472,7 +486,14 @@ return not_found;

 #ifndef _WIN64

+/* The AVX2 code path is currently disabled.
 #define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SIMD))
+*/
+#if defined(SLJIT_CONFIG_X86_64) && SLJIT_CONFIG_X86_64
+#define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD 1
+#else
+#define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_FPU))
+#endif

 static void fast_forward_char_pair_simd(compiler_common *common, sljit_s32 offs1,
 PCRE2_UCHAR char1a, PCRE2_UCHAR char1b, sljit_s32 offs2, PCRE2_UCHAR char2a, PCRE2_UCHAR char2b)
@@ -489,14 +510,14 @@ sljit_u32 bit1 = 0;
 sljit_u32 bit2 = 0;
 sljit_u32 diff = IN_UCHARS(offs1 - offs2);
 sljit_s32 tmp1_reg_ind = sljit_get_register_index(SLJIT_GP_REGISTER, TMP1);
-sljit_s32 data1_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR0);
-sljit_s32 data2_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR1);
-sljit_s32 cmp1a_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR2);
-sljit_s32 cmp2a_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR3);
-sljit_s32 cmp1b_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR4);
-sljit_s32 cmp2b_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR5);
-sljit_s32 tmp1_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_VR6);
-sljit_s32 tmp2_ind = sljit_get_register_index(SLJIT_SIMD_REG_128, SLJIT_TMP_DEST_VREG);
+sljit_s32 data1_ind = sljit_get_register_index(reg_type, SLJIT_VR0);
+sljit_s32 data2_ind = sljit_get_register_index(reg_type, SLJIT_VR1);
+sljit_s32 cmp1a_ind = sljit_get_register_index(reg_type, SLJIT_VR2);
+sljit_s32 cmp2a_ind = sljit_get_register_index(reg_type, SLJIT_VR3);
+sljit_s32 cmp1b_ind = sljit_get_register_index(reg_type, SLJIT_VR4);
+sljit_s32 cmp2b_ind = sljit_get_register_index(reg_type, SLJIT_VR5);
+sljit_s32 tmp1_ind = sljit_get_register_index(reg_type, SLJIT_VR6);
+sljit_s32 tmp2_ind = sljit_get_register_index(reg_type, SLJIT_TMP_DEST_VREG);
 struct sljit_label *start;
 #if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32
 struct sljit_label *restart;
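`fast_forward_char_pair_simd()` takes two offsets and two candidate characters per offset. A scalar-plus-SSE2 sketch of that idea: for each candidate position, compare two unaligned loads taken `offs1` and `offs2` bytes ahead, then AND the two match masks so only positions where both conditions hold survive. `find_pair()` is hypothetical; treating `char1a`/`char1b` as alternative values for the same position, and the offsets as forward distances, are assumptions based only on the parameter names in the diff.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

/* Hypothetical illustration, not PCRE2 code. Returns the smallest p such
   that s[p + offs1] is c1a or c1b AND s[p + offs2] is c2a or c2b, or
   (size_t)-1 if there is none. Forward offsets are an assumption. */
static size_t find_pair(const unsigned char *s, size_t len,
                        size_t offs1, unsigned char c1a, unsigned char c1b,
                        size_t offs2, unsigned char c2a, unsigned char c2b)
{
  size_t maxoffs = offs1 > offs2 ? offs1 : offs2;
  size_t p = 0, last;

  if (len <= maxoffs) return (size_t)-1;
  last = len - maxoffs;   /* candidate positions are 0 .. last-1 */

#if defined(__SSE2__) || defined(_M_X64)
  {
  const __m128i v1a = _mm_set1_epi8((char)c1a), v1b = _mm_set1_epi8((char)c1b);
  const __m128i v2a = _mm_set1_epi8((char)c2a), v2b = _mm_set1_epi8((char)c2b);
  /* Loads stay inside s because p + 16 <= last <= len - offsN. */
  for (; p + 16 <= last; p += 16)
    {
    __m128i d1 = _mm_loadu_si128((const __m128i *)(s + p + offs1));
    __m128i d2 = _mm_loadu_si128((const __m128i *)(s + p + offs2));
    __m128i m1 = _mm_or_si128(_mm_cmpeq_epi8(d1, v1a), _mm_cmpeq_epi8(d1, v1b));
    __m128i m2 = _mm_or_si128(_mm_cmpeq_epi8(d2, v2a), _mm_cmpeq_epi8(d2, v2b));
    /* Keep only lanes where both positions matched. */
    unsigned int mask = (unsigned int)_mm_movemask_epi8(_mm_and_si128(m1, m2));
    if (mask != 0)
      {
      size_t bit = 0;
      while ((mask & 1u) == 0) { mask >>= 1; bit++; }
      return p + bit;
      }
    }
  }
#endif

  /* Scalar loop for the remaining candidate positions. */
  for (; p < last; p++)
    if ((s[p + offs1] == c1a || s[p + offs1] == c1b) &&
        (s[p + offs2] == c2a || s[p + offs2] == c2b))
      return p;
  return (size_t)-1;
}
```

For example, searching with `offs1 = 0` for `'t'`/`'T'` and `offs2 = 1` for `'h'`/`'H'` finds the first "th" in either case combination.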

testdata/testinput12

Lines changed: 1 addition & 1 deletion
@@ -368,7 +368,7 @@

 /\p{BC: Aሴ}/utf

-# A special extra option allows excaped surrogate code points in 32-bit mode,
+# A special extra option allows escaped surrogate code points in 32-bit mode,
 # but subjects containing them must not be UTF-checked. These patterns give
 # errors in 16-bit mode.

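The test comment says subjects containing escaped surrogates must not be UTF-checked. The reason is that surrogate code points (U+D800 to U+DFFF) are not Unicode scalar values, so any UTF-32 validity check rejects them. The sketch below is a generic check of that rule, not PCRE2's own validator; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Surrogate code points (U+D800..U+DFFF) are reserved for UTF-16 pairs and
   are not valid Unicode scalar values. */
static int is_surrogate(uint32_t c)
{
  return c >= 0xD800 && c <= 0xDFFF;
}

/* Hypothetical UTF-32 check: every unit must be at most 0x10FFFF and not a
   surrogate. Returns the index of the first invalid unit, or len if the
   whole string is valid. */
static size_t utf32_first_invalid(const uint32_t *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (s[i] > 0x10FFFF || is_surrogate(s[i])) return i;
  return len;
}
```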
testdata/testoutput12-16

Lines changed: 1 addition & 1 deletion
@@ -1428,7 +1428,7 @@ Failed: error 146 at offset 7: malformed \P or \p sequence
 Failed: error 146 at offset 9: malformed \P or \p sequence
 here: \p{BC: Aሴ |<--| }

-# A special extra option allows excaped surrogate code points in 32-bit mode,
+# A special extra option allows escaped surrogate code points in 32-bit mode,
 # but subjects containing them must not be UTF-checked. These patterns give
 # errors in 16-bit mode.

testdata/testoutput12-32

Lines changed: 1 addition & 1 deletion
@@ -1421,7 +1421,7 @@ Failed: error 146 at offset 7: malformed \P or \p sequence
 Failed: error 146 at offset 9: malformed \P or \p sequence
 here: \p{BC: Aሴ |<--| }

-# A special extra option allows excaped surrogate code points in 32-bit mode,
+# A special extra option allows escaped surrogate code points in 32-bit mode,
 # but subjects containing them must not be UTF-checked. These patterns give
 # errors in 16-bit mode.
