Commit fb3a1ee

Update for regex documentation and improved matching detection of regex names.
1 parent 64637d8 commit fb3a1ee

5 files changed: +54 -55 lines changed

docs/cpp2/metafunctions.md

+1-2
@@ -374,7 +374,7 @@ A `cpp1_rule_of_zero` type is one that has no user-written copy/move/destructor
 
 #### `regex`
 
-Replaces fields in the class with regular expression objects. Each field starting with `regex` is replaced with a regular expression of the same type.
+Replaces fields in the class with regular expression objects. All fields named `regex` or starting with `regex_` are replaced with a regular expression of the same type.
 
 ``` cpp title="Regular expression example"
 name_matcher: @regex type
@@ -401,7 +401,6 @@ main: (args) = {
 
     std::cout << "Case insensitive match: " << m.regex_no_case.search("blubabABblah").group(0) << std::endl;
 }
-
 ```
 
 The regex syntax used by cppfront is the [perl syntax](https://perldoc.perl.org/perlre). Most of the syntax is available. Currently we do not support unicode characters and the syntax tokens associated with them. In [supported features](../other/regex_status.md) all the available regex syntax is listed.

docs/other/regex_status.md

+32-32
@@ -10,7 +10,7 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 - [x] m Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.
 - [x] s Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
 - [x] x and xx Extend your pattern's legibility by permitting whitespace and comments. Details in "/x and /xx"
-- [x] n Prevent the grouping metacharacters () from capturing. This modifier, new in 5.22, will stop $1, $2, etc... from being filled in.
+- [x] n Prevent the grouping metacharacters () from capturing. This modifier will stop $1, $2, etc... from being filled in.
 - [ ] c keep the current position during repeated matching
 ```
 
@@ -54,37 +54,37 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 
 ### Character Classes and other Special Escapes __(Complete)__
 ```
-- [x] [...] [1] Match a character according to the rules of the
+- [x] [...] Match a character according to the rules of the
       bracketed character class defined by the "...".
       Example: [a-z] matches "a" or "b" or "c" ... or "z"
-- [x] [[:...:]] [2] Match a character according to the rules of the POSIX
+- [x] [[:...:]] Match a character according to the rules of the POSIX
       character class "..." within the outer bracketed
       character class. Example: [[:upper:]] matches any
       uppercase character.
-- [x] \g1 [5] Backreference to a specific or previous group,
-- [x] \g{-1} [5] The number may be negative indicating a relative
-      previous group and may optionally be wrapped in
-      curly brackets for safer parsing.
-- [x] \g{name} [5] Named backreference
-- [x] \k<name> [5] Named backreference
-- [x] \k'name' [5] Named backreference
-- [x] \k{name} [5] Named backreference
-- [x] \w [3] Match a "word" character (alphanumeric plus "_", plus
+- [x] \g1 Backreference to a specific or previous group,
+- [x] \g{-1} The number may be negative indicating a relative
+      previous group and may optionally be wrapped in
+      curly brackets for safer parsing.
+- [x] \g{name} Named backreference
+- [x] \k<name> Named backreference
+- [x] \k'name' Named backreference
+- [x] \k{name} Named backreference
+- [x] \w Match a "word" character (alphanumeric plus "_", plus
       other connector punctuation chars plus Unicode
       marks)
-- [x] \W [3] Match a non-"word" character
-- [x] \s [3] Match a whitespace character
-- [x] \S [3] Match a non-whitespace character
-- [x] \d [3] Match a decimal digit character
-- [x] \D [3] Match a non-digit character
-- [x] \v [3] Vertical whitespace
-- [x] \V [3] Not vertical whitespace
-- [x] \h [3] Horizontal whitespace
-- [x] \H [3] Not horizontal whitespace
-- [x] \1 [5] Backreference to a specific capture group or buffer.
+- [x] \W Match a non-"word" character
+- [x] \s Match a whitespace character
+- [x] \S Match a non-whitespace character
+- [x] \d Match a decimal digit character
+- [x] \D Match a non-digit character
+- [x] \v Vertical whitespace
+- [x] \V Not vertical whitespace
+- [x] \h Horizontal whitespace
+- [x] \H Not horizontal whitespace
+- [x] \1 Backreference to a specific capture group or buffer.
       '1' may actually be any positive integer.
-- [x] \N [7] Any character but \n. Not affected by /s modifier
-- [x] \K [6] Keep the stuff left of the \K, don't include it in $&
+- [x] \N Any character but \n. Not affected by /s modifier
+- [x] \K Keep the stuff left of the \K, don't include it in $&
 ```
 
 ### Assertions
@@ -95,7 +95,7 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 - [x] \Z Match only at end of string, or before newline at the end
 - [x] \z Match only at end of string
 - [ ] \G Match only at pos() (e.g. at the end-of-match position
-      of prior m//g)
+      of prior m//g)
 ```
 
 ### Capture groups __(Complete)__
@@ -157,7 +157,7 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 ### Modifiers
 ```
 - [ ] p Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} are available for use after matching.
-- [ ] a, d, l, and u These modifiers, all new in 5.14, affect which character-set rules (Unicode, etc.) are used, as described below in "Character set modifiers".
+- [ ] a, d, l, and u These modifiers affect which character-set rules (Unicode, etc.) are used, as described below in "Character set modifiers".
 - [ ] g globally match the pattern repeatedly in the string
 - [ ] e evaluate the right-hand side as an expression
 - [ ] ee evaluate the right side as a string then eval the result
@@ -180,11 +180,11 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 
 ### Character Classes and other Special Escapes
 ```
-- [ ] (?[...]) [8] Extended bracketed character class
-- [ ] \pP [3] Match P, named property. Use \p{Prop} for longer names
-- [ ] \PP [3] Match non-P
-- [ ] \X [4] Match Unicode "eXtended grapheme cluster"
-- [ ] \R [4] Linebreak
+- [ ] (?[...]) Extended bracketed character class
+- [ ] \pP Match P, named property. Use \p{Prop} for longer names
+- [ ] \PP Match non-P
+- [ ] \X Match Unicode "eXtended grapheme cluster"
+- [ ] \R Linebreak
 ```
 
 ### Assertions
@@ -208,4 +208,4 @@ The listings are taken from [perl regex docs](https://perldoc.perl.org/perlre).
 - [ ] (*sr:pattern) All chars in pattern need to be of the same script.
 - [ ] (*atomic_script_run:pattern) Without backtracking.
 - [ ] (*asr:pattern) Without backtracking.
-```
+```

regression-tests/test-results/msvc-2022-c++latest/pure2-regex_10_escapes.cpp.execution

+13-13
@@ -9,26 +9,26 @@ Running tests_10_escapes:
 08_y: OK regex: foo(\h)bar parsed_regex: foo(\h)bar str: foo bar result_expr: $1 expected_results
 09_y: OK regex: (\H)(\h) parsed_regex: (\H)(\h) str: foo bar result_expr: $1-$2 expected_results o-
 10_y: OK regex: (\h)(\H) parsed_regex: (\h)(\H) str: foo bar result_expr: $1-$2 expected_results -b
-11_y: OK regex: foo(\v+)bar parsed_regex: foo(\v+)bar str: foo
-
+11_y: OK regex: foo(\v+)bar parsed_regex: foo(\v+)bar str: foo
 
-bar result_expr: $1 expected_results
-
 
+bar result_expr: $1 expected_results
 
-12_y: OK regex: (\V+)(\v) parsed_regex: (\V+)(\v) str: foo
-
 
-bar result_expr: $1-$2 expected_results foo-
-13_y: OK regex: (\v+)(\V) parsed_regex: (\v+)(\V) str: foo
-
 
-bar result_expr: $1-$2 expected_results
-
+12_y: OK regex: (\V+)(\v) parsed_regex: (\V+)(\v) str: foo
+
+
+bar result_expr: $1-$2 expected_results foo-
+13_y: OK regex: (\v+)(\V) parsed_regex: (\v+)(\V) str: foo
+
+
+bar result_expr: $1-$2 expected_results
+
 
 -b
-14_y: OK regex: foo(\v)bar parsed_regex: foo(\v)bar str: foobar result_expr: $1 expected_results
-15_y: OK regex: (\V)(\v) parsed_regex: (\V)(\v) str: foobar result_expr: $1-$2 expected_results o-
+14_y: OK regex: foo(\v)bar parsed_regex: foo(\v)bar str: foobar result_expr: $1 expected_results
+15_y: OK regex: (\V)(\v) parsed_regex: (\V)(\v) str: foobar result_expr: $1-$2 expected_results o-
 16_y: OK regex: (\v)(\V) parsed_regex: (\v)(\V) str: foobar result_expr: $1-$2 expected_results -b
 17_y: OK regex: foo\t\n\r\f\a\ebar parsed_regex: foo\t\n\r\f\a\ebar str: foo
 bar result_expr: $& expected_results foo

source/reflect.h

+4-4
@@ -2097,22 +2097,22 @@ auto print(cpp2::impl::in<meta::type_declaration> t) -> void
 auto regex_gen(meta::type_declaration& t) -> void
 {
     auto has_default {false};
-    auto prefix {"regex"};
-    std::string postfix {"_mod"}; // TODO: remove mod syntax when 'm.initializer()' can be '("pat", "mod")'
+    auto exact_name {"regex"};
+    auto prefix {"regex_"};
     std::map<std::string,std::string> expressions {};
 
     for ( auto& m : CPP2_UFCS(get_member_objects)(t) )
     {
         std::string name {CPP2_UFCS(name)(m)};
 
-        if (CPP2_UFCS(starts_with)(name, prefix))
+        if (CPP2_UFCS(starts_with)(name, prefix) || name == exact_name)
         {
             if (!(CPP2_UFCS(has_initializer)(m))) {
                 CPP2_UFCS(error)(t, "Regular expression must have an initializer.");
             }
             CPP2_UFCS(mark_for_removal_from_enclosing_type)(m);
 
-            if (name == prefix) {
+            if (name == exact_name) {
                 if (has_default) {
                     CPP2_UFCS(error)(t, "Type can only contain one default named regular expression.");
                 }
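
The name check changed in this hunk can be read in isolation. The following is a minimal sketch, not code from cppfront: the helper names `is_regex_member` and `was_regex_member` are hypothetical and only restate the new condition (`name.starts_with(prefix) || name == exact_name`) and the old one (a bare `regex` prefix) so the two rules can be compared side by side.

```cpp
#include <cassert>      // assert, for optional runtime checks by callers
#include <string_view>

// Hypothetical helper, not part of cppfront: restates the new condition
// `name.starts_with(prefix) || name == exact_name` from regex_gen.
constexpr bool is_regex_member(std::string_view name)
{
    constexpr std::string_view exact_name = "regex";
    constexpr std::string_view prefix     = "regex_";

    // C++17-compatible equivalent of name.starts_with(prefix).
    const bool has_prefix = name.substr(0, prefix.size()) == prefix;

    return name == exact_name || has_prefix;
}

// Hypothetical helper: the rule before this commit, a bare "regex" prefix.
constexpr bool was_regex_member(std::string_view name)
{
    constexpr std::string_view prefix = "regex";
    return name.substr(0, prefix.size()) == prefix;
}

// Compile-time checks of the behavior described in the documentation change.
static_assert( is_regex_member("regex"));          // the default expression
static_assert( is_regex_member("regex_no_case"));  // a named expression
static_assert(!is_regex_member("regexp_cache"));   // no longer picked up
static_assert( was_regex_member("regexp_cache"));  // matched by the old rule
```

Under these assumptions, a member such as the made-up `regexp_cache` would have been treated as a regular expression by the old prefix test but is left alone by the new one, which appears to be the "improved matching detection" the commit message refers to.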

source/reflect.h2

+4-4
@@ -1460,22 +1460,22 @@ print: (t: meta::type_declaration) =
 regex_gen: (inout t: meta::type_declaration) =
 {
     has_default := false;
-    prefix := "regex";
-    postfix : std::string = "_mod"; // TODO: remove mod syntax when 'm.initializer()' can be '("pat", "mod")'
+    exact_name := "regex";
+    prefix := "regex_";
     expressions : std::map<std::string, std::string> = ();
 
     for t.get_member_objects() do (inout m)
     {
         name: std::string = m.name();
 
-        if name.starts_with(prefix)
+        if name.starts_with(prefix) || name == exact_name
         {
             if !m.has_initializer() {
                 t.error("Regular expression must have an initializer.");
             }
             m.mark_for_removal_from_enclosing_type();
 
-            if name == prefix {
+            if name == exact_name {
                 if has_default {
                     t.error("Type can only contain one default named regular expression.");
                 }
