Commit cf74570

Minor corrections to README.md
1 parent 6d11862 commit cf74570

1 file changed

+3
-3
lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -237,7 +237,7 @@ The most notable changes to keep in mind with regard to migration are the follow
 * The `TokenType` and `CommentType` enums have been renamed named to `TokenKind` and `CommentKind`, respectively. Also, some of the member names have been changed.
 * The `Token` and `Comment` structs have been completely reworked. The `SyntaxToken` and `SyntaxComment` classes have been removed.
 * The `SyntaxElement` class has been removed, that is, the `Node` class has become the root of the AST node type hierarchy. (This also means that tokens and comments are not attached to the root nodes of the AST. You can obtain those via the `ParserOptions.OnToken` and `ParserOptions.OnComment` callbacks).
-* The `Nodes` enum has been renamed named to `NodeType`.
+* The `Nodes` enum has been renamed to `NodeType`.
 * The `Node.AssociatedData` property has been renamed to `UserData`.
 * The `AssignmentOperator`, `BinaryOperator` and `UnaryOperator` enums have been merged into a single enum named `Operator`. Also, some of the member names have been changed.
 * The `Literal` node class has been changed to only provide an `object? Value { get; }` property for accessing literal value. There are sealed subclasses for the different kinds of literals. Use those to access literal values in a type-safe (and more efficient) manner.
@@ -306,8 +306,8 @@ However, because of the fundamental differences between the JS and .NET regex en
 
 * Case-insensitive matching [won't always yield the same results](https://github.com/adams85/acornima/blob/488e55472113af21e31cbc24a79c18b01d23dcc7/src/Acornima/Tokenizer.RegExpParser.cs#L99). Implementing a workaround for this issue would be extremely hard, if not impossible.
 * The JS regex engine assigns numbers to capturing groups sequentially (regardless of the group being named or not named) but [.NET uses a different, weird approach](https://learn.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#grouping-constructs-and-regular-expression-objects): "Captures that use parentheses are numbered automatically from left to right based on the order of the opening parentheses in the regular expression, starting from 1. However, named capture groups are always ordered last, after non-named capture groups." Without some adjustments, this would totally mess up numbered backreferences and replace pattern references. So, as a workaround, the converter wraps all named capturing groups in a non-named capturing group to force .NET to include all the original capturing groups in the resulting match in the expected order. (Of course, this won't prevent named groups from being listed after the numbered ones.) If needed, the original number of groups can be obtained from the returned `RegExpParseResult` object's `ActualRegexGroupCount` property.
-* The characters allowed in group names differs in the two regex engines. For example a the group name `$group` is valid in JS but invalid in .NET. So, as a workaround, the converter [encodes the problematic group names](https://github.com/adams85/acornima/blob/488e55472113af21e31cbc24a79c18b01d23dcc7/src/Acornima/Tokenizer.RegExpParser.cs#L1041) to names that are valid in .NET and probably won't collide with other group names present in the pattern. For example, `$group` is encoded like `__utf8_2467726F7570`. The original group names can be obtained using the returned `RegExpParseResult` object's `GetRegexGroupName` method.
-* Self-referencing capturing groups like `/((a+)(\1) ?)+/` may not produce the exact same captures. [`RegexOptions.ECMAScript` is supposed to cover this](https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#ecmascript-matching-behavior), however even the MSDN example doesn't produce the same matches. (As a side note, `RegexOptions.ECMAScript` is kinda a false promise, it can't even get some basic cases right by itself.)
+* The characters allowed in group names differs in the two regex engines. For example, the group name `$group` is valid in JS but invalid in .NET. So, as a workaround, the converter [encodes the problematic group names](https://github.com/adams85/acornima/blob/488e55472113af21e31cbc24a79c18b01d23dcc7/src/Acornima/Tokenizer.RegExpParser.cs#L1041) to names that are valid in .NET and probably won't collide with other group names present in the pattern. For example, `$group` is encoded like `__utf8_2467726F7570`. The original group names can be obtained using the returned `RegExpParseResult` object's `GetRegexGroupName` method.
+* Self-referencing capturing groups like `/((a+)(\1) ?)+/` may not produce the exact same captures. [`RegexOptions.ECMAScript` is supposed to cover this](https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#ecmascript-matching-behavior), however even the MSDN example doesn't produce the same matches. (As a side note, `RegexOptions.ECMAScript` is kind of a false promise, it can't even get some basic cases right by itself.)
 * Similarily, repeated nested groups like `/((a+)?(b+)?(c))*/` may produce different captures for the groups. ([JS has an overwrite behavior](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Capturing_group#description) while .NET doesn't).
 * .NET treats forward references like `\1(\w)` differently than JS and it's not possible to convert this kind of patterns reliably. (The converter could make some patterns work by rewriting them to something like `(?:)(\w)` but there are cases where even this wouldn't work.)
 * Unicode mode issues:
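
For readers of this diff, the JS-side behavior that the second hunk's bullets describe can be sketched in TypeScript. This is only an illustration of ECMAScript semantics, not code from the README or from Acornima; the function names `groupNumbering` and `repeatedNestedGroups` are hypothetical, and the .NET side of the comparison is not shown.

```typescript
// Illustration of ECMAScript regex semantics only (hypothetical helper names).

// In JS, named and unnamed capturing groups share a single left-to-right
// numbering, so the named group below is group 1 and the unnamed one group 2.
function groupNumbering(): string[] {
  const m = /(?<first>a)(b)/.exec("ab");
  if (m === null) throw new Error("pattern should match");
  return [m[1], m[2]];
}

// For a repeated group, JS keeps only the captures of the last iteration;
// inner groups that did not participate in that iteration become undefined.
// The pattern and input are the ones used on the MDN page linked in the diff.
function repeatedNestedGroups(): (string | undefined)[] {
  const m = /((a+)?(b+)?(c))*/.exec("aacbbbcac");
  if (m === null) throw new Error("pattern should match");
  return Array.from(m);
}
```

The last iteration of the repeated group consumes `ac`, so the `(b+)?` group ends up unset even though it matched `bbb` in an earlier iteration. That is the overwrite behavior the README text contrasts with .NET.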
