Description
While working on PR #3109, I found that the description of the treatment of newlines might be wrong.
But I might be wrong. Let me know what I am missing.
From Regular expression (regex) engine:
A more subtle issue is this text from the Regular Expressions chapter: “the use of literal <newline>s or any escape sequence equivalent produces undefined results”. What that means is using a regex pattern with [^\n]+ is invalid, and indeed in glibc produces very odd results.
The text of the specification surrounding the quoted sentence is as follows.
In the functions processing regular expressions described in System Interfaces volume of POSIX.1-2017, the <newline> is regarded as an ordinary character and both a <period> and a non-matching list can match one. The Shell and Utilities volume of POSIX.1-2017 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline> characters; if not stated otherwise, the use of literal <newline> characters or any escape sequence equivalent in either patterns or matched text produces undefined results.
It does not say "What that means is using a regex pattern with [^\n]+ is invalid". I can find a description of special treatment of <newline> in the spec.
Does this describe an issue specific to the glibc implementation?
And the next sentence follows:
Those utilities (like grep) that do not allow <newline> characters to match are responsible for eliminating any <newline> from strings before matching against the RE.
In the Universal Ctags case this is similar to --regex-<LANG>, which processes input line by line. --regex-<LANG> does not have to care about the setting of REG_NEWLINE, if I understand correctly. <newline> should be eliminated.
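To make the line-by-line point concrete, here is a minimal sketch. It uses Python's re module only as a stand-in: it is neither POSIX regcomp() nor the actual ctags implementation, and the helper names are hypothetical. It shows that once the newline is removed from each line before matching, a pattern containing \n has nothing to match against.

```python
import re

def match_line_by_line(pattern, text):
    """Apply the pattern to each line separately, after dropping the newline.

    This only imitates the line-oriented model discussed above (assumption:
    the line reader removes the trailing newline before matching).
    """
    regex = re.compile(pattern)
    hits = []
    for line in text.split("\n"):
        m = regex.search(line)
        if m:
            hits.append(m.group(0))
    return hits

def match_whole_buffer(pattern, text):
    """Apply the pattern to the whole input at once, newlines included."""
    return [m.group(0) for m in re.finditer(pattern, text)]

source = "int a;\nint b;\n"

print(match_line_by_line(r"int \w+;", source))  # matches on both lines
print(match_line_by_line(r"a;\nint", source))   # no line ever contains "\n"
print(match_whole_buffer(r"a;\nint", source))   # the newline is visible here
```

In this model a \n in the pattern is not so much dangerous as useless: the matcher never sees a newline, whatever flags the regex was compiled with.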
Never use \n in patterns for --regex-<LANG>,
This is OK. But I don't understand the following sentence:
and never use them in non-matching bracket expressions for --mline-regex-<LANG> patterns.
First, I don't understand what "non-matching bracket expressions" means. Of course brackets ([ and ]) should be paired. But I guess the sentence above means something different.
I think it is more portable to use ^ or $ than \n because there are variations of line-break characters.
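The portability concern can be sketched as follows. Again, Python's re module is only a stand-in for illustration, and the helper lines_matching is hypothetical. The sketch assumes a line reader that strips whichever terminator the file uses; whether ctags strips \r as well is not something I have verified.

```python
import re

def lines_matching(pattern, text):
    """Return the lines (terminators stripped) on which the pattern matches."""
    regex = re.compile(pattern)
    # splitlines() removes "\n", "\r\n", and "\r" terminators alike.
    return [line for line in text.splitlines() if regex.search(line)]

lf_text = "foo\nbar\n"        # Unix line endings
crlf_text = "foo\r\nbar\r\n"  # Windows line endings

# Anchors describe "end of line" without naming the terminator,
# so the same pattern works for both inputs.
print(lines_matching(r"^foo$", lf_text))
print(lines_matching(r"^foo$", crlf_text))

# A literal \n in the pattern depends on the exact terminator bytes.
print(bool(re.search(r"foo\n", lf_text)))
print(bool(re.search(r"foo\n", crlf_text)))
```

With the anchored pattern, the question of which line-break convention the input uses is left to the line reader instead of being baked into every pattern.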
For the experimental --_mtable-regex-<LANG> you can safely use \n because that regex is not compiled with REG_NEWLINE.
We can also say we have to use \n because that regex is not compiled with REG_NEWLINE.
If I understand correctly, it is better to set REG_NEWLINE for --_mtable-regex-<LANG>, too.