docs, lregex: treatment of newlines

While working on PR #3109, I found that the description of how newlines are treated might be wrong.
But I might be mistaken. Let me know what I am missing.

From [Regular expression (regex) engine](https://docs.ctags.io/en/doc-revise/optlib.html#regular-expression-regex-engine):

> A more subtle issue is this text from the Regular Expressions chapter: “the use of literal <newline>s or any escape sequence equivalent produces undefined results”. What that means is using a regex pattern with [^\n]+ is invalid, and indeed in glibc produces very odd results.

The text of [the specification](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html), including the sentences before and after the quoted one, is as follows.

> In the functions processing regular expressions described in System Interfaces volume of POSIX.1-2017, the <newline> is regarded as an ordinary character and both a <period> and a non-matching list can match one. The Shell and Utilities volume of POSIX.1-2017 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline> characters; if not stated otherwise, the use of literal <newline> characters or any escape sequence equivalent in either patterns or matched text produces undefined results.

It does not say "_What that means is using a regex pattern with [^\n]+ is invalid_". I can find a description of special treatment of <newline> in the spec.
Does this describe an issue specific to the glibc implementation?
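
To make the quoted specification text concrete, here is a minimal sketch of how `regcomp()`/`regexec()` treat a <newline> in the matched text, assuming a POSIX `<regex.h>` such as the one in glibc. `regex_matches` is a hypothetical helper written for this illustration, not ctags code, and the patterns deliberately contain no `\n` themselves.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `text`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * `extra_cflags` is OR-ed into the regcomp() flags (0 or REG_NEWLINE). */
int regex_matches(const char *pattern, const char *text, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/*
 * The newline below appears only in the matched text, never in the pattern.
 *
 *   regex_matches("a.b",    "a\nb", 0)            -> 1  (period matches <newline>)
 *   regex_matches("a[^x]b", "a\nb", 0)            -> 1  (non-matching list matches it)
 *   regex_matches("a.b",    "a\nb", REG_NEWLINE)  -> 0
 *   regex_matches("a[^x]b", "a\nb", REG_NEWLINE)  -> 0
 */
```

So, as far as the library functions are concerned, whether a period or a non-matching list such as `[^x]` can match a <newline> depends on `REG_NEWLINE`, which seems to be what the first sentence of the specification describes.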

The next sentence of the specification is:

> Those utilities (like grep) that do not allow <newline> characters to match are responsible for eliminating any <newline> from strings before matching against the RE.

In Universal Ctags, this is similar to `--regex-<LANG>`, which processes input line by line. If I understand correctly, `--regex-<LANG>` does
not have to care about the `REG_NEWLINE` setting, because the <newline> should already be eliminated before matching.
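
A rough sketch of what "eliminating <newline> before matching" could look like, assuming only that the input is handed to the regex one line at a time as described above. This is not the actual ctags implementation; `count_matching_lines` is a hypothetical name used for illustration.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `text` at '\n', strips the newline, and
 * counts the lines that match `pattern`. Returns -1 on error. */
int count_matching_lines(const char *pattern, const char *text)
{
    regex_t re;
    const char *p = text;
    int count = 0;

    /* REG_NEWLINE is not set: the regex never sees a newline anyway. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    while (*p != '\0') {
        size_t len = strcspn(p, "\n");   /* length of the current line */
        char *line = malloc(len + 1);

        if (line == NULL) {
            regfree(&re);
            return -1;
        }
        memcpy(line, p, len);
        line[len] = '\0';                /* newline is dropped here */

        if (regexec(&re, line, 0, NULL, 0) == 0)
            count++;
        free(line);

        p += len;
        if (*p == '\n')
            p++;                         /* skip the line terminator */
    }
    regfree(&re);
    return count;
}
```

With this arrangement a pattern such as `a.b` cannot match across the two lines of `"a\nb"`, regardless of `REG_NEWLINE`, because each line is matched separately.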

> Never use \n in patterns for --regex-<LANG>,

This is OK. But I don't understand the following sentence:

> and never use them in non-matching bracket expressions for --mline-regex-<LANG> patterns.

First, I don't understand what `non-matching bracket expressions` means. Of course, brackets (`[` and `]`) should be paired, but I guess the sentence above means something different.

I think it is more portable to use `^` or `$` than `\n`, because line-break characters vary across platforms.

> For the experimental --_mtable-regex-<LANG> you can safely use \n because that regex is not compiled with REG_NEWLINE.

We could also say that we have to use \n, because that regex is not compiled with REG_NEWLINE and so `^` and `$` do not match at line boundaries.
If I understand correctly, it would be better to set `REG_NEWLINE` for `--_mtable-regex-<LANG>`, too.
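
The effect of `REG_NEWLINE` on anchors in a multi-line buffer can be sketched as follows, again assuming a POSIX `<regex.h>`. `find_match_offset` is a hypothetical helper for illustration and says nothing about how ctags actually compiles `--_mtable-regex-<LANG>` patterns beyond what the quoted documentation states.

```c
#include <regex.h>

/* Hypothetical helper: returns the byte offset of the first match of
 * `pattern` in `buffer`, or -1 if there is no match or the pattern
 * fails to compile. `extra_cflags` is 0 or REG_NEWLINE. */
long find_match_offset(const char *pattern, const char *buffer, int extra_cflags)
{
    regex_t re;
    regmatch_t m[1];
    long offset = -1;

    if (regcomp(&re, pattern, REG_EXTENDED | extra_cflags) != 0)
        return -1;
    if (regexec(&re, buffer, 1, m, 0) == 0)
        offset = (long)m[0].rm_so;   /* start of the whole match */
    regfree(&re);
    return offset;
}

/*
 *   find_match_offset("^bar$", "foo\nbar\nbaz", REG_NEWLINE) -> 4
 *   find_match_offset("^bar$", "foo\nbar\nbaz", 0)           -> -1
 */
```

With `REG_NEWLINE`, `^` and `$` match at the line boundaries inside the buffer, so the pattern does not need to mention `\n` at all. Without it, the anchors match only at the start and end of the whole buffer.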


docs, lregex: treatment of newlines #3110
