[SUGGESTION] Simpler interpolated raw string literals #648
-
I think I fully support this change; I would like to hear others' opinions on it.
-
Thanks @AbhinavK00. I should explain that non-interpolated string literals also go somewhat against the goal of the general capture syntax, as mentioned in @hsutter's issue comment:
Because non-interpolated string literals will break the rule of
False capture
-
I like how
This made me wonder how you'd interpolate in injected code.
-
Yes.
If I understand your example correctly, it will be evaluated only once, so:
-
E.g., \"Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum." After actually writing it down, I realize it's silly.
-
If we put another Alternatively syntax
Consider how string literals in the old syntax look different because of parentheses and prefixes, while they look uniform in the new syntax (this would make the syntax easier to teach, because no one would ask why the parentheses are not part of the string or what the prefixes mean):
It's a simple rule to teach newbie programmers, in which the sensitivity of string literals decreases for each additional
x: = ''[{"Here (variable)$ is not interpolated!"}]'';
So
I have to mention syntax
In my opinion,
-
Raw string literals would end at
By comparing their ending quotes, we now realize
-
Alternatively, the order of characters at the opening quote and at the closing quote must be the same if they are identifiers (sequences of letters and numbers) or numbers:
x: = 'something^([{another100<"THIS IS THE STRING VALUE!">another100}])^something';
// x == "THIS IS THE STRING VALUE!"
That way, identifiers and numbers stay readable within the opening and closing quotes. Compare how much more readable this would be than the current syntax:
// $R"abc{xyz[100(...)abc{xyz[100";
x: = 'abc{xyz[100"..."100]xyz}abc';
// $R"{[abc100(...){[abc100";
y: = '{[abc100"..."abc100]}';
// $R"abc*#[(...)abc*#[";
z: = 'abc*#["..."]#*abc';
But for example,
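To make the "keep identifiers and numbers in order" rule concrete, here is a minimal C++ sketch. It assumes the rule splits the opening delimiter into tokens (a run of letters and digits is one token, every other character is its own token), reverses the token order, and mirrors opening brackets. `closing_delimiter_keep_words`, `mirror_bracket` and `check_keep_words_examples` are hypothetical names for illustration, not part of cppfront.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Map an opening bracket to its closing bracket; any other character is unchanged.
char mirror_bracket(char c) {
    switch (c) {
        case '(': return ')';
        case '[': return ']';
        case '{': return '}';
        case '<': return '>';
        default:  return c;
    }
}

// Letters and digits form identifier/number tokens.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical rule: identifier/number runs keep their spelling, every other
// character is a single token; tokens are emitted in reverse order and opening
// brackets are mirrored.
std::string closing_delimiter_keep_words(const std::string& open) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < open.size()) {
        if (is_word_char(open[i])) {
            std::size_t j = i;
            while (j < open.size() && is_word_char(open[j])) ++j;
            tokens.push_back(open.substr(i, j - i));  // keeps its original order
            i = j;
        } else {
            tokens.push_back(std::string(1, mirror_bracket(open[i])));
            ++i;
        }
    }
    std::string close;
    for (auto it = tokens.rbegin(); it != tokens.rend(); ++it) close += *it;
    return close;
}

// The delimiters used in the examples above.
void check_keep_words_examples() {
    assert(closing_delimiter_keep_words("something^([{another100<") == ">another100}])^something");
    assert(closing_delimiter_keep_words("abc{xyz[100") == "100]xyz}abc");
    assert(closing_delimiter_keep_words("{[abc100") == "abc100]}");
    assert(closing_delimiter_keep_words("abc*#[") == "]#*abc");
}
```

Treating a whole identifier as one token is what keeps `abc` readable as `abc` (rather than `cba`) at the closing quote.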
-
Preface
I think I should explain in detail which options are possible for string literals.
If you restrict `"` to be an escape sequence in character literals (e.g. programmers have to write `'\"'` instead of `'"'`), and in addition disallow empty (multiple) single quotes `'...'`, `''...''`, `'''...'''`, etc., then the following quotes are all available syntaxes for string literals, without any conflict or ambiguity for either the C++2 compiler or programmers:
The `?`, `!`, `@` and `#` in the last four lines can be any character except `'`, `"`, `\` and new-line (because if you allow `'`, it will conflict with `''...''`; if you allow `"`, it will conflict with `'"..."'`; and if you allow `\`, it's ambiguous with `\"` in character literals). If the character is an opening bracket at the opening quote, it should be the corresponding closing bracket at the closing quote, and the order of the characters has to be reversed at the closing quote, e.g. `'x["text"]x'`.
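A minimal C++ sketch of the closing-quote rule just described, assuming the extra characters are exactly those between the opening `'` and `"`. It rejects the forbidden characters, reverses the rest, and mirrors opening brackets. `closing_quote` and `check_closing_quote_examples` are hypothetical helpers, not an existing compiler API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Characters the proposal forbids in the extra delimiter between ' and ".
bool allowed_delimiter_char(char c) {
    return c != '\'' && c != '"' && c != '\\' && c != '\n';
}

// Map an opening bracket to its closing bracket; other characters are unchanged.
char mirror_bracket(char c) {
    switch (c) {
        case '(': return ')';
        case '[': return ']';
        case '{': return '}';
        case '<': return '>';
        default:  return c;
    }
}

// Given the characters between the opening ' and " (e.g. ?!@), build the full
// closing quote (e.g. "@!?'): characters are reversed and opening brackets are
// mirrored. Returns std::nullopt if a forbidden character is used.
std::optional<std::string> closing_quote(const std::string& delim) {
    std::string close = "\"";
    for (auto it = delim.rbegin(); it != delim.rend(); ++it) {
        if (!allowed_delimiter_char(*it)) return std::nullopt;
        close += mirror_bracket(*it);
    }
    close += '\'';
    return close;
}

void check_closing_quote_examples() {
    assert(closing_quote("") == std::string("\"'"));           // '"..."'
    assert(closing_quote("?!@") == std::string("\"@!?'"));     // '?!@"..."@!?'
    assert(closing_quote("[(<{") == std::string("\"}>)]'"));   // '[(<{"..."}>)]'
    assert(!closing_quote("a\nb").has_value());                // new-line is forbidden
}
```

With an empty delimiter the result is `"'`, i.e. the plain `'"..."'` form; every longer delimiter only adds characters in front of that.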
If you disallow placing an empty string literal side by side with another string literal, and in addition disallow empty (at least triple) double quotes `"""..."""`, `""""...""""`, `"""""..."""""`, etc., then the following quotes are also available syntaxes for string literals, without any conflict or ambiguity for either the C++2 compiler or programmers:
What is the current status of the above quotes?
First, let's consider numbers (2) to (4) (e.g. `''...''`) and (11) to (14) (e.g. `"""..."""`):

- `''''` is not an empty `''...''`; it's just an opening quote with four `'`s.
- `'''''` doesn't contain `'` inside `''...''`; it's just an opening quote with five `'`s.
- `'''text'''` doesn't contain `'text'` inside `''...''`; it contains `text` inside `'''...'''`.

Therefore, to solve the above limitations, we may allow optional white-space around the content of string literals. Alternatively, we can explore other quotes.
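These limitations follow from reading the opening quote greedily. The C++ sketch below assumes a maximal-munch lexer for the `''...''` family: the opening quote is the longest run of `'`, and the literal ends at the next run of exactly the same length. `lex_quote_run` and `QuoteRunLiteral` are hypothetical names, and other lexing strategies are possible.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

struct QuoteRunLiteral {
    std::size_t quotes;   // number of ' in the opening (and closing) quote
    std::string content;  // raw text between the opening and closing quotes
};

// Hypothetical lexer for ''...'', '''...''', ... (runs of two or more ').
// Assumption: maximal munch for the opening quote; the literal ends at the next
// run of exactly the same number of '.
std::optional<QuoteRunLiteral> lex_quote_run(const std::string& src) {
    std::size_t n = 0;
    while (n < src.size() && src[n] == '\'') ++n;
    if (n < 2) return std::nullopt;  // only the forms with two or more ' are handled
    std::size_t i = n;
    while (i < src.size()) {
        if (src[i] != '\'') { ++i; continue; }
        std::size_t j = i;
        while (j < src.size() && src[j] == '\'') ++j;
        if (j - i == n) return QuoteRunLiteral{n, src.substr(n, i - n)};
        i = j;  // a shorter or longer run of ' is part of the content
    }
    return std::nullopt;  // no closing quote: the literal is unterminated
}
```

Under this assumption `''''` and `'''''` are unterminated opening quotes rather than an empty literal or a literal containing `'`, and `'''text'''` yields `text` with a three-quote delimiter, matching the three points above.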
Now numbers (6) (e.g. `'"..."'`) and (7) to (10) (e.g. `'?"..."?'`) are left for us. The good news about these quotes is that their opening syntax (e.g. `'"...`) is different from their closing syntax (e.g. `..."'`), so:

- `'""'` is an empty `'"..."'`.
- `'"""'` contains `"` inside `'"..."'`, and `'"'"'` contains `'` inside `'"..."'`.
- `'""text""'` contains `"text"` inside `'"..."'`, and `'"'text'"'` contains `'text'` inside `'"..."'`.
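The same examples can be checked with a small C++ sketch of form (6), assuming the literal opens with `'"` and ends at the first `"'` after it. `lex_quote_dquote` is a hypothetical name. The last check shows the flip side: content that itself contains `"'` ends the literal early, which is where the forms with extra delimiter characters come in.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical lexer for form (6): opens with '" and ends at the first "' after it.
// Returns the raw content, or std::nullopt if src does not start with '" or has no
// closing "'.
std::optional<std::string> lex_quote_dquote(const std::string& src) {
    if (src.size() < 2 || src[0] != '\'' || src[1] != '"') return std::nullopt;
    std::size_t close = src.find("\"'", 2);
    if (close == std::string::npos) return std::nullopt;
    return src.substr(2, close - 2);
}

// The examples from the list above, written as C++ raw strings.
void check_form6_examples() {
    assert(lex_quote_dquote(R"('""')") == std::string(""));
    assert(lex_quote_dquote(R"('"""')") == std::string("\""));
    assert(lex_quote_dquote(R"('"'"')") == std::string("'"));
    assert(lex_quote_dquote(R"('""text""')") == std::string("\"text\""));
    assert(lex_quote_dquote(R"('"'text'"')") == std::string("'text'"));
    // Content containing "' stops the literal early: only "a" is read here.
    assert(lex_quote_dquote(R"('"a"'b"')") == std::string("a"));
}
```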
NOTE 1
Numbers (7) to (10) (e.g. `'?"..."?'`) can have additional characters in the opening and closing quotes; other than that, they are similar to number (6) (e.g. `'"..."'`). These additional characters are similar to `R"?(...)?"` in C++1, except:

- `'abc"text"cba'` contains `text` inside `'abc"..."cba'`. Alternatively, identifiers and numbers within the closing quote may keep their order, as described in this comment (recommended, as it improves readability).
- `'[(<{"text"}>)]'` contains `text` inside `'[(<{"..."}>)]'`.
Suggestion Detail
This is not a new issue from me. I had a similar issue before, but it got lost among many replies in this issue, so I felt I should summarize my suggestion here.
I have to mention that my suggestion ...
Currently, the `$R` prefix is used to quote interpolated raw string literals in C++2, e.g. `$R"?(text)?"`, and the `R` prefix is used to quote non-interpolated raw string literals, e.g. `R"?(text)?"`. I suggest removing the prefixes completely, e.g. `'?"text"?'`.
`$R"?(...)?"` is a powerful way to write interpolated raw string literals, but it's possible to go further and make its syntax simpler and shorter. The whole purpose of my suggestion is to transform `$R"?(...)?"` into `'?"..."?'` without any additional changes (see NOTE 1):
Why do I suggest this change?
`$R"(...)"` is a little verbose for the common case where we just want to disable escape sequences and be able to simply write single quotes `'` and double quotes `"` inside string literals. Using `'"..."'` to start an interpolated raw string literal is more readable, more convenient, and requires less typing than `$R"(...)"`.
I have to mention that programmers are familiar with writing strings in quotes such as `'"..."'`, but `$R"(...)"` is a step further than that: they must learn why the parentheses are not part of the content, what a prefix is, and how it can be combined with Unicode prefixes.
Is there any experience, data or working implementation available?
My suggestion is a small change. It is almost `$R"?(...)?"` without the `$R` prefix (see NOTE 1).
Is there any additional suggestion?
I additionally suggest unifying interpolated and non-interpolated string literals: instead of introducing a different string literal for each of them, I suggest having a way to disable captures within string literals. The pattern `(...)$` captures a variable in string literals. It is complex enough that we rarely need to disable it, so we don't need to devote a separate string literal to that.
To disable the capture pattern `(expr)$`, I introduce a new False Capture pattern, `(expr)...\$`, that doesn't capture anything. We can add a backslash before the dollar sign, so the value of `"(...)\$"` is equal to `(...)$`. We can also add more backslashes before the dollar sign, so the value of `"(...)\\$"` is equal to `(...)\$`. Each additional backslash we add yields one more backslash in the value. Programmers are already familiar with escape sequences, and this is similar to the escape sequence `\$`; but I should mention that the escape sequence `\\` (and other escape sequences too) has no meaning inside the false capture pattern `"(...)\\$"`, therefore each additional backslash is added to the value exactly as written.
In a nutshell, C++2 will have the following patterns in string literals:
- `(expr)$` is equal to the value of `expr`.
- `(expr)...\$` is equal to the value `(expr)...$`. Only backslashes are allowed in place of `...` after `)` and before `$`. If you put any character other than a backslash in place of `...`, the whole pattern is violated, and it is neither a capture nor a false capture.

Finally, there will be two string literals in C++2:

- `"..."` for non-raw string literals. It supports escape sequences.
- `'"..."'` for raw string literals, as well as `'?"..."?'`, `'?!"..."!?'`, `'?!@"..."@!?'`, etc., where `?`, `!`, `@`, ... can be any character except `'`, `"`, `\` and new-line. It doesn't support escape sequences; on the other hand, its content can span multiple lines (see NOTE 1).

And we can capture or not capture within the same string literal:
In the above example, a programmer has to determine whether a string literal is interpolated or non-interpolated (as you see in the first line), and only then can they think about whether `(user)$` is a capture or not. Using false captures instead (as you see in the second line) makes it obvious that `(user)\$` is not a capture.
This is a regular expression example:
As you see in the above example, without a backslash before the dollar sign (e.g. `("hello")$`), a programmer may think it's a capture in C++2, but in fact it's a capture in the regular expression. Therefore, using false captures (e.g. `("hello")\$`) helps programmers easily distinguish captures in C++2 from captures in regular expressions, and makes code more readable when dealing with regular expressions.
I mean that completely disabling captures via non-interpolated string literals may lead to less readable code, because a programmer has to determine whether a string literal is interpolated or non-interpolated before they can think about how to read its content.
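The capture and false-capture rules can be sketched in C++ as a function that splits string-literal content into literal text and captured expressions. This is an illustrative sketch, not cppfront's implementation: it assumes a capture's expression is everything back to the matching `(` (found by counting parentheses) and ignores strings or character literals inside the expression. `split_captures`, `Piece` and `check_capture_examples` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One piece of string-literal content: literal text or a captured expression.
struct Piece {
    bool is_capture;
    std::string text;  // literal text, or the expression text of a capture
};

// Proposed rules, as sketched here:
//   (expr)$    -> capture of expr
//   (expr)\$   -> literal (expr)$    (n backslashes become n-1)
//   (expr)\\$  -> literal (expr)\$
// Any other character between ')' and '$' means no capture and no false capture.
std::vector<Piece> split_captures(const std::string& s) {
    std::vector<Piece> out;
    std::string literal;  // literal text collected since the last capture
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != ')') { literal += s[i]; continue; }
        std::size_t j = i + 1;
        while (j < s.size() && s[j] == '\\') ++j;  // skip backslashes after ')'
        if (j == s.size() || s[j] != '$') { literal += ')'; continue; }  // plain ')'
        std::size_t slashes = j - i - 1;
        if (slashes > 0) {  // false capture: drop exactly one backslash
            literal += ')';
            literal.append(slashes - 1, '\\');
            literal += '$';
            i = j;
            continue;
        }
        // Real capture: search backwards for the matching '('.
        int depth = 1;
        std::size_t k = literal.size();
        while (k > 0 && depth > 0) {
            --k;
            if (literal[k] == ')') ++depth;
            else if (literal[k] == '(') --depth;
        }
        if (depth != 0) { literal += ")$"; i = j; continue; }  // unmatched: keep as text
        std::string expr = literal.substr(k + 1);
        literal.erase(k);
        if (!literal.empty()) out.push_back({false, literal});
        literal.clear();
        out.push_back({true, expr});
        i = j;
    }
    if (!literal.empty()) out.push_back({false, literal});
    return out;
}

void check_capture_examples() {
    auto a = split_captures("Hello, (user)$!");
    assert(a.size() == 3 && a[1].is_capture && a[1].text == "user");
    auto b = split_captures("Hello, (user)\\$!");  // content: Hello, (user)\$!
    assert(b.size() == 1 && !b[0].is_capture && b[0].text == "Hello, (user)$!");
    auto c = split_captures("(user)\\\\$");        // content: (user)\\$
    assert(c.size() == 1 && c[0].text == "(user)\\$");
    auto d = split_captures("^(\"hello\")\\$");     // regex end anchor survives
    assert(d.size() == 1 && d[0].text == "^(\"hello\")$");
}
```

In the last check, the false capture leaves `^("hello")$` for the regular-expression engine, so the `$` is clearly the regex end anchor rather than a C++2 capture.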
In this way, C++2 has only two string literals, and we can control captures at any point within a single string literal.