Description
Raw string handling is done in both the lexer and the parser. In the lexer, it lives in src\Compilers\CSharp\Portable\Parser\Lexer_RawStringLiteral.cs and deals with normal, non-interpolated raw string literals. In the parser, it lives in src\Compilers\CSharp\Portable\Parser\LanguageParser_InterpolatedString.cs and deals with the interpolated case.
In particular, each must determine how to handle the automatic indentation removal present in multi-line raw string literals. In the lexer, that is done in ScanMultiLineRawStringLiteral, ScanMultiLineRawStringLiteralLine and AddMultiLineRawStringLiteralLineContents. In the parser, it is done within ParseInterpolatedStringToken.
We should find a way to unify the logic between the two cases. This is challenging, though. In the lexer case, we have to operate over the text abstraction it has (retrieving characters from an underlying TextWindow). In the parser, we use a string and a ReadOnlySpan over that string.
My recommendation is to use the Method<TData, TDataHelper>(TData data) where TDataHelper : struct, IDataHelper pattern here. This allows one to write an algorithm that operates on a generic piece of data, deferring to the helper struct for any operations that are specific to the data itself (like retrieving characters) rather than to the algorithm.
The two callers then pass in their data, along with their own struct containing the implementation of those helper methods. Because the helper is a struct, there is no allocation overhead, and the runtime/JIT will emit specialized code for each instantiation, which effectively devirtualizes the helper calls and gives near-optimal codegen.
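
As a rough illustration of that pattern, here is a minimal sketch. It is not Roslyn code: ICharSource, StringSource, CharArraySource and IndentationScanner are hypothetical names invented for this example, and a char[] merely stands in for a second text abstraction such as the lexer's TextWindow. The algorithm (counting leading whitespace) is written once, and each caller supplies a struct that knows how to read characters from its own data.

```csharp
using System;

// Hypothetical helper contract: only the data-specific operations live here.
public interface ICharSource<TData>
{
    int Length(TData data);
    char CharAt(TData data, int index);
}

// Helper for a plain string (similar in spirit to the parser's situation).
public readonly struct StringSource : ICharSource<string>
{
    public int Length(string data) => data.Length;
    public char CharAt(string data, int index) => data[index];
}

// Helper for a char[]; an assumption standing in for a different
// text abstraction, such as the one the lexer reads from.
public readonly struct CharArraySource : ICharSource<char[]>
{
    public int Length(char[] data) => data.Length;
    public char CharAt(char[] data, int index) => data[index];
}

public static class IndentationScanner
{
    // The shared algorithm: counts leading spaces and tabs.
    // It never touches TData directly; every access goes through THelper.
    public static int CountLeadingWhitespace<TData, THelper>(TData data)
        where THelper : struct, ICharSource<TData>
    {
        // default(THelper) is a stateless struct value, so nothing is allocated.
        var helper = default(THelper);
        int length = helper.Length(data);
        int i = 0;
        while (i < length)
        {
            char c = helper.CharAt(data, i);
            if (c != ' ' && c != '\t')
                break;
            i++;
        }
        return i;
    }
}
```

A caller would pick the helper through the type arguments, for example IndentationScanner.CountLeadingWhitespace<string, StringSource>(text) on one side and IndentationScanner.CountLeadingWhitespace<char[], CharArraySource>(buffer) on the other. Because THelper is constrained to struct, each instantiation is compiled separately, so the interface calls can be resolved statically rather than through virtual dispatch.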