Description
A few "valid" cases from toml-test currently fail:
FAIL valid/comment/nonascii
System.InvalidOperationException: The document has errors: (1,17) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
input sent to parser-cmd:
# ~ � ÿ � 𐀀 �
output from parser-cmd (stderr):
System.InvalidOperationException: The document has errors: (1,17) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
want:
FAIL valid/key/quoted-unicode
System.InvalidOperationException: The document has errors: (4,81) : error : Invalid Unicode scalar value [10FFFF]
(6,16) : error : The character `�` is an invalid UTF8 character
(7,18) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
input sent to parser-cmd:
"\u0000" = "null"
'\u0000' = "different key"
"\u0008 \u000c \U00000041 \u007f \u0080 \u00ff \ud7ff \ue000 \uffff \U00010000 \U0010ffff" = "escaped key"
"~ � ÿ � 𐀀 �" = "basic key"
'l ~ � ÿ � 𐀀 �' = "literal key"
output from parser-cmd (stderr):
System.InvalidOperationException: The document has errors: (4,81) : error : Invalid Unicode scalar value [10FFFF]
(6,16) : error : The character `�` is an invalid UTF8 character
(7,18) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
want:
FAIL valid/string/quoted-unicode
System.InvalidOperationException: The document has errors: (2,105) : error : Invalid Unicode scalar value [10FFFF]
(5,31) : error : The character `�` is an invalid UTF8 character
(6,33) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
input sent to parser-cmd:
escaped_string = "\u0000 \u0008 \u000c \U00000041 \u007f \u0080 \u00ff \ud7ff \ue000 \uffff \U00010000 \U0010ffff"
not_escaped_string = '\u0000 \u0008 \u000c \U00000041 \u007f \u0080 \u00ff \ud7ff \ue000 \uffff \U00010000 \U0010ffff'
basic_string = "~ � ÿ � 𐀀 �"
literal_string = '~ � ÿ � 𐀀 �'
output from parser-cmd (stderr):
System.InvalidOperationException: The document has errors: (2,105) : error : Invalid Unicode scalar value [10FFFF]
(5,31) : error : The character `�` is an invalid UTF8 character
(6,33) : error : The character `�` is an invalid UTF8 character
at Tomlyn.Model.TomlTable.From(DocumentSyntax documentSyntax)
at Tomlyn.Toml.ToModel(DocumentSyntax syntax)
at TomlynDecoder.Main(String[] args) in /home/martin/code/Toml/toml-test-matrix/src/cs-tomlyn/cs-tomlyn-decoder/cs-tomlyn-decoder.cs:line 17
Exit 1
want:
I'm not sure why they don't fail with the toml-test integration; I found these with a little binary I wrote so I can use Tomlyn with the toml-test tool: https://github.com/toml-lang/toml-test-matrix/blob/main/scripts/cs-tomlyn-decoder.cs (built with: https://github.com/toml-lang/toml-test-matrix/blob/main/parsers/cs-tomlyn.zsh#L8). Can reproduce with:
% toml-test ./cs-tomlyn-decoder/bin/Release/net8.0/cs-tomlyn-decoder
As for the errors:
- "invalid UTF-8 character" is not really accurate, as it's valid UTF-8; it's just that U+FFFF is not assigned in Unicode (it is a noncharacter, but still a valid scalar value). Looking at the code, can probably just remove the `CharHelper.IsValidUnicodeScalarValue(c)` call in `Tomlyn.Parsing.CheckCharacter()`?
- The "Invalid Unicode scalar value" error is similar: U+10FFFF is also not assigned in Unicode. There is no requirement that `\u...` and `\U...` escapes only encode currently assigned codepoints. Can probably just remove the `IsValidUnicodeScalarValue()` call here too?
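
To make the distinction concrete, here is a minimal Python sketch of what "Unicode scalar value" means in the Unicode Standard, next to the separate notion of a noncharacter. The function names are hypothetical and for illustration only; this is not Tomlyn's code and makes no claim about what `CharHelper.IsValidUnicodeScalarValue` actually checks.

```python
def is_unicode_scalar_value(cp: int) -> bool:
    """Any code point in U+0000..U+10FFFF except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane (U+xxFFFE, U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Code points taken from the escaped strings in the failing tests.
for cp in (0x0000, 0x0080, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF):
    # Every one of these is a scalar value, so it encodes to well-formed UTF-8
    # and decodes back to the same character.
    assert is_unicode_scalar_value(cp)
    assert chr(cp).encode("utf-8").decode("utf-8") == chr(cp)

# U+FFFF and U+10FFFF are noncharacters, yet still scalar values.
assert is_noncharacter(0xFFFF) and is_noncharacter(0x10FFFF)

# Only surrogates and out-of-range values fail the scalar-value test.
assert not is_unicode_scalar_value(0xD800)
assert not is_unicode_scalar_value(0x110000)
```

Under this definition, a check that only rejects surrogates and values above U+10FFFF would accept all three test files.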