-
-
Notifications
You must be signed in to change notification settings - Fork 579
feat(yaml_parser): lex plain token #5870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(yaml_parser): lex plain token #5870
Conversation
CodSpeed Performance ReportMerging #5870 will not alter performanceComparing Summary
|
d324c06
to
9627d10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love to see work on yaml! While I have mostly compliments, I do have some things that need to be addressed.
// https://yaml.org/spec/1.2.2/#rule-ns-char | ||
fn is_non_space_char(c: u8) -> bool { | ||
!is_space(c) && !is_break(c) | ||
} | ||
|
||
// https://yaml.org/spec/1.2.2/#rule-s-white | ||
fn is_space(c: u8) -> bool { | ||
c == b' ' || c == b'\t' | ||
} | ||
|
||
// https://yaml.org/spec/1.2.2/#rule-b-char | ||
fn is_break(c: u8) -> bool { | ||
c == b'\n' || c == b'\r' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the attention to detail here with the spec links. It's nice that these things are well defined.
// Inside block key context | ||
BlockKey, | ||
// Outside flow context | ||
FlowOut, | ||
// Inside flow context | ||
FlowIn, | ||
// Inside flow key context | ||
FlowKey, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be doc comments, and these should have some code samples to visually explain what these mean.
@@ -12,17 +12,16 @@ use std::time::Duration; | |||
// Assert the result of lexing a piece of source code, | |||
// and make sure the tokens yielded are fully lossless and the source can be reconstructed from only the tokens | |||
macro_rules! assert_lex { | |||
($src:expr, $($kind:ident:$len:expr $(,)?)*) => {{ | |||
($context:expr, $src:expr, $($kind:ident:$len:expr $(,)?)*) => {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you've taken inspiration from the HTML lexer tests. :)
@@ -76,7 +75,7 @@ fn losslessness(string: String) -> bool { | |||
let mut lexer = YamlLexer::from_str(&cloned); | |||
let mut tokens = vec![]; | |||
|
|||
while lexer.next_token(()) != EOF { | |||
while lexer.next_token(YamlLexContext::default()) != EOF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change correct? I'm getting mixed signals from other occurrences of this test for other languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, could you elaborate on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, the change doesn't seem correct. We don't do this in other lexer tests, which means there's something incorrect in the implementation of the lexer
source | ||
} | ||
|
||
fn next_non_trivia_token(&mut self, first_token: bool) { | ||
fn next_non_trivia_token(&mut self, first_token: bool, context: YamlLexContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the equivalent for this in the html parser has these arguments reversed: context: HtmlLexContext, first_token: bool
and it would be better if we stay consistent. (so if the html parser is actually the one at fault we would change it to match the other languages, of course)
fn bump(&mut self) { | ||
if self.current() != EOF { | ||
self.next_non_trivia_token(false) | ||
self.next_non_trivia_token(false, YamlLexContext::Regular) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really related to this PR, but this should be a call to bump_with_context
fn bump(&mut self) {
self.bump_with_context(HtmlLexContext::Regular)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I have another PR in store which implements BumpWithContext for the Token Source. Let me address this then.
|
||
fn is_indicator(c: u8) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn is_indicator(c: u8) -> bool { | |
// https://yaml.org/spec/1.2.2/#53-indicator-characters | |
fn is_indicator(c: u8) -> bool { |
Would it be useful to add the YAML spec link here as well?
Summary
Allow the YAML lexer to lex plain token.
Test Plan
Added new tests for the lexer.