feat(yaml_parser): lex plain token #5870

vohoanglong0107 · 2025-05-03T14:46:42Z

Summary

Allow the YAML lexer to lex plain tokens.

Test Plan

Added new tests for the lexer.

codspeed-hq · 2025-05-03T15:28:17Z

CodSpeed Performance Report

Merging #5870 will not alter performance

Comparing vohoanglong0107:lex_plain_token_with_context (50d8ce3) with main (14551ba)

Summary

✅ 95 untouched benchmarks

dyc3

Love to see work on yaml! While I have mostly compliments, I do have some things that need to be addressed.

dyc3 · 2025-05-03T23:42:37Z

crates/biome_yaml_parser/src/lexer/mod.rs

+// https://yaml.org/spec/1.2.2/#rule-ns-char
+fn is_non_space_char(c: u8) -> bool {
+    !is_space(c) && !is_break(c)
+}
+
+// https://yaml.org/spec/1.2.2/#rule-s-white
+fn is_space(c: u8) -> bool {
+    c == b' ' || c == b'\t'
+}
+
+// https://yaml.org/spec/1.2.2/#rule-b-char
+fn is_break(c: u8) -> bool {
+    c == b'\n' || c == b'\r'
+}


I like the attention to detail here with the spec links. It's nice that these things are well defined.

dyc3 · 2025-05-03T23:51:12Z

crates/biome_yaml_parser/src/lexer/mod.rs

+    // Inside block key context
+    BlockKey,
+    // Outside flow context
+    FlowOut,
+    // Inside flow context
+    FlowIn,
+    // Inside flow key context
+    FlowKey,
+}


These should be doc comments, and they should include some code samples to visually explain what each context means.
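As an illustration of the request above, here is a minimal sketch of what documented variants could look like. The enum name PlainContext, the derives, the inline YAML samples, and the is_plain_safe helper are assumptions made for this example and are not taken from the merged code; only the four variant names come from the diff. The helper follows the YAML 1.2.2 ns-plain-safe(c) production, under which flow-in and flow-key additionally exclude the flow indicators. Its ns-char check mirrors the byte-level simplification used in the diff.

```rust
/// Context in which a plain scalar is lexed (hypothetical sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlainContext {
    /// Inside an implicit key of a block mapping.
    ///
    /// In `key: value` at block level, `key` is lexed in this context.
    BlockKey,
    /// Outside of any flow collection.
    ///
    /// In `key: a, b`, the value `a, b` is a single plain scalar,
    /// because `,` has no special meaning here.
    FlowOut,
    /// Inside a flow collection.
    ///
    /// In `[a, b]`, `a` and `b` are two plain scalars,
    /// because `,` terminates a plain scalar here.
    FlowIn,
    /// Inside an implicit key of a flow mapping.
    ///
    /// In `{key: value}`, `key` is lexed in this context.
    FlowKey,
}

// https://yaml.org/spec/1.2.2/#rule-c-flow-indicator
fn is_flow_indicator(c: u8) -> bool {
    matches!(c, b',' | b'[' | b']' | b'{' | b'}')
}

// Same byte-level approximation of ns-char as in the diff:
// anything that is neither white space nor a line break.
fn is_non_space_char(c: u8) -> bool {
    !matches!(c, b' ' | b'\t' | b'\n' | b'\r')
}

// https://yaml.org/spec/1.2.2/#rule-ns-plain-safe
pub fn is_plain_safe(c: u8, context: PlainContext) -> bool {
    match context {
        // ns-plain-safe-out: any non-space character.
        PlainContext::FlowOut | PlainContext::BlockKey => is_non_space_char(c),
        // ns-plain-safe-in: non-space characters minus flow indicators.
        PlainContext::FlowIn | PlainContext::FlowKey => {
            is_non_space_char(c) && !is_flow_indicator(c)
        }
    }
}
```

With this sketch, a comma is safe inside a plain scalar in FlowOut but not in FlowIn, which is the distinction the YAML samples in the doc comments are meant to make visible.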

dyc3 · 2025-05-03T23:54:02Z

crates/biome_yaml_parser/src/lexer/tests.rs

@@ -12,17 +12,16 @@ use std::time::Duration;
 // Assert the result of lexing a piece of source code,
 // and make sure the tokens yielded are fully lossless and the source can be reconstructed from only the tokens
 macro_rules! assert_lex {
-    ($src:expr, $($kind:ident:$len:expr $(,)?)*) => {{
+    ($context:expr, $src:expr, $($kind:ident:$len:expr $(,)?)*) => {{


I see you've taken inspiration from the HTML lexer tests. :)

dyc3 · 2025-05-04T00:03:30Z

crates/biome_yaml_parser/src/token_source.rs

        source
    }

-    fn next_non_trivia_token(&mut self, first_token: bool) {
+    fn next_non_trivia_token(&mut self, first_token: bool, context: YamlLexContext) {


nit: the equivalent function in the HTML parser takes these arguments in the reverse order (context: HtmlLexContext, first_token: bool), and it would be better to stay consistent. (If the HTML parser is actually the one at fault, we would change it to match the other languages, of course.)

dyc3 · 2025-05-04T00:08:54Z

crates/biome_yaml_parser/src/token_source.rs

    fn bump(&mut self) {
        if self.current() != EOF {
-            self.next_non_trivia_token(false)
+            self.next_non_trivia_token(false, YamlLexContext::Regular)
        }
    }


Not really related to this PR, but this should be a call to bump_with_context:

fn bump(&mut self) { self.bump_with_context(HtmlLexContext::Regular) }

vohoanglong0107 · 2025-05-04

Right, I have another PR in store that implements BumpWithContext for the token source. Let me address this then.
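The delegation pattern discussed above can be sketched in isolation. Everything in this example (ToyTokenSource, LexContext, the last_context field) is hypothetical and only illustrates the shape of the suggestion; it does not reproduce the real token source traits from the Biome parser infrastructure.

```rust
/// Hypothetical lexing context, standing in for a real one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LexContext {
    Regular,
    FlowIn,
}

/// Toy token source over a pre-lexed list of tokens.
struct ToyTokenSource {
    tokens: Vec<&'static str>,
    position: usize,
    /// Context used for the most recent successful bump.
    last_context: Option<LexContext>,
}

impl ToyTokenSource {
    fn new(tokens: Vec<&'static str>) -> Self {
        Self { tokens, position: 0, last_context: None }
    }

    /// Current token, or `None` once the end has been reached.
    fn current(&self) -> Option<&'static str> {
        self.tokens.get(self.position).copied()
    }

    /// The context-free bump simply delegates with the default context,
    /// so the end-of-input check lives in one place only.
    fn bump(&mut self) {
        self.bump_with_context(LexContext::Regular)
    }

    /// Advance to the next token using the given context.
    /// Does nothing when already at the end of the input.
    fn bump_with_context(&mut self, context: LexContext) {
        if self.current().is_some() {
            self.last_context = Some(context);
            self.position += 1;
        }
    }
}
```

Keeping the end-of-input guard in bump_with_context alone means bump cannot drift out of sync with it, which is the practical benefit of the suggested change.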

chansuke · 2025-05-04T04:55:49Z

crates/biome_yaml_parser/src/lexer/mod.rs

+
+fn is_indicator(c: u8) -> bool {


Suggested change

-fn is_indicator(c: u8) -> bool {
+// https://yaml.org/spec/1.2.2/#53-indicator-characters
+fn is_indicator(c: u8) -> bool {

Would it be useful to add the YAML spec link here as well?
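The body of is_indicator is not shown in the diff excerpt. For readers unfamiliar with the linked section, here is a minimal sketch of what such a predicate could look like, assuming it mirrors the spec's c-indicator production; the actual implementation in the PR may differ.

```rust
// https://yaml.org/spec/1.2.2/#53-indicator-characters
// Sketch only: returns true for the 19 indicator characters listed by c-indicator.
fn is_indicator(c: u8) -> bool {
    matches!(
        c,
        b'-' | b'?' | b':' | b',' | b'[' | b']' | b'{' | b'}'
            | b'#' | b'&' | b'*' | b'!' | b'|' | b'>' | b'\'' | b'"'
            | b'%' | b'@' | b'`'
    )
}
```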

github-actions bot added the A-Parser Area: parser label May 3, 2025

feat(yaml_parser): lex plain token

9627d10

vohoanglong0107 force-pushed the lex_plain_token_with_context branch from d324c06 to 9627d10 Compare May 3, 2025 15:32

vohoanglong0107 marked this pull request as ready for review May 3, 2025 23:35

vohoanglong0107 requested review from a team May 3, 2025 23:36

dyc3 requested changes May 4, 2025

chansuke reviewed May 4, 2025

vohoanglong0107 added 2 commits May 4, 2025 14:54

doc(yaml_parser): add doc comments to lex context

614b015

refactor(yaml_parser): reorder next_non_trivia_token args

50d8ce3

vohoanglong0107 requested a review from dyc3 May 4, 2025 05:56

dyc3 approved these changes May 4, 2025

vohoanglong0107 merged commit 345d2dc into biomejs:main May 4, 2025
13 checks passed

vohoanglong0107 deleted the lex_plain_token_with_context branch May 5, 2025 01:49
