
Conversation

@yarkhinephyo
Member

What is in the PR

Implements part 2 of the lexer, from this paper.

Tokens in the table that are implemented in this PR (see the regex sketch after the list):

  • CELL - cell reference: $? [A-Z]+ $? [1-9][0-9]* (e.g. $A$1)
  • HORIZONTAL-RANGE - range of rows: $? [0-9]+ : $? [0-9]+ (e.g. $1:$10)
  • VERTICAL-RANGE - range of columns: $? [A-Z]+ : $? [A-Z]+ (e.g. $A:$B)
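
To make the patterns concrete, the same three tokens can be written with the regex crate. This is an illustration only; the PR's lexer scans characters by hand rather than using regex:

use regex::Regex;

fn main() {
    // The three token patterns from the table, anchored to whole strings.
    let cell = Regex::new(r"^\$?[A-Z]+\$?[1-9][0-9]*$").unwrap();
    let horizontal_range = Regex::new(r"^\$?[0-9]+:\$?[0-9]+$").unwrap();
    let vertical_range = Regex::new(r"^\$?[A-Z]+:\$?[A-Z]+$").unwrap();

    assert!(cell.is_match("$A$1"));
    assert!(cell.is_match("AB12"));
    assert!(horizontal_range.is_match("$1:$10"));
    assert!(vertical_range.is_match("$A:$B"));
    assert!(!cell.is_match("A0")); // row numbers start at 1: [1-9][0-9]*
}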

How is it tested

For each token, unit tests were added in tests/lexer/test_**.rs and run with:

cargo test
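
For illustration, one such test might look roughly like this; the Lexer::new/tokenize API and the Token::Cell shape are assumptions for the sketch, not necessarily the PR's actual names:

// Hypothetical test shape; the lexer API and token layout are assumed.
#[test]
fn lexes_absolute_cell_reference() {
    let tokens = Lexer::new("$A$1").tokenize().unwrap();
    assert_eq!(tokens, vec![Token::Cell { col: "A".into(), row: 1 }]);
}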

Comment on lines +10 to +20
/// Run `f`; if it returns `None`, restore the saved lexer position so a
/// failed attempt consumes no input.
fn backtrack_if_needed<F, T>(&mut self, f: F) -> Option<T>
where
    F: FnOnce(&mut Self) -> Option<T>,
{
    let saved = self.position;
    let result = f(self);
    if result.is_none() {
        self.position = saved;
    }
    result
}
Member

I feel like this is a parser job, not a lexer job

Member Author
@yarkhinephyo Jan 31, 2026

No, I don't think so. The paper implies these are still lexical tokens (the grammar ends up with less ambiguity that way).

Comment on lines +396 to +420
    // Could be a cell reference like $A$1 or a range like $A:$B or $1:$10.
    // Try cell/vertical range first.
    if let Some(token) = self.try_read_cell_or_vertical_range() {
        Ok(token)
    } else if let Some(token) = self.try_read_horizontal_range() {
        Ok(token)
    } else {
        // Invalid $ usage: report the offending character.
        let c = self.current().unwrap_or('$');
        Err(LexerError::UnexpectedChar(c))
    }
}
Some(c) if c.is_ascii_uppercase() => {
    // Try cell/vertical range first (e.g., A1, A:Z).
    if let Some(token) = self.try_read_cell_or_vertical_range() {
        Ok(token)
    } else {
        // Fall back to an identifier, for the TRUE/FALSE literals.
        let ident = self.read_identifier();
        match ident.to_uppercase().as_str() {
            "TRUE" => Ok(Token::Bool(true)),
            "FALSE" => Ok(Token::Bool(false)),
            _ => Err(LexerError::UnexpectedChar(c)),
        }
    }
Member

Actually yeah I feel like this is a parser job

Member

Or maybe not, but I think the backtracking can be removed with some sort of precedence ordering, see: https://github.com/spreadsheetlab/XLParser/blob/master/src/XLParser/ExcelFormulaGrammar.cs

^The above library references the paper too, and the paper references that library.

Member Author

I think that code is just defining a grammar and letting another library do the lexing and parsing:

var assembly = typeof(ExcelFormulaGrammar).GetTypeInfo().Assembly;

Member Author

I think my naming may be bad? With the "backtrack" function name, I was just implementing something like a context manager in Python, where the position gets reset if there was no match.
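
Concretely, something like this standalone sketch; the toy Lexer and try_read_digits are just illustrations, not the PR's types:

struct Lexer {
    input: Vec<char>,
    position: usize,
}

impl Lexer {
    // Same helper as in the diff above, repeated so this sketch compiles.
    fn backtrack_if_needed<F, T>(&mut self, f: F) -> Option<T>
    where
        F: FnOnce(&mut Self) -> Option<T>,
    {
        let saved = self.position;
        let result = f(self);
        if result.is_none() {
            self.position = saved; // undo any partial consumption
        }
        result
    }

    // Toy matcher: consume one or more ASCII digits, or fail without moving.
    fn try_read_digits(&mut self) -> Option<String> {
        self.backtrack_if_needed(|lx| {
            let start = lx.position;
            while lx.position < lx.input.len() && lx.input[lx.position].is_ascii_digit() {
                lx.position += 1;
            }
            if lx.position == start {
                None // no digits: the wrapper restores `position`
            } else {
                Some(lx.input[start..lx.position].iter().collect())
            }
        })
    }
}

fn main() {
    let mut lx = Lexer { input: "A1".chars().collect(), position: 0 };
    assert_eq!(lx.try_read_digits(), None); // 'A' is not a digit
    assert_eq!(lx.position, 0); // the failed attempt consumed nothing
}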

But yes, the time complexity is higher. In exchange we could get code like the following, which would be a little more maintainable and maybe easier to reason about when setting priorities:

fn parse_reference(&mut self) -> Token {
    if let Some(cell) = self.try_read_cell() {
        return cell;
    }

    if let Some(name) = self.try_read_named_range() {
        return name;
    }

    // <more tokens depending on priorities>
    ...
}

If we are going for pure performance, I can refactor it, though.

Member Author

Not sure what the best way is; I'm leaning towards cleaner code. The last time I did this, it was just writing regexes and passing them to a library for lexing.
