refactor: decompose lexer into functional submodules #264
Merged
LunaStev merged 1 commit into wavefnd:master on Jan 5, 2026
Break down the lexer implementation into logical components to improve code organization and readability.

Changes:
- **New Module Structure**:
  - `core.rs`: `Lexer` and `Token` struct definitions and entry points.
  - `cursor.rs`: Low-level source navigation (`advance`, `peek`, `match_next`).
  - `scan.rs`: Main token dispatch logic (`next_token`).
  - `ident.rs`: Identifier scanning and keyword mapping.
  - `literals.rs`: String and character literal parsing.
  - `trivia.rs`: Whitespace and comment skipping.
  - `common.rs`: Internal shared imports.
- **Integration**:
  - Updated `front/lexer/src/lib.rs` and `mod.rs` to expose the new structure.
  - Updated imports in `front/parser` to align with the refactored lexer API (explicit `use lexer::token::TokenType` where necessary).

This modularization separates concerns, making the lexer easier to maintain and extend.

Signed-off-by: LunaStev <luna@lunastev.org>
Following the modularization of the parser and codegen, this PR breaks down the monolithic `lexer.rs` into specialized submodules within `front/lexer/src/lexer/`. This reorganization separates low-level source navigation from high-level token dispatch and literal parsing, making the lexer easier to maintain and extend.

Key Changes

1. Lexer Modularization

The lexer logic has been split into the following functional components:
- `core.rs`: Definitions for the `Lexer` and `Token` structures, serving as the foundational types.
- `cursor.rs`: Implements low-level source navigation methods such as `advance()`, `peek()`, `peek_next()`, and `match_next()`.
- `scan.rs`: The primary entry point for tokenization, containing the main `next_token()` dispatch logic and character-level matching.
- `ident.rs`: Logic for scanning identifiers and mapping them to language keywords.
- `literals.rs`: Specialized scanning for string and character literals, including escape sequence handling.
- `trivia.rs`: Logic for skipping non-token "trivia" such as whitespace and various comment styles.
- `common.rs`: Internal shared imports and utilities used across the lexer submodules.

2. Integration & API Cleanup

- Updated `front/lexer/src/lib.rs` and `mod.rs` to correctly export the new modular structure while maintaining a clean public API.
- Updated imports in the `front/parser` crate to align with the new lexer paths, specifically ensuring `TokenType` and `Token` are correctly referenced.

3. Behavioral Consistency

- The `Lexer` public interface remains stable to prevent breaking changes in the compiler runner.

Impact

- … `cursor.rs`, while adding new keywords only involves `ident.rs`.
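
To illustrate the separation between source navigation and keyword mapping, here is a minimal sketch using inline modules in place of the real files. It is not the code from this PR: the `Cursor` struct, the `scan_identifier` and `keyword` functions, the method bodies, and the two placeholder keywords (`fun`, `var`) are assumptions made for illustration. Only the method names `advance`, `peek`, `peek_next`, and `match_next` and the `TokenType` name come from the description above.

```rust
// Hypothetical sketch: names mirror the PR description, bodies are assumed.

mod token {
    /// Placeholder token kinds; the real enum is much larger.
    #[derive(Debug, PartialEq)]
    pub enum TokenType {
        Fun,
        Var,
        Identifier(String),
    }
}

mod cursor {
    /// Low-level navigation over the source characters.
    pub struct Cursor {
        chars: Vec<char>,
        pos: usize,
    }

    impl Cursor {
        pub fn new(source: &str) -> Self {
            Cursor { chars: source.chars().collect(), pos: 0 }
        }

        pub fn is_at_end(&self) -> bool {
            self.pos >= self.chars.len()
        }

        /// Look at the current character without consuming it.
        pub fn peek(&self) -> Option<char> {
            self.chars.get(self.pos).copied()
        }

        /// Look one character past the current one.
        pub fn peek_next(&self) -> Option<char> {
            self.chars.get(self.pos + 1).copied()
        }

        /// Consume and return the current character, if any.
        pub fn advance(&mut self) -> Option<char> {
            let c = self.peek()?;
            self.pos += 1;
            Some(c)
        }

        /// Consume the current character only if it equals `expected`.
        pub fn match_next(&mut self, expected: char) -> bool {
            if self.peek() == Some(expected) {
                self.pos += 1;
                true
            } else {
                false
            }
        }
    }
}

mod ident {
    use super::cursor::Cursor;
    use super::token::TokenType;

    /// Keyword table: adding a keyword means adding one arm here.
    fn keyword(text: &str) -> Option<TokenType> {
        match text {
            "fun" => Some(TokenType::Fun),
            "var" => Some(TokenType::Var),
            _ => None,
        }
    }

    /// Scan an identifier or keyword starting at the cursor position.
    pub fn scan_identifier(cursor: &mut Cursor) -> TokenType {
        let mut text = String::new();
        while let Some(c) = cursor.peek() {
            if c.is_alphanumeric() || c == '_' {
                text.push(c);
                cursor.advance();
            } else {
                break;
            }
        }
        keyword(&text).unwrap_or_else(|| TokenType::Identifier(text))
    }
}
```

In this layout, `ident` reaches the source only through the `Cursor` methods, so a new keyword touches only the `keyword` table, and a change to how characters are stored touches only `cursor`.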