fix(lexer): handle UTF-8 characters correctly in comments #159

cakevm · 2026-01-03T12:16:22Z

Fix the lexer's handling of multi-byte UTF-8 characters in comments.
- Multi-byte UTF-8 characters (e.g., →) in comments shifted the span positions of every subsequent token.
- The lexer now tracks byte offsets instead of character indices, so spans can be used to slice the source text directly.
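To illustrate the difference between byte offsets and character indices, here is a minimal sketch of a scanner that records byte spans while skipping `//` comments. It is not the project's lexer: `word_spans` is a hypothetical helper written for this note, and it only splits on whitespace. The point is that `str::char_indices` yields byte offsets, so the recorded spans stay valid after a multi-byte character.

```rust
/// Toy scanner (hypothetical, not the project's lexer): returns the byte
/// spans `(start, end)` of whitespace-separated words, skipping `//` line
/// comments.
fn word_spans(src: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    // `char_indices` yields (byte offset, char), not (char index, char).
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        if c == '/' && matches!(chars.peek(), Some((_, '/'))) {
            // Line comment: consume up to and including the newline.
            while let Some((_, c)) = chars.next() {
                if c == '\n' {
                    break;
                }
            }
            continue;
        }
        // Consume one word; `end` is the byte offset just past its last char.
        let mut end = start + c.len_utf8();
        while let Some(&(i, c)) = chars.peek() {
            if c.is_whitespace() {
                break;
            }
            end = i + c.len_utf8();
            chars.next();
        }
        spans.push((start, end));
    }
    spans
}
```

For the input `"// a → b\nx123 →y"`, this returns `[(11, 15), (16, 20)]`, and slicing the source with `11..15` gives back `x123` even though the comment contains a three-byte character.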

Example:

// a → b
#define macro MAIN() = takes (0) returns (1) {
// a → b
x123
}

previously caused this panic:

❯ hnc test.huff -r

thread 'main' (48566897) panicked at crates/utils/src/file/span.rs:55:44:
byte index 65 is not a char boundary; it is inside '→' (bytes 63..66) of `// a → b
#define macro MAIN() = takes (0) returns (1) {
// a → b
x123
}`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
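The numbers in the panic message are consistent with a character index being used as a byte index. Assuming the input is exactly as shown above, with no leading indentation, `x123` starts at character index 65 but at byte offset 69, because each of the two `→` characters before it takes three bytes. Byte 65 falls inside the second `→` (bytes 63..66). The sketch below checks that arithmetic; `positions` is a hypothetical helper, not part of the project.

```rust
/// The input from the example, assumed to have no leading indentation.
const SRC: &str =
    "// a → b\n#define macro MAIN() = takes (0) returns (1) {\n// a → b\nx123\n}";

/// Returns `(char index, byte offset)` of the first occurrence of `needle`.
/// Illustrative helper only.
fn positions(src: &str, needle: &str) -> Option<(usize, usize)> {
    // `find` returns a byte offset, which always lies on a char boundary.
    let byte = src.find(needle)?;
    // Count the Unicode scalar values that precede the match.
    let chars = src[..byte].chars().count();
    Some((chars, byte))
}
```

`positions(SRC, "x123")` returns `Some((65, 69))`. `SRC.is_char_boundary(65)` is `false`, so slicing with `65..69` panics (or returns `None` through `str::get`), while `SRC.get(69..73)` returns `Some("x123")`.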

fix(lexer): handle UTF-8 characters correctly in comments

dc3f543

cakevm merged commit 37f8ea8 into main Jan 3, 2026
12 checks passed

cakevm deleted the utf8 branch January 3, 2026 12:22
