@cakevm cakevm commented Jan 3, 2026

  • Fix lexer not handling UTF-8 characters correctly in comments.
    • Multi-byte UTF-8 characters (e.g., →) in comments caused incorrect span positions for subsequent tokens.
    • The lexer now properly tracks byte positions instead of character indices.

Example:

// a → b
#define macro MAIN() = takes (0) returns (1) {
// a → b
x123
}

caused:

❯ hnc test.huff -r

thread 'main' (48566897) panicked at crates/utils/src/file/span.rs:55:44:
byte index 65 is not a char boundary; it is inside '→' (bytes 63..66) of `// a → b
#define macro MAIN() = takes (0) returns (1) {
// a → b
x123
}`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
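
To illustrate the mismatch, here is a minimal standalone sketch, not taken from the hnc codebase; `char_offset` and `byte_offset` are hypothetical helper names. It locates `x123` in the example source once by counting characters and once by counting bytes. The character count gives 65, which is consistent with the index in the panic message, although the sketch does not claim to reproduce the lexer's actual code path.

```rust
// Hypothetical helpers for illustration only; they are not part of hnc.

/// Offset of the first occurrence of `needle`, counted in characters.
fn char_offset(src: &str, needle: char) -> Option<usize> {
    src.chars().position(|c| c == needle)
}

/// Offset of the first occurrence of `needle`, counted in bytes.
fn byte_offset(src: &str, needle: char) -> Option<usize> {
    src.char_indices().find(|&(_, c)| c == needle).map(|(i, _)| i)
}

fn main() {
    // Same source text as quoted in the panic message above.
    let src = "// a → b\n#define macro MAIN() = takes (0) returns (1) {\n// a → b\nx123\n}";

    let by_chars = char_offset(src, 'x').unwrap();
    let by_bytes = byte_offset(src, 'x').unwrap();

    // Each '→' is one character but three bytes, so the two comments
    // shift the byte offset by 2 + 2 = 4 relative to the character offset.
    assert_eq!(by_bytes - by_chars, 4);

    // The character offset lands inside the second '→', so it is not a
    // valid slice boundary; indexing `src` with it would panic.
    assert!(!src.is_char_boundary(by_chars));
    assert!(src.is_char_boundary(by_bytes));

    // Slicing with byte offsets yields the expected token text.
    assert_eq!(&src[by_bytes..by_bytes + 4], "x123");

    println!("chars: {by_chars}, bytes: {by_bytes}"); // chars: 65, bytes: 69
}
```

Rust string slices are indexed by byte, so spans have to be stored as byte offsets (for example, taken from `char_indices`) to be safe to slice with.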

@cakevm cakevm merged commit 37f8ea8 into main Jan 3, 2026
12 checks passed
@cakevm cakevm deleted the utf8 branch January 3, 2026 12:22