Environment
- Idris2 Version: 0.7.0-6e52f899b
- OS: Linux (Ubuntu)
- Locale: C.UTF-8 (UTF-8 enabled)
- Backend: Chez Scheme
Description
When compiling an Idris2 project whose directory path contains non-ASCII Unicode characters (e.g., accented characters such as é), Idris2 generates a compileChez script in which the path string is escaped incorrectly. Instead of an escape that Chez Scheme reads back as the original character, the script contains a sequence that Chez Scheme does not interpret as intended.
Steps to Reproduce
- Create a directory with a non-ASCII character in its name:
mkdir tpRéférence
cd tpRéférence
- Create a minimal Idris2 project with a main.ipkg file declaring a runmain executable
- Compile the project:
idris2 --build main.ipkg
Expected Behavior
The compilation should succeed and generate a working executable. The generated compileChez script should contain properly escaped Unicode characters compatible with Chez Scheme's string syntax.
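As an illustration of what "properly escaped" could look like, here is a minimal Python sketch that renders a path as a Scheme string literal using the R6RS inline hex escape form `\x<hex>;`. The helper name `scheme_string_literal` is hypothetical. This is not how Idris2 generates the script, only one escape style that an R6RS reader accepts.

```python
def scheme_string_literal(s: str) -> str:
    # Hypothetical helper: quote a string for a Scheme reader,
    # writing non-ASCII and control characters as R6RS \x<hex>; escapes.
    out = []
    for ch in s:
        if ch in ('"', "\\"):
            out.append("\\" + ch)            # escape quote and backslash
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)                   # printable ASCII passes through
        else:
            out.append("\\x%X;" % ord(ch))   # code point in hex, ';' terminated
    return '"' + "".join(out) + '"'


# \u00e9 is é
print(scheme_string_literal("/home/user/tpR\u00e9f\u00e9rence"))
# prints "/home/user/tpR\xE9;f\xE9;rence"
```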
Actual Behavior
Compilation fails with:
Exception in compile-program: failed for /path/to/tpRfrence/build/exec/runmain_app/runmain.ss: no such file or directory
Error: INTERNAL ERROR: Chez exited with return code 255
Note tpRfrence instead of tpRéférence: the é characters are missing from the reported path.
Root Cause
Examining the generated build/exec/runmain_app/compileChez file reveals:
(parameterize ([optimize-level 3] [compile-file-message #f])
  (compile-program "/home/user/tpR\233f\233rence/build/exec/runmain_app/runmain.ss"))
The UTF-8 encoding of é is the two-byte sequence 0xC3 0xA9, but Idris2 writes \233. Read as decimal, 233 is 0xE9, which is the Unicode code point of é (U+00E9) and also its single-byte ISO-8859-1 encoding. It is not the second byte of the UTF-8 sequence (that byte is 0xA9). Idris2 therefore appears to emit a decimal escape of the character's code point, a form that Chez Scheme evidently does not read back as é.
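The numbers involved can be checked with a short Python sketch (not part of Idris2). The last line rests on an unverified assumption: that a reader might treat the digits after the backslash as octal. The report does not confirm this, but it would give a different character than é.

```python
# Compare the numeric representations of 'é' (U+00E9).
ch = "\u00e9"  # é

code_point = ord(ch)                 # Unicode code point as an integer
utf8_bytes = ch.encode("utf-8")      # multi-byte UTF-8 encoding
latin1_bytes = ch.encode("latin-1")  # single-byte ISO-8859-1 encoding

print(code_point, hex(code_point))   # 233 0xe9
print(utf8_bytes.hex())              # c3a9
print(latin1_bytes.hex())            # e9

# Assumption for illustration only: if "233" were parsed as octal,
# the value would be 155 (0x9B), which is not é.
octal_reading = int("233", 8)
print(octal_reading, hex(octal_reading))  # 155 0x9b
```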
Impact
- Projects cannot be compiled if their path contains any non-ASCII characters
- This affects international users who may have accented characters in their usernames or project paths
- The error message is misleading as it shows a corrupted path rather than indicating an encoding issue
Workaround
Avoid using non-ASCII characters in project paths.
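Until the escaping is fixed, a pre-build check can flag affected paths early. The following is a minimal Python sketch. The function name `non_ascii_parts` is hypothetical, and it assumes POSIX-style paths as in the environment above.

```python
from pathlib import PurePosixPath


def non_ascii_parts(path: str) -> list[str]:
    # Return the path components that contain any non-ASCII character.
    return [part for part in PurePosixPath(path).parts if not part.isascii()]


# \u00e9 is é
print(non_ascii_parts("/home/user/tpR\u00e9f\u00e9rence/build"))  # ['tpRéférence']
print(non_ascii_parts("/home/user/project/build"))                # []
```

An empty result means the path should be unaffected by this issue.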