Support LANG=C, LC_ALL=C, etc. (Problems with extended ASCII in strings) #529
Not sure exactly what we will do here yet. Although Oil implements length in code points because bash supports it, it's actually a fairly useless operation. Usually you want length in bytes, e.g. for a length-prefixed serialization like netstrings. I can't think of any use case for length in code points.
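To make the bytes-versus-code-points distinction concrete, here is a minimal Python sketch (not Oil's implementation). The `netstring` helper and the sample string are made up for illustration; the point is only that a length-prefixed format needs the byte count, which differs from the code point count as soon as the text contains non-ASCII characters.

```python
def netstring(payload: bytes) -> bytes:
    """Frame payload as <decimal byte length>:<payload>, (hypothetical helper)."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


text = "caf\u00e9"            # 'café'; the last character takes 2 bytes in UTF-8
raw = text.encode("utf-8")

code_points = len(text)       # length in code points
byte_count = len(raw)         # length in bytes, which is what the prefix must use
framed = netstring(raw)
```

Using the code point count as the prefix here would make a reader of the framed data stop one byte short of the payload's end.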
As with the previous issue, my main concern is that OSH silently matches it successfully against any pattern. I don't seem to see that mentioned in the other issues.
That may be because a truncated utf-8 character is treated as empty by Also note that bash's behavior is totally incoherent when there's invalid utf-8: https://github.com/oilshell/oil/blob/master/spec/var-op-len.test.sh#L58 When given a monotonically increasing number of bytes, it produces this sequence of length in chars, which goes up and down!
Why would something empty be successfully matched against anything, though ( |
I guess it's fine not to be compatible with the length, then. I'd probably rather have the error than this. The matching actually works on Haiku, so it seems that it is a libc issue. At least you seem to already be considering rewriting the function in #523, so supporting this might not take much extra work?
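For comparison, a strict decoder gives a predictable answer on truncated input: either a code point count or an explicit failure. The Python sketch below only illustrates that "error instead of a surprising number" policy; it is neither bash's nor Oil's code, and `strict_code_point_len` is a hypothetical name.

```python
from typing import Optional


def strict_code_point_len(data: bytes) -> Optional[int]:
    """Return the number of code points, or None if data is not valid UTF-8."""
    try:
        return len(data.decode("utf-8", errors="strict"))
    except UnicodeDecodeError:
        return None


# 'a' (1 byte) + 'é' (2 bytes) + '€' (3 bytes) = 6 bytes in total
full = "a\u00e9\u20ac".encode("utf-8")

# Length of every byte prefix, from 0 bytes up to the whole string
lengths = [strict_code_point_len(full[:n]) for n in range(len(full) + 1)]
```

Every prefix that cuts a multi-byte character in half yields `None` here, so the valid lengths never decrease as more bytes are added.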
@Crestwave I just copied your test case into the spec tests, and the underlying cause is that Oil assumes everything is UTF-8 now. It doesn't support LANG=C or LC_ALL=C, etc. But after a conversation with @asokoloski on Zulip, I intend for Oil to support those features. This work depends on #527, which isn't too bad. So when we add libc env variable support, this bug along with a few others should be fixed. Thanks for the report! It took a while to figure out what we want, but the concrete test cases help. (BTW the underlying issue LC_ALL=C is handled specially in bash. Just setting that variable doesn't make libc see it. A shell has to do a
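The point that setting the variable alone does not change what libc sees can be sketched in Python, whose `locale.setlocale()` wraps the C library's `setlocale()`. This is an illustration under that assumption, not Oil's code: the process locale only follows `LC_ALL` once `setlocale()` is called with an empty locale string, which tells libc to re-read the environment.

```python
import locale
import os

# Changing the environment variable does not by itself touch the process locale.
os.environ["LC_ALL"] = "C"

# Passing None only queries the current setting. The value depends on how the
# interpreter was started, so it may or may not already be "C".
before = locale.setlocale(locale.LC_CTYPE, None)

# An empty string asks libc to pick the locale up from the environment,
# which is the step a program has to perform explicitly.
locale.setlocale(locale.LC_ALL, "")

after = locale.setlocale(locale.LC_CTYPE, None)
```

After the explicit call, `after` names the C/POSIX locale regardless of what `before` was.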
Note: the |
Hit this with the regex
But Oil doesn't respect that yet.
I would also like some alias like this:
Test files affected:
CPython DOES call it, so there's a workaround in the dev build only: libc.cpython_reset_locale().
- Remove explicit setlocale() calls from native/libc.c
- Remove setlocale() calls when OVM_MAIN is defined
- Add the cpython_reset_locale() hack for the remaining case
- Refactor OVM_MAIN check into pyutil.IsAppBundle()
Spec tests:
- spec/glob: Test started passing! Woohoo we're more consistent.
- spec/oil-regex: Documented that we need LANG=C support. This is issue #529.
Addresses issue #912.