Test effect of locale environment variables on Oil #522
Comments
(moved this comment) to #523 |
Right now all the operations that use libc respect LANG
However the ones we code ourselves don't:
Should we respect LANG in those cases? @asokoloski might have some thoughts. Basically I think OSH should be consistent across all string operations. One way to do that would be:
Another way would be:
Actually the second option is a subset of the first, so it might be OK to start with that. And if people complain, we can allow
I'm not sure of all the details, but these are my thoughts. Feedback is appreciated! If I have misunderstood how libc works, let me know :)
Another problem: are there systems where you can't set the locale to utf-8? And utf-8 only semantics won't work? That would argue in favor of also allowing |
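To make the consistency question concrete, here is a minimal sketch in Python (not OSH code) of how a hand-written string operation can give different answers under byte semantics and under UTF-8 semantics. The function name `str_length` and its `utf8` flag are hypothetical, chosen only for illustration; they are not part of Oil.

```python
def str_length(data: bytes, utf8: bool) -> int:
    """Length of a shell string that is stored as raw bytes.

    utf8=False counts bytes (roughly what LANG=C would suggest).
    utf8=True counts code points and raises UnicodeDecodeError
    if the bytes are not valid UTF-8.
    """
    if utf8:
        return len(data.decode("utf-8"))
    return len(data)


# "caf\u00e9" is "café"; the last character takes two bytes in UTF-8.
s = "caf\u00e9".encode("utf-8")
print(str_length(s, utf8=False))  # 5
print(str_length(s, utf8=True))   # 4
```

Whichever rule is picked, the point is that every string operation would need to apply the same one, whether it goes through libc or not.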
Related to #523 . But even if we write our own glob() and fnmatch(), we're going to depend on libc for understanding |
And also add a test to see what happens when LANG=C. OSH fails because it shouldn't respect unicode with _?_. Addresses issue #522.
Hm I added a test case and am confused by the result. This new test case explicitly exports I would have thought the
But it does the substitution of
|
Yeah I undid the patch by commenting out Not saying there was anything wrong with the patch -- I still don't have a 100% clear idea of how Oil should behave. As mentioned I want it to be consistent between the operations that use libc and the ones that don't. I'm just confused because |
Move the setlocale() restoration after the regexec() call just in case. It doesn't appear to change the behavior. Related to issue #522.
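A rough Python sketch of the save/restore pattern this commit message describes, assuming the matching code temporarily switches `LC_CTYPE` and restores it afterwards. `temporary_ctype` and `match_in_locale` are hypothetical helpers, and Python's `re` module is only a stand-in for the `regcomp()`/`regexec()` pair (it does not consult the C locale by default).

```python
import contextlib
import locale
import re


@contextlib.contextmanager
def temporary_ctype(name):
    """Switch LC_CTYPE to `name`, then restore the previous value on exit."""
    saved = locale.setlocale(locale.LC_CTYPE)  # one-argument form only queries
    locale.setlocale(locale.LC_CTYPE, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def match_in_locale(pattern, text, name="C"):
    """Compile AND execute the match before the locale is restored."""
    with temporary_ctype(name):
        compiled = re.compile(pattern)             # stand-in for regcomp()
        return compiled.search(text) is not None   # stand-in for regexec()
```

Keeping both steps inside the `with` block mirrors the "just in case" ordering: the locale is only put back once matching has finished.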
@andychu I see the same thing. But the problem, as far as I can tell, is not with setlocale, but with the way we're handling environment vars. Printing out |
@andychu ah, so osh/state.py has its own The most straightforward way to make this work, that I can think of right now, is to explicitly pass mem, or just the But I'm sure you can tell me if there's a better way :) |
Other related stuff -- it seems like you can't set locales that use an encoding like UTF-16 or UTF-32, because Linux requires character encodings to be null-safe. So the only character encodings I've seen are ones like UTF-8, C, or ISO-8859-.
I've been playing around a bit with the interesting cases where filenames are in a different encoding than the current one. Here's what happens in bash when you create two files containing the word "needle" in Swedish ("nål"), one with utf-8 encoding, and the other with iso-8859-1 encoding:
Globbing works fine if you escape the weird character, probably because that escape is interpreted after decoding the pattern, during the compilation step. But when you pass an invalid utf-8 string directly, it seems like bash gives up and doesn't treat it as a glob pattern anymore. I'll try to look into this a bit more soon. |
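For reference, the byte-level difference between the two filenames can be checked with a short Python sketch. This only illustrates the encodings; it does not reproduce what bash does, and `is_valid_utf8` is a hypothetical helper name.

```python
word = "n\u00e5l"  # "nål", Swedish for "needle"

utf8_name = word.encode("utf-8")         # å becomes two bytes: 0xC3 0xA5
latin1_name = word.encode("iso-8859-1")  # å becomes one byte: 0xE5

print(utf8_name)    # b'n\xc3\xa5l'
print(latin1_name)  # b'n\xe5l'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_name))    # True
print(is_valid_utf8(latin1_name))  # False: 0xE5 starts a multi-byte sequence that 'l' cannot continue
```

So in a utf-8 locale, the iso-8859-1 filename is an invalid byte sequence, which is the situation where the pattern stops being treated as a glob.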
Ah I see! Thanks for debugging it. The issue is that in shell, "set these environment variables on subsequent process invocations" It doesn't actually call call So how does bash work? Here is a little hack I found. It has a special hook. I think we might need this in some other cases too. Let me file another bug about it. From
|
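The distinction described above, between changing the environment and changing the running process's locale, can be sketched in Python. This assumes the missing call in question is `setlocale()`; the value `C.UTF-8` is just an example and may not be installed on every system.

```python
import locale
import os

# Start from a known state.
locale.setlocale(locale.LC_CTYPE, "C")
before = locale.setlocale(locale.LC_CTYPE)  # one-argument form only queries

# Changing the environment affects child processes started later,
# but does not by itself change this process's locale.
os.environ["LC_ALL"] = "C.UTF-8"  # example value, an assumption
after_env = locale.setlocale(locale.LC_CTYPE)
print(before, after_env)  # both are "C"

# Only an explicit setlocale() call with "" re-reads the environment.
try:
    applied = locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    applied = None  # the requested locale is not available here
print(applied)
```

A shell that wants assignments to locale variables to take effect immediately therefore needs some hook that makes this second call itself.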
Hm I filed #527 ... but on the other hand, compatibility aside, I wonder if it even makes sense to have a special hook for LANG? Maybe this is a good opportunity to have something like To disable it, you could use blocks:
This is better than a special global variable with a hidden side effect IMO ... Again I don't like the global LANG because it's inherently kind of ignorant of a program dealing with data in multiple encodings. You can easily have a single file system where one dir has filenames in |
sh_spec.py framework hard codes LANG, but we should test with a variety of settings.
fnmatch(), glob(), and regcomp() work
Also:
https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev