-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enforce import/embed case sensitivity #23163
base: master
Are you sure you want to change the base?
Conversation
Prior art: #11655 |
src/Zcu/PerThread.zig
Outdated
if (!std.ascii.eqlIgnoreCase(import_component.name, real_path_component.name)) { | ||
std.debug.panic("real path '{s}' does not end with import path '{s}'", .{ real_path, import_string }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will incorrectly fail for non-ASCII paths that have different casing (i.e. кириллица
vs КИРИЛЛИЦА
). You can use std.os.windows.eqlIgnoreCaseWtf8
instead if you want to keep the check (it can technically be used on any system, but pulls in extra uppercasing data on non-Windows systems), but not sure if that'd be foolproof either since that case-insensitivity is Windows-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I've added an eqlIgnoreCase
function that will forward to std.os.windows.eqlIgnoreCaseWtf8
for windows, otherwise, it uses a utf8 iterator to compare paths by codepoint. It only considers ascii as far as upper/lowercase equivalent....I didn't find a non-windows specific way check two codepoints for equivalence ignoring case outside of using ascii.
I'm curious what happens on linux if you mount a case insensitive filesystem like vfat/NTFS....does it use the same upper/lower case equivalence as windows or is it different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'd suggest avoiding this entire comparison. As mentioned, it'd have to assume things about the details of the case-insensitivity to avoid false positives, which doesn't seem like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. However, this check helps ensure our case sensitivity check doesn't have "false negatives". If there's a problem with what we're doing, this check will make sure the compiler crashes rather than have a false negative check (compiler doesn't detect the case mismatch and think everything is OK). I'd prefer the compiler to crash in this case so we can fix it rather than allow the bug to do nothing.
With this check in place we ensure that eventually the implementation will become robust, without it it's much less likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this check will also panic when using @import
/@embedFile
on a symlink, e.g. this currently successfully imports link.zig
which is a symlink to target.zig
:
$ cat test.zig
comptime {
_ = @import("link.zig");
}
`
$ ls -la
lrwxrwxrwx 1 ryan ryan 10 Mar 11 16:37 link.zig -> target.zig
-rw-rw-r-- 1 ryan ryan 19 Mar 11 16:38 target.zig
-rw-rw-r-- 1 ryan ryan 127 Mar 11 16:38 test.zig
$ cat target.zig
comptime {
@compileLog("foo");
}
$ zig build-obj test.zig
link.zig:2:5: error: found compile log statement
@compileLog("foo");
^~~~~~~~~~~~~~~~~~
But calling realpath
on link.zig
will return the path to target.zig
, which will trip this panic.
(I'm unable to find a relevant issue about the intended behavior with regards to @import
and following symlinks, so I'm not sure if the current behavior is intentional)
(note also that this even works if target.zig
is outside the main package path, e.g. if link.zig
is a symlink to ../target.zig
; unsure if that's intended, too, but that would be a separate issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an idea for a potential alternate implementation that should avoid symlink-related complications:
- Call
openDir(dirname(cur_sub_path) orelse ".", .{ .iterate = true })
- Iterate the directory, checking each
Entry.name
for strict equality to thebasename
ofcur_sub_path
- If nothing matched, then this is a case mismatch error for the current
basename
component (but note that it technically also could be a TOCTOU error if a file with the canonical casing did exist but was deleted while this check was occuring)
- If nothing matched, then this is a case mismatch error for the current
- Chop off the last component from the cur_sub_path and continue the loop
- Break after opening
.
and checking the leftmost component of the cur_sub_path
(could also go left-to-right through the path instead)
I'm sure this could be optimized for certain systems, e.g. on Windows NtQueryDirectoryFile
could probably be used to avoid the directory iteration.
(I'll be able to properly explore this idea sometime after the next ~2 weeks FWIW)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...for some reason I had it in my head that zig's realpath function didn't actually resolve symlinks but I see this is false. I'll think on this and your suggestion to come up with a solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to open a fd to sub_path
without following symlinks and then call std.os.getFdPath
on it, but:
- Getting a fd of a symlink is not supported on every platform (there's nothing in
std.fs
that supports it, but here's a test that does it for a few platforms) std.os.getFdPath
is not supported on every platform (and cannot be on some, e.g. WASI) and Andrew doesn't particularly like that function, see this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love any of the solutions so far. Here's some more thoughts I've had to share in the meantime.
On windows this is easy to implement because it has APIs to get the original case of a file path without resolving symlinks, I'm guessing macos might have similar APIs we could leverage. Linux can have case insensitive filesystems, but, there don't seem to be any APIs available that allow you to retrieve the original case of a file path, other than an exhaustive directory listing/compare and who knows what upper/lowercase scheme it's using. It feels a bit like we're fighting against how the system is intended to be used.
Your idea to verify each part of an import subpath by listing each directory and ensuring there's an exact match should work but seems expensive. We could make it even more complicated by doing a check that verifies whether that directory is case sensitive first and if not we can skip the directory listing. This could be done by opening the file/directory, which we would do anyway, then, swapping one of the characters in that basename for their upper/lowercase counterpart, open that file/directory again and compare their inode number. If it's a match, which I expect would be rare, then we continue with the directory listing.
To reiterate not sure if any of this is worth it but leaving the comment for future reference. I think I might play around with a windows-only implementation (maybe macOS too) and see how that looks. I expect just supporting those 2 platforms would cover over 99% of real world cases and this might be one of those times where the tradeoff for that extra 1% isn't worth it.
d82af61
to
61f383e
Compare
61f383e
to
d94b4bb
Compare
var a_utf8_it = std.unicode.Utf8View.initUnchecked(a).iterator(); | ||
var b_utf8_it = std.unicode.Utf8View.initUnchecked(b).iterator(); | ||
while (true) { | ||
const a_cp = a_utf8_it.nextCodepoint() orelse break; | ||
const b_cp = b_utf8_it.nextCodepoint() orelse return false; | ||
if (a_cp != b_cp) { | ||
const a_upper = std.ascii.toUpper(std.math.cast(u8, a_cp) orelse return false); | ||
const b_upper = std.ascii.toUpper(std.math.cast(u8, b_cp) orelse return false); | ||
if (a_upper != b_upper) | ||
return false; | ||
} | ||
} | ||
if (b_utf8_it.nextCodepoint() != null) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this codepoint iteration loop has no practical difference to std.ascii.eqlIgnoreCase
(besides doing a lot of extra work), since UTF-8 multibyte codepoints will never contain bytes within the ASCII range. In other words, its always safe to iterate over bytes with UTF-8 if you only care about the ASCII range.
(but see #23163 (comment))
closes #9786