Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enforce import/embed case sensitivity #23163

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

marler8997
Copy link
Contributor

closes #9786

@squeek502
Copy link
Collaborator

Prior art: #11655

Comment on lines 1969 to 2010
if (!std.ascii.eqlIgnoreCase(import_component.name, real_path_component.name)) {
std.debug.panic("real path '{s}' does not end with import path '{s}'", .{ real_path, import_string });
}
Copy link
Collaborator

@squeek502 squeek502 Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will incorrectly fail for non-ASCII paths that have different casing (i.e. кириллица vs КИРИЛЛИЦА). You can use std.os.windows.eqlIgnoreCaseWtf8 instead if you want to keep the check (it can technically be used on any system, but pulls in extra uppercasing data on non-Windows systems), but not sure if that'd be foolproof either since that case-insensitivity is Windows-specific.

Copy link
Contributor Author

@marler8997 marler8997 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've added an eqlIgnoreCase function that will forward to std.os.windows.eqlIgnoreCaseWtf8 for windows, otherwise, it uses a utf8 iterator to compare paths by codepoint. It only considers ascii as far as upper/lowercase equivalent....I didn't find a non-windows specific way check two codepoints for equivalence ignoring case outside of using ascii.

I'm curious what happens on linux if you mount a case insensitive filesystem like vfat/NTFS....does it use the same upper/lower case equivalence as windows or is it different?

Copy link
Collaborator

@squeek502 squeek502 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'd suggest avoiding this entire comparison. As mentioned, it'd have to assume things about the details of the case-insensitivity to avoid false positives, which doesn't seem like a good idea.

Copy link
Contributor Author

@marler8997 marler8997 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. However, this check helps ensure our case sensitivity check doesn't have "false negatives". If there's a problem with what we're doing, this check will make sure the compiler crashes rather than have a false negative check (compiler doesn't detect the case mismatch and think everything is OK). I'd prefer the compiler to crash in this case so we can fix it rather than allow the bug to do nothing.

With this check in place we ensure that eventually the implementation will become robust, without it it's much less likely.

Copy link
Collaborator

@squeek502 squeek502 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this check will also panic when using @import/@embedFile on a symlink, e.g. this currently successfully imports link.zig which is a symlink to target.zig:

$ cat test.zig
comptime {
    _ = @import("link.zig");
}
`
$ ls -la
lrwxrwxrwx 1 ryan ryan   10 Mar 11 16:37 link.zig -> target.zig
-rw-rw-r-- 1 ryan ryan   19 Mar 11 16:38 target.zig
-rw-rw-r-- 1 ryan ryan  127 Mar 11 16:38 test.zig

$ cat target.zig
comptime {
    @compileLog("foo");
}

$ zig build-obj test.zig
link.zig:2:5: error: found compile log statement
    @compileLog("foo");
    ^~~~~~~~~~~~~~~~~~

But calling realpath on link.zig will return the path to target.zig, which will trip this panic.

(I'm unable to find a relevant issue about the intended behavior with regards to @import and following symlinks, so I'm not sure if the current behavior is intentional)

(note also that this even works if target.zig is outside the main package path, e.g. if link.zig is a symlink to ../target.zig; unsure if that's intended, too, but that would be a separate issue)

Copy link
Collaborator

@squeek502 squeek502 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an idea for a potential alternate implementation that should avoid symlink-related complications:

  • Call openDir(dirname(cur_sub_path) orelse ".", .{ .iterate = true })
  • Iterate the directory, checking each Entry.name for strict equality to the basename of cur_sub_path
    • If nothing matched, then this is a case mismatch error for the current basename component (but note that it technically also could be a TOCTOU error if a file with the canonical casing did exist but was deleted while this check was occuring)
  • Chop off the last component from the cur_sub_path and continue the loop
  • Break after opening . and checking the leftmost component of the cur_sub_path

(could also go left-to-right through the path instead)

I'm sure this could be optimized for certain systems, e.g. on Windows NtQueryDirectoryFile could probably be used to avoid the directory iteration.

(I'll be able to properly explore this idea sometime after the next ~2 weeks FWIW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...for some reason I had it in my head that zig's realpath function didn't actually resolve symlinks but I see this is false. I'll think on this and your suggestion to come up with a solution.

Copy link
Collaborator

@squeek502 squeek502 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to open a fd to sub_path without following symlinks and then call std.os.getFdPath on it, but:

  • Getting a fd of a symlink is not supported on every platform (there's nothing in std.fs that supports it, but here's a test that does it for a few platforms)
  • std.os.getFdPath is not supported on every platform (and cannot be on some, e.g. WASI) and Andrew doesn't particularly like that function, see this comment

Copy link
Contributor Author

@marler8997 marler8997 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love any of the solutions so far. Here's some more thoughts I've had to share in the meantime.

On windows this is easy to implement because it has APIs to get the original case of a file path without resolving symlinks, I'm guessing macos might have similar APIs we could leverage. Linux can have case insensitive filesystems, but, there don't seem to be any APIs available that allow you to retrieve the original case of a file path, other than an exhaustive directory listing/compare and who knows what upper/lowercase scheme it's using. It feels a bit like we're fighting against how the system is intended to be used.

Your idea to verify each part of an import subpath by listing each directory and ensuring there's an exact match should work but seems expensive. We could make it even more complicated by doing a check that verifies whether that directory is case sensitive first and if not we can skip the directory listing. This could be done by opening the file/directory, which we would do anyway, then, swapping one of the characters in that basename for their upper/lowercase counterpart, open that file/directory again and compare their inode number. If it's a match, which I expect would be rare, then we continue with the directory listing.

To reiterate not sure if any of this is worth it but leaving the comment for future reference. I think I might play around with a windows-only implementation (maybe macOS too) and see how that looks. I expect just supporting those 2 platforms would cover over 99% of real world cases and this might be one of those times where the tradeoff for that extra 1% isn't worth it.

@marler8997 marler8997 force-pushed the caseSensitiveImports branch from d82af61 to 61f383e Compare March 11, 2025 00:23
@marler8997 marler8997 force-pushed the caseSensitiveImports branch from 61f383e to d94b4bb Compare March 11, 2025 00:59
Comment on lines +1957 to +1969
var a_utf8_it = std.unicode.Utf8View.initUnchecked(a).iterator();
var b_utf8_it = std.unicode.Utf8View.initUnchecked(b).iterator();
while (true) {
const a_cp = a_utf8_it.nextCodepoint() orelse break;
const b_cp = b_utf8_it.nextCodepoint() orelse return false;
if (a_cp != b_cp) {
const a_upper = std.ascii.toUpper(std.math.cast(u8, a_cp) orelse return false);
const b_upper = std.ascii.toUpper(std.math.cast(u8, b_cp) orelse return false);
if (a_upper != b_upper)
return false;
}
}
if (b_utf8_it.nextCodepoint() != null) return false;
Copy link
Collaborator

@squeek502 squeek502 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this codepoint iteration loop has no practical difference to std.ascii.eqlIgnoreCase (besides doing a lot of extra work), since UTF-8 multibyte codepoints will never contain bytes within the ASCII range. In other words, its always safe to iterate over bytes with UTF-8 if you only care about the ASCII range.

(but see #23163 (comment))

@marler8997 marler8997 marked this pull request as draft March 12, 2025 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@import is too lenient with paths on windows
2 participants