Skip to content

Handle win32 separator for cygwin paths #141864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Berrysoft
Copy link
Contributor

@Berrysoft Berrysoft commented Jun 1, 2025

This PR handles a issue that cygwin actually supports Win32 path, so we need to handle the Win32 prefix and separaters.

r? @mati865

cc @jeremyd2019

Not sure if I should handle the prefix like the windows target... Cygwin does support win32 paths directly going through the APIs, but I think it's not the recommended way.

Here I just use cygwin_conv_path because it handles both cygwin and win32 paths correctly and convert them into absolute POSIX paths.

UPDATE: Windows path prefix is handled.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 1, 2025
@jieyouxu jieyouxu added the O-cygwin Target: *-pc-cygwin label Jun 1, 2025
@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@jeremyd2019 jeremyd2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I just use cygwin_conv_path because it handles both cygwin and win32 paths correctly and convert them into absolute POSIX paths.

I was going to raise the issue of encoding, but looking at the code it doesn't seem to deal with WIN_A as a different encoding than POSIX paths, so I guess that's not an issue. I guess I need to have a conversation on the Cygwin list to better understand this.

@Berrysoft
Copy link
Contributor Author

Finally I reuse the prefix parser from Windows. Now the cygwin Path can handle Win32 path correctly. It should behave a little different as it allows / as verbatim separater. Here is a simple test code:

use std::path::{absolute, Path};

fn main() {
    test_path("C:\\msys64");
    test_path("/dev/sda1");
    test_path("./sda1");
    test_path("/c/Users");
    test_path("\\\\?\\UNC/localhost/share");
}

fn test_path(p: impl AsRef<Path>) {
    let p = p.as_ref();
    let abs = absolute(p).unwrap();
    if p.is_absolute() {
        println!("{} == {}", p.display(), abs.display());
    } else {
        println!("{} -> {}", p.display(), abs.display());
    }
    println!("{:?}", p.components().collect::<Vec<_>>());
}

On Windows, it should output (the current drive is D:)

C:\msys64 == C:\msys64
[Prefix(PrefixComponent { raw: "C:", parsed: Disk(67) }), RootDir, Normal("msys64")]
/dev/sda1 -> D:\dev\sda1
[RootDir, Normal("dev"), Normal("sda1")]
./sda1 -> <<<current dir>>>\sda1
[CurDir, Normal("sda1")]
/c/Users -> D:\c\Users
[RootDir, Normal("c"), Normal("Users")]
\\?\UNC/localhost/share == \\?\UNC/localhost/share
[Prefix(PrefixComponent { raw: "\\\\?\\UNC/localhost/share", parsed: VerbatimUNC("localhost/share", "") })]

While on Cygwin, it should output

C:\msys64 == /c/msys64
[Prefix(PrefixComponent { raw: "C:", parsed: Disk(67) }), RootDir, Normal("msys64")]
/dev/sda1 == /dev/sda1
[RootDir, Normal("dev"), Normal("sda1")]
./sda1 -> <<<current dir>>>/sda1
[CurDir, Normal("sda1")]
/c/Users == /c/Users
[RootDir, Normal("c"), Normal("Users")]
\\?\UNC/localhost/share == //?/UNC/localhost/share
[Prefix(PrefixComponent { raw: "\\\\?\\UNC/localhost/share", parsed: VerbatimUNC("localhost", "share") })]

@Berrysoft
Copy link
Contributor Author

@jeremyd2019

I was going to raise the issue of encoding, but looking at the code it doesn't seem to deal with WIN_A as a different encoding than POSIX paths, so I guess that's not an issue. I guess I need to have a conversation on the Cygwin list to better understand this.

Actually rust doesn't handle very well for non-UTF-8 encoding on Unix. OsStr and Path are all UTF-8 (because they could be constructed from a UTF-8 str without copy) and they are passed to OS APIs directly without encoding conversion. The only concerns, I think, should be command line input and console input. Do you know that whether Cygwin handles the encoding conversion of argv and stdin? I hope it does:)

@Berrysoft
Copy link
Contributor Author

OK, seems that the command line is converted from UTF-16: https://github.com/cygwin/cygwin/blob/972872c0d396dbf0ce296b8280cee08ce7727b51/winsup/cygwin/dcrt0.cc#L902

And stdin handles the encoding conversion: https://github.com/cygwin/cygwin/blob/972872c0d396dbf0ce296b8280cee08ce7727b51/winsup/cygwin/fhandler/pty.cc#L3935

So I think all inputs are UTF-8 for a Cygwin program, and the developers could not construct a non-UTF-8 Win32 path if they aren't aware of encodings. If they are, they should know that Rust codes are UTF-8:)

@Berrysoft
Copy link
Contributor Author

Now I'm a little worried about cygwin_conv_path then. If the maintainers someday find that it should handle code page encodings, the code here will be messed up:(

@pietroalbini
Copy link
Member

This was originally reported by @Ry0taK through Rust's security disclosure process, and since this affects a still-in-development tier 3 target (with very few users right now) we agreed to let the development of the fix happen in the open.

@mati865
Copy link
Member

mati865 commented Jun 2, 2025

This change is T-libs area, so it will require their review.

r? rust-lang/libs

I'm not familiar with Cygwin API at all, but I see what you want to achieve here, and I think it's sensible.

Now I'm a little worried about cygwin_conv_path then. If the maintainers someday find that it should handle code page encodings, the code here will be messed up:(

They wouldn't want to break that API as a normal release, right? I'd expected a documented API to remain backwards compatible.

@pietroalbini I'm out of the loop, could you shed some light on your comment?

@rustbot rustbot assigned tgross35 and unassigned mati865 Jun 2, 2025
@mati865
Copy link
Member

mati865 commented Jun 2, 2025

Also, there is a typo in both the title and the first commit 😉

@Berrysoft Berrysoft changed the title Handle win32 separater for cygwin paths Handle win32 separator for cygwin paths Jun 2, 2025
@jeremyd2019
Copy link
Contributor

So I think all inputs are UTF-8 for a Cygwin program, and the developers could not construct a non-UTF-8 Win32 path if they aren't aware of encodings. If they are, they should know that Rust codes are UTF-8:)

Cygwin's encoding may not always be UTF-8 - one can set the locale environment variables like they can on Unix to set the encoding, and then the conversions of paths and terminal input will use that encoding instead of UTF-8.

@Berrysoft
Copy link
Contributor Author

In such locale, Rust doesn't even work on Linux. That's fair:)

@pietroalbini
Copy link
Member

pietroalbini commented Jun 3, 2025

@pietroalbini I'm out of the loop, could you shed some light on your comment?

@mati865 this was originally reported to [email protected] as it could defeat path traversal mitigations like this:

if p.components().into_iter().any(|x| x == Component::ParentDir) {
    panic!("path traversal");
}

By only parsing unix-like paths, it would be possible to sneak in a path traversal using win32 paths. Given the current status of the target we agreed it would be ok to develop the fix in the open.

@Berrysoft
Copy link
Contributor Author

@tgross35 ping?

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty reasonable to me, a few small comments/questions

@tgross35
Copy link
Contributor

@pietroalbini I'm out of the loop, could you shed some light on your comment?

@mati865 this was originally reported to [email protected] as it could defeat path traversal mitigations like this:

if p.components().into_iter().any(|x| x == Component::ParentDir) {
    panic!("path traversal");
}

By only parsing unix-like paths, it would be possible to sneak in a path traversal using win32 paths. Given the current status of the target we agreed it would be ok to develop the fix in the open.

@Berrysoft could you add a regression test to this effect that fails without this change? Probably in library/std/tests/path.rs.

@Berrysoft Berrysoft requested a review from tgross35 June 10, 2025 07:59
Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test request then lgtm with a squash

@Berrysoft Berrysoft force-pushed the cygwin-path branch 2 times, most recently from a15c379 to b6ad7ff Compare June 10, 2025 17:12
@jeremyd2019
Copy link
Contributor

Should there be some testing of // prefixes, like //?/C:/foo/bar and //server/share/foo.txt?

@Berrysoft
Copy link
Contributor Author

Should there be some testing of // prefixes, like //?/C:/foo/bar and //server/share/foo.txt?

I'll add some immediately.

@tgross35
Copy link
Contributor

@jeremyd2019 mind confirming/approving when this looks good to you? Considering you're one of the resident cygwin experts.

@Berrysoft
Copy link
Contributor Author

Added tests for // prefixes and mixed separater verbatim paths.

@jeremyd2019
Copy link
Contributor

It looks like Cygwin will take //. as well, but only //.\C: not //./C:. Go figure

@jeremyd2019
Copy link
Contributor

jeremyd2019 commented Jun 10, 2025

Are these tests actually run in CI? I was concerned that the prefix parsing had some hard-coded backslashes that may result in the forward-slash // prefixes not working as desired. It'd be good to get confirmation that these tests all pass before merging.

@Berrysoft
Copy link
Contributor Author

No. I ran the tests on my devices...

The prefix parser converts all / to \ in the beginning so it's OK to hardcode here.

@Berrysoft
Copy link
Contributor Author

It looks like Cygwin will take //. as well, but only //.\C: not //./C:. Go figure

Well... You're right then. So when does Cygwin doesn't take / in a path?

P.S. cygwin_conv_path converts //.\C:/msys64 to //./C:/msys64 and I think it might be a bug?

@jeremyd2019
Copy link
Contributor

It looks like Cygwin will take //. as well, but only //.\C: not //./C:. Go figure

Well... You're right then. So when does Cygwin doesn't take / in a path?

P.S. cygwin_conv_path converts //.\C:/msys64 to //./C:/msys64 and I think it might be a bug?

I don't know, probably have to ask Corinna when she's online

@jeremyd2019
Copy link
Contributor

@jeremyd2019 mind confirming/approving when this looks good to you? Considering you're one of the resident cygwin experts.

I think the tests cover the cases I thought of, with the exception of //. prefix which seems kind of questionable how that should work.

@Berrysoft
Copy link
Contributor Author

Seems that Cygwin treats //./ and //../ as //. Wierd but I think I can handle that...

@Berrysoft
Copy link
Contributor Author

I think I need more time to think how to handle the //././../././../ prefix as //... If there's no better way, I think I should remove the prefix parser for Cygwin.

@Berrysoft
Copy link
Contributor Author

Now they are handled. Based on my experiments, only //./ (together with //../) is special and should be dealt with. \\.\, \\./, //.\ are valid device paths. I have to add a new field to Components to record the extra length of ./ and ../.

@rust-log-analyzer

This comment has been minimized.

@jeremyd2019
Copy link
Contributor

jeremyd2019 commented Jun 11, 2025

My only concern is will this work out well if the current handling of //./ is determined to be a bug and is fixed? It may not be, // being a distinct root from / is apparently allowed by POSIX, and perhaps they require 'ordinary' handling of . and .. elements in that case (for current and parent directory) with the presence of \ used to allow handling of the Windows device namespace.

I posted this question to https://inbox.sourceware.org/cygwin/[email protected]/T/#u

@Berrysoft
Copy link
Contributor Author

I don't think it's a bug now. It's only a little wierd. Due to the design of rust std, I choose to identify the whole //././?/C: as a prefix.

@Berrysoft
Copy link
Contributor Author

Tested on Cygwin and I found that Cygwin even treats //./?/C:/msys64/../../../?/D:/msys64 equal to D:\msys64. So Cygwin really thinks that's a POSIX path. I think I should follow it. The path without driver letter or //, and not containing \, is a POSIX path even it contains /?/ and could be recognized finally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-cygwin Target: *-pc-cygwin S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants