-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Solve problem with panic on some non-ASCII strings in cargo install
#10994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon. Please see the contribution instructions for more information. |
cargo install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to fix this!
@@ -42,4 +51,13 @@ mod tests { | |||
assert_eq!(make_dep_path("AbCd", false), "Ab/Cd/AbCd"); | |||
assert_eq!(make_dep_path("aBcDe", false), "aB/cD/aBcDe"); | |||
} | |||
|
|||
#[test] | |||
fn test_10993() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you call out what this is testing either in place of or in addition to the issue?
crates/cargo-util/src/registry.rs
Outdated
match dep_name.chars().take(4).count() { | ||
1 => format!("1{}{}", slash, name), | ||
2 => format!("2{}{}", slash, name), | ||
3 => format!("3/{}{}{}", &dep_name[..1], slash, name), | ||
_ => format!("{}/{}{}{}", &dep_name[0..2], &dep_name[2..4], slash, name), | ||
3 => { | ||
let first_symbol = dep_name.chars().take(1).collect::<String>(); | ||
|
||
format!("3/{}{}{}", first_symbol, slash, name) | ||
} | ||
_ => { | ||
let first_symbol = dep_name.chars().take(2).collect::<String>(); | ||
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>(); | ||
|
||
format!("{}/{}{}{}", first_symbol, second_symbol, slash, name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you find where the other half of this lives (where the directories are written to) so we make sure we match these up exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used to get a path of folder of searched crate in crates-io index, so there is no the other half. cargo is only tries to read these directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crates-io index
This is the other half.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for now, non-ASCII crate name is not supported in both Cargo and crates.io1. I am not sure whether any 3rd party registry works fine with that.
Footnotes
crates/cargo-util/src/registry.rs
Outdated
3 => { | ||
let first_symbol = dep_name.chars().take(1).collect::<String>(); | ||
|
||
format!("3/{}{}{}", first_symbol, slash, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we remove this allocation?
3 => { | |
let first_symbol = dep_name.chars().take(1).collect::<String>(); | |
format!("3/{}{}{}", first_symbol, slash, name) | |
3 => format!("3/{}{}{}", dep_name.chars().next().unwrap(), slash, name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain whether such avoiding merits appearance the unwrap operation, considering that this piece of code isn't performance important.
crates/cargo-util/src/registry.rs
Outdated
_ => { | ||
let first_symbol = dep_name.chars().take(2).collect::<String>(); | ||
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>(); | ||
|
||
format!("{}/{}{}{}", first_symbol, second_symbol, slash, name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my premature optimization, could we avoid allocations once again? This function will be called several times in a build. Also, we should assert empty string as well since we don't want to create a ill-formatted path.
_ => { | |
let first_symbol = dep_name.chars().take(2).collect::<String>(); | |
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>(); | |
format!("{}/{}{}{}", first_symbol, second_symbol, slash, name) | |
} | |
l => { | |
assert!(l > 0, "length of a crate name must not be zero"); | |
let mut it = dep_name.chars(); | |
let (f0, f1) = (it.next().unwrap(), it.next().unwrap()); | |
let (s0, s1) = (it.next().unwrap(), it.next().unwrap()); | |
format!("{f0}{f1}/{s0}{s1}{slash}{name}") | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am totally fine with you not accepting this ugly optimization, though I'd prefer to keep an assertion here (though it's unlikely to happen).
3 => format!("3/{}{}{}", &dep_name[..1], slash, name), | ||
_ => format!("{}/{}{}{}", &dep_name[0..2], &dep_name[2..4], slash, name), | ||
|
||
let mut chars = [None; 4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to optimize the code in that way it shouldn't allocate nor call unwrap, but it seems not so clear as previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Actually the original one (the panicking one) is most readable IMO. And your first attempt also got a better looking than mine and the slice pattern one.
Just made a micro benchmark against top 10 downloads crates on crates.io. The performance difference is statistically significant. The real world difference roughly equals 50ns times number of crates in the dependency tree. Say 200 crates, and that is like 10ms difference in a cargo build.
I wouldn't say it's significant for a build. There are so many other places can optimise than this tiny function. I am happy with every solution here if this PR eventually gets approved and merged :)
Criterion benchmark report
Benchmark code
The result above was tested on:
cargo 1.65.0-nightly (efd4ca3dc 2022-08-12)
release: 1.65.0-nightly
commit-hash: efd4ca3dc0b89929dc8c5f5c023d25978d76cb61
commit-date: 2022-08-12
host: aarch64-apple-darwin
libgit2: 1.4.2 (sys:0.14.2 vendored)
libcurl: 7.79.1 (sys:0.4.55+curl-7.83.1 system ssl:(SecureTransport) LibreSSL/3.3.6)
os: Mac OS 12.5.0 [64-bit]
I'm going to close due to inactivity. We are still interested in improving the error message here. Before going deeper, one needs to figure out how crates.io handles that (mentioned here) to determine if we needs to support non-ASCII or provide a better message when no-ASCII characters appear. Feel free to reopen if you have a chance to work on it again. Thank you! |
This make cargo don't panic on some non-ASCII strings when creating
dep_path
.Close #10993