Skip to content

Solve problem with panic on some non-ASCII strings in cargo install #10994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

timcryt
Copy link

@timcryt timcryt commented Aug 16, 2022

This make cargo don't panic on some non-ASCII strings when creating dep_path.

Close #10993

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 16, 2022
@timcryt timcryt changed the title Unicode dep path Solve problem with panic on some non-ASCII strings in cargo install Aug 16, 2022
Copy link
Contributor

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to fix this!

@@ -42,4 +51,13 @@ mod tests {
assert_eq!(make_dep_path("AbCd", false), "Ab/Cd/AbCd");
assert_eq!(make_dep_path("aBcDe", false), "aB/cD/aBcDe");
}

#[test]
fn test_10993() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you call out what this is testing either in place of or in addition to the issue?

Comment on lines 14 to 27
match dep_name.chars().take(4).count() {
1 => format!("1{}{}", slash, name),
2 => format!("2{}{}", slash, name),
3 => format!("3/{}{}{}", &dep_name[..1], slash, name),
_ => format!("{}/{}{}{}", &dep_name[0..2], &dep_name[2..4], slash, name),
3 => {
let first_symbol = dep_name.chars().take(1).collect::<String>();

format!("3/{}{}{}", first_symbol, slash, name)
}
_ => {
let first_symbol = dep_name.chars().take(2).collect::<String>();
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>();

format!("{}/{}{}{}", first_symbol, second_symbol, slash, name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you find where the other half of this lives (where the directories are written to) so we make sure we match these up exactly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to get a path of folder of searched crate in crates-io index, so there is no the other half. cargo is only tries to read these directories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates-io index

This is the other half.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for now, non-ASCII crate name is not supported in both Cargo and crates.io1. I am not sure whether any 3rd party registry works fine with that.

Footnotes

  1. https://github.com/rust-lang/crates.io/blob/2e744af8573f90b9706fa72ccffe5a52f91e2fdc/src/models/krate.rs#L267-L270

Comment on lines 17 to 20
3 => {
let first_symbol = dep_name.chars().take(1).collect::<String>();

format!("3/{}{}{}", first_symbol, slash, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we remove this allocation?

Suggested change
3 => {
let first_symbol = dep_name.chars().take(1).collect::<String>();
format!("3/{}{}{}", first_symbol, slash, name)
3 => format!("3/{}{}{}", dep_name.chars().next().unwrap(), slash, name),

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain whether such avoiding merits appearance the unwrap operation, considering that this piece of code isn't performance important.

Comment on lines 22 to 27
_ => {
let first_symbol = dep_name.chars().take(2).collect::<String>();
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>();

format!("{}/{}{}{}", first_symbol, second_symbol, slash, name)
}
Copy link
Member

@weihanglo weihanglo Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my premature optimization, could we avoid allocations once again? This function will be called several times in a build. Also, we should assert empty string as well since we don't want to create a ill-formatted path.

Suggested change
_ => {
let first_symbol = dep_name.chars().take(2).collect::<String>();
let second_symbol = dep_name.chars().skip(2).take(2).collect::<String>();
format!("{}/{}{}{}", first_symbol, second_symbol, slash, name)
}
l => {
assert!(l > 0, "length of a crate name must not be zero");
let mut it = dep_name.chars();
let (f0, f1) = (it.next().unwrap(), it.next().unwrap());
let (s0, s1) = (it.next().unwrap(), it.next().unwrap());
format!("{f0}{f1}/{s0}{s1}{slash}{name}")
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally fine with you not accepting this ugly optimization, though I'd prefer to keep an assertion here (though it's unlikely to happen).

3 => format!("3/{}{}{}", &dep_name[..1], slash, name),
_ => format!("{}/{}{}{}", &dep_name[0..2], &dep_name[2..4], slash, name),

let mut chars = [None; 4];
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to optimize the code in that way it shouldn't allocate nor call unwrap, but it seems not so clear as previous one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Actually the original one (the panicking one) is most readable IMO. And your first attempt also got a better looking than mine and the slice pattern one.

Just made a micro benchmark against top 10 downloads crates on crates.io. The performance difference is statistically significant. The real world difference roughly equals 50ns times number of crates in the dependency tree. Say 200 crates, and that is like 10ms difference in a cargo build.

I wouldn't say it's significant for a build. There are so many other places can optimise than this tiny function. I am happy with every solution here if this PR eventually gets approved and merged :)


Criterion benchmark report
Benchmark code

The result above was tested on:

cargo 1.65.0-nightly (efd4ca3dc 2022-08-12)
release: 1.65.0-nightly
commit-hash: efd4ca3dc0b89929dc8c5f5c023d25978d76cb61
commit-date: 2022-08-12
host: aarch64-apple-darwin
libgit2: 1.4.2 (sys:0.14.2 vendored)
libcurl: 7.79.1 (sys:0.4.55+curl-7.83.1 system ssl:(SecureTransport) LibreSSL/3.3.6)
os: Mac OS 12.5.0 [64-bit]

@weihanglo weihanglo added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 27, 2022
@weihanglo
Copy link
Member

I'm going to close due to inactivity. We are still interested in improving the error message here. Before going deeper, one needs to figure out how crates.io handles that (mentioned here) to determine if we needs to support non-ASCII or provide a better message when no-ASCII characters appear.

Feel free to reopen if you have a chance to work on it again. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cargo install panics on some non-ASCII input
5 participants