-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Functionality for int_format_into
for integer types
#138338
base: master
Are you sure you want to change the base?
[WIP] Functionality for int_format_into
for integer types
#138338
Conversation
Failed to set assignee to
|
@rustbot author |
This comment has been minimized.
This comment has been minimized.
d247480
to
dcb7e5d
Compare
This comment has been minimized.
This comment has been minimized.
dcb7e5d
to
cdd0823
Compare
This comment has been minimized.
This comment has been minimized.
cdd0823
to
a8968ea
Compare
This comment has been minimized.
This comment has been minimized.
a8968ea
to
e1d3241
Compare
This comment has been minimized.
This comment has been minimized.
e1d3241
to
ab8dd05
Compare
This comment has been minimized.
This comment has been minimized.
ab8dd05
to
890aaf1
Compare
This comment has been minimized.
This comment has been minimized.
890aaf1
to
8a6b99b
Compare
This comment has been minimized.
This comment has been minimized.
8a6b99b
to
0b924f5
Compare
This comment has been minimized.
This comment has been minimized.
0b924f5
to
4d0187e
Compare
This comment has been minimized.
This comment has been minimized.
4d0187e
to
4a67fb1
Compare
This comment has been minimized.
This comment has been minimized.
4a67fb1
to
2c1eba8
Compare
This comment has been minimized.
This comment has been minimized.
2c1eba8
to
d5834e9
Compare
c0dbe54
to
6c6dc4b
Compare
This comment has been minimized.
This comment has been minimized.
6c6dc4b
to
8abc14d
Compare
This comment has been minimized.
This comment has been minimized.
8abc14d
to
766cc48
Compare
This comment has been minimized.
This comment has been minimized.
766cc48
to
eaf91ab
Compare
This comment has been minimized.
This comment has been minimized.
FYI, I'm going to unsubscribe from notifications here for now. As the bot said, I'm not a reviewer on this repository and you should pick someone else once the PR is ready for review. Although I can't approve any PR, I wouldn't mind also giving some input on this one, so feel free to @ me again at that point. But right now I'm getting a firehose of uninformative emails that makes it likely I'll miss the point where it would be useful for me to take a look. |
eaf91ab
to
5d58984
Compare
This comment has been minimized.
This comment has been minimized.
5d58984
to
e572a1f
Compare
This comment has been minimized.
This comment has been minimized.
e572a1f
to
d4b4e3f
Compare
This comment has been minimized.
This comment has been minimized.
d4b4e3f
to
6f619d9
Compare
Apologies @hanna-kruppe for the notifications from this PR. I was finally able to get a PASS on the doctests I created for this functionality. |
/// An alternative to `contents.len()` is `BUF_SIZE`. | ||
pub contents: [MaybeUninit<u8>; BUF_SIZE], | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of questions:
Should we move NumBuffer
to another module (such as array
) so that it could be useful for other modules too in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this type remains a very thin wrapper around a [MaybeUninit<u8>; CONST_GENERIC_PARAM]
then I don't see the point of having the type at all -- just use the array. If it's sized to fit decimal representations of integers, as originally planned, then it's specific to numbers and seems out of place in other modules.
library/core/src/num/int_format.rs
Outdated
offset | ||
} | ||
|
||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decomposed the process into 2-3 parts (so that I can reuse functionality for both signed and unsigned integers):
- Write raw digits into the buffer
- Write the negative sign into the buffer (only for signed types, incase number is negative)
- Generate the
&str
from the buffer
Is this decomposition a good idea? My concern is about readability and the safety assumptions of the unsafe
blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: removed #[inline]
tags and added comments for clarity.
int_format_into
for integer typesint_format_into
for integer types
6f619d9
to
8860091
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first look. I'm surprised by the changes in API design (user-chosen buffer size, panic if that's not sufficient) so I didn't go into detail with the unsafe code or really most of the implementation in general.
/// `MaybeUninit<u8>`. | ||
#[unstable(feature = "int_format_into", issue = "138215")] | ||
#[derive(Debug)] | ||
pub struct NumBuffer<const BUF_SIZE: usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tracking issue only mentions a non-generic buffer type (as suggested by T-libs-api members), with sufficient size for any integer. While I still have sympathies for making the buffer generic over the integer type to guard against much larger integer types in the future, it's not clear to me why users should be allowed to pick any (overly large or far too small) buffer size they like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I mistook NumBuffer<$Int>
as generic over the size of the buffer, not the integer type.
Will fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you mention it, I can see how Buffer<$int>
is kinda ambiguous, oops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first solution that comes to mind is to use the nightly feature generic_const_exprs
to calculate the max buffer sizes for each integer type.
I don’t think this would be recommended, right?
The alternative I think might be to manually set the buffer size for each integer type using macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest first implementing the API as it was accepted in the ACP, i.e., with a single non-generic struct NumBuffer { ... }
.
let decimal_string_size: usize = if self < 0 { | ||
self.unsigned_abs().ilog(10) as usize + 1 + 1 | ||
} else if self == 0 { | ||
1 | ||
} else { | ||
self.ilog(10) as usize + 1 | ||
}; | ||
|
||
// `buf` must have minimum size to store the decimal string version. | ||
// BUF_SIZE is the size of the buffer. | ||
if BUF_SIZE < decimal_string_size { | ||
panic!("Not enough buffer size to format into!"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... for example, all of this is only necessary because BUF_SIZE
is chosen by the user and not determined by the integer type. And since it's based on the actual value of self
, a smaller value may work sometimes. Is this intentional? It's rather different from what the existing standard library code, the itoa
crate, and the API change proposal do.
} | ||
|
||
/// Macro to write ascii digits into the buffer, towards the end. | ||
macro_rules! raw_write_digits_into { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from the existing code in the Display
impls? I would have expected that code to be reused, not effectively duplicated.
fn raw_extract_as_str<const BUF_SIZE: usize>(buf: &NumBuffer<BUF_SIZE>, offset: usize) -> &str { | ||
// SAFETY: All contents of `buf` since `offset` is set. | ||
let written = unsafe { buf.contents.get_unchecked(offset..) }; | ||
|
||
// SAFETY: Writes use ASCII from the lookup table | ||
// (and `NEGATIVE_SIGN` in case of negative numbers) exclusively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be unsafe
since it does not itself ensure the properties the safety comments rely on, it relies on the caller to do so.
/// An alternative to `contents.len()` is `BUF_SIZE`. | ||
pub contents: [MaybeUninit<u8>; BUF_SIZE], | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this type remains a very thin wrapper around a [MaybeUninit<u8>; CONST_GENERIC_PARAM]
then I don't see the point of having the type at all -- just use the array. If it's sized to fit decimal representations of integers, as originally planned, then it's specific to numbers and seems out of place in other modules.
#[doc = concat!("assert_eq!(n3.format_into(&mut buf3), ", stringify!($SignedT::MAX), ".to_string());")] | ||
/// ``` | ||
/// | ||
pub fn format_into<const BUF_SIZE: usize>(self, buf: &mut crate::num::NumBuffer<BUF_SIZE>) -> &str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, one more thing -- it would be very unfortunate to monomorphize all of this code for every distinct choice of buffer size (multiplied further by duplicate instantiations across separate codegen units). There's really not much benefit to be had by specializing the code for a particular buffer size (the only one I can think of is that a sufficiently clever compiler could elide the "is the size sufficient?" check if it's large enough for any value of the integer), so it's just a waste of binary size. As far as I know, no existing library on crates.io makes this choice either, and they're all quite performance conscious.
int_format_into
for integer typesint_format_into
for integer types
Context
fmt
.NumBuffer
type to help with this process.Associated Issue
r? @hanna-kruppe