Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Functionality for int_format_into for integer types #138338

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

madhav-madhusoodanan
Copy link
Contributor

@madhav-madhusoodanan madhav-madhusoodanan commented Mar 11, 2025

Context

  1. Implemented integer formatting into a fixed-size buffer without relying on fmt.
  2. Added a NumBuffer type to help with this process.

Associated Issue

r? @hanna-kruppe

@rustbot
Copy link
Collaborator

rustbot commented Mar 11, 2025

Failed to set assignee to hanna-kruppe: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 11, 2025
@madhav-madhusoodanan
Copy link
Contributor Author

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 11, 2025
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@hanna-kruppe
Copy link
Contributor

FYI, I'm going to unsubscribe from notifications here for now. As the bot said, I'm not a reviewer on this repository and you should pick someone else once the PR is ready for review. Although I can't approve any PR, I wouldn't mind also giving some input on this one, so feel free to @ me again at that point. But right now I'm getting a firehose of uninformative emails that makes it likely I'll miss the point where it would be useful for me to take a look.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@madhav-madhusoodanan
Copy link
Contributor Author

Apologies @hanna-kruppe for the notifications from this PR. I was finally able to get a PASS on the doctests I created for this functionality.

/// An alternative to `contents.len()` is `BUF_SIZE`.
pub contents: [MaybeUninit<u8>; BUF_SIZE],
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of questions:

Should we move NumBuffer to another module (such as array) so that it could be useful for other modules too in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this type remains a very thin wrapper around a [MaybeUninit<u8>; CONST_GENERIC_PARAM] then I don't see the point of having the type at all -- just use the array. If it's sized to fit decimal representations of integers, as originally planned, then it's specific to numbers and seems out of place in other modules.

offset
}

#[inline]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decomposed the process into 2-3 parts (so that I can reuse functionality for both signed and unsigned integers):

  1. Write raw digits into the buffer
  2. Write the negative sign into the buffer (only for signed types, incase number is negative)
  3. Generate the &str from the buffer

Is this decomposition a good idea? My concern is about readability and the safety assumptions of the unsafe blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: removed #[inline] tags and added comments for clarity.

@madhav-madhusoodanan madhav-madhusoodanan changed the title [WIP] Functionality for int_format_into for integer types Functionality for int_format_into for integer types Mar 12, 2025
@madhav-madhusoodanan
Copy link
Contributor Author

r? @scottmcm
@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 12, 2025
Copy link
Contributor

@hanna-kruppe hanna-kruppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first look. I'm surprised by the changes in API design (user-chosen buffer size, panic if that's not sufficient) so I didn't go into detail with the unsafe code or really most of the implementation in general.

/// `MaybeUninit<u8>`.
#[unstable(feature = "int_format_into", issue = "138215")]
#[derive(Debug)]
pub struct NumBuffer<const BUF_SIZE: usize> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tracking issue only mentions a non-generic buffer type (as suggested by T-libs-api members), with sufficient size for any integer. While I still have sympathies for making the buffer generic over the integer type to guard against much larger integer types in the future, it's not clear to me why users should be allowed to pick any (overly large or far too small) buffer size they like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I mistook NumBuffer<$Int> as generic over the size of the buffer, not the integer type.

Will fix this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you mention it, I can see how Buffer<$int> is kinda ambiguous, oops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first solution that comes to mind is to use the nightly feature generic_const_exprs to calculate the max buffer sizes for each integer type.

I don’t think this would be recommended, right?

The alternative I think might be to manually set the buffer size for each integer type using macros.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest first implementing the API as it was accepted in the ACP, i.e., with a single non-generic struct NumBuffer { ... }.

Comment on lines +180 to +192
let decimal_string_size: usize = if self < 0 {
self.unsigned_abs().ilog(10) as usize + 1 + 1
} else if self == 0 {
1
} else {
self.ilog(10) as usize + 1
};

// `buf` must have minimum size to store the decimal string version.
// BUF_SIZE is the size of the buffer.
if BUF_SIZE < decimal_string_size {
panic!("Not enough buffer size to format into!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... for example, all of this is only necessary because BUF_SIZE is chosen by the user and not determined by the integer type. And since it's based on the actual value of self, a smaller value may work sometimes. Is this intentional? It's rather different from what the existing standard library code, the itoa crate, and the API change proposal do.

}

/// Macro to write ascii digits into the buffer, towards the end.
macro_rules! raw_write_digits_into {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the existing code in the Display impls? I would have expected that code to be reused, not effectively duplicated.

Comment on lines +57 to +62
fn raw_extract_as_str<const BUF_SIZE: usize>(buf: &NumBuffer<BUF_SIZE>, offset: usize) -> &str {
// SAFETY: All contents of `buf` since `offset` is set.
let written = unsafe { buf.contents.get_unchecked(offset..) };

// SAFETY: Writes use ASCII from the lookup table
// (and `NEGATIVE_SIGN` in case of negative numbers) exclusively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be unsafe since it does not itself ensure the properties the safety comments rely on, it relies on the caller to do so.

/// An alternative to `contents.len()` is `BUF_SIZE`.
pub contents: [MaybeUninit<u8>; BUF_SIZE],
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this type remains a very thin wrapper around a [MaybeUninit<u8>; CONST_GENERIC_PARAM] then I don't see the point of having the type at all -- just use the array. If it's sized to fit decimal representations of integers, as originally planned, then it's specific to numbers and seems out of place in other modules.

#[doc = concat!("assert_eq!(n3.format_into(&mut buf3), ", stringify!($SignedT::MAX), ".to_string());")]
/// ```
///
pub fn format_into<const BUF_SIZE: usize>(self, buf: &mut crate::num::NumBuffer<BUF_SIZE>) -> &str {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, one more thing -- it would be very unfortunate to monomorphize all of this code for every distinct choice of buffer size (multiplied further by duplicate instantiations across separate codegen units). There's really not much benefit to be had by specializing the code for a particular buffer size (the only one I can think of is that a sufficiently clever compiler could elide the "is the size sufficient?" check if it's large enough for any value of the integer), so it's just a waste of binary size. As far as I know, no existing library on crates.io makes this choice either, and they're all quite performance conscious.

@madhav-madhusoodanan madhav-madhusoodanan changed the title Functionality for int_format_into for integer types [WIP] Functionality for int_format_into for integer types Mar 13, 2025
@madhav-madhusoodanan madhav-madhusoodanan marked this pull request as draft March 13, 2025 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants