-
Notifications
You must be signed in to change notification settings - Fork 255
Add FixedLengthVarULE and user in icu_plurals #7394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
robertbastian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the problem this solves.
It can also be used to create a string or zerovec on the stack, without any additional dependencies. |
Manishearth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand the motivation: why is VarZeroCow<'static> insufficient? What is the benefit of having these values on the stack instead of static memory?
As far as I can tell, anything you ought to be able to do with this type in const should be doable in non-const as well.
| /// use icu::locale::locale; | ||
| /// use zerovec::ule::FixedLengthVarULE; | ||
| /// | ||
| /// let value = "hello, world!"; // 13 bytes long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this isn't a motivating example: this doesn't work in const anyway (since there's try_from_encodeable)
(It's an example, but not a motivating one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would work in const if I replaced try_from_encodeable with new_unchecked, I just wanted to not have unsafe code in the docs test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fine, I was just saying that the example itself wasn't motivating the new zerovec API.
But I have a clearer idea of what's going on now.
|
Ah, I understand. This lets you copy the data into a new, stack-owned buffer, which you can't do with static data. Hm |
components/plurals/src/provider.rs
Outdated
| /// ``` | ||
| /// | ||
| /// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs | ||
| pub const fn new_singleton_mn<const M: usize, const N: usize>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like these APIs where you have to guess the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand your comment, but this is in an unstable module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have to know M and N for this not to panic
Manishearth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to hear more about the use case for new_singleton_mn in a const context, since as far as I can tell that would need unsafe code anyway.
I think I see use cases for the new type: being able to create a self-contained stack VarULE is useful.
I think the type name is somewhat misleading. Perhaps ConstStackVarULE? or ConstOwnedVarULE?
I think the type is relatively niche and we should name it in a way that reflects that, and document it as such. Stuffing it in the ule module sounds good, otherwise. In the long term I'd like to organize the mess that is that module.
components/plurals/src/provider.rs
Outdated
| remainder[i] = input.as_bytes()[i]; | ||
| i += 1; | ||
| } | ||
| *start = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: mention "first byte = 0 for a singleton" for redundancy
components/plurals/src/provider.rs
Outdated
| /// use zerovec::ule::FixedLengthVarULE; | ||
| /// | ||
| /// const plural_ule: FixedLengthVarULE<1, PluralElementsPackedULE<str>> = | ||
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(FixedLengthVarULE::EMPTY_STR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever
|
I added a const example with no unsafe. In general this is useful for making default/empty VarULEs in const contexts that are composed of inner ULE types. |
Right, but where are we hoping to do that? I'm still missing parts of the story here I think. |
I want to use this in call sites such as pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };in order to remove the unsafe. @robertbastian may or may not agree with that particular call site but it serves as a useful illustration. |
|
Hmm. I guess this gets more useful if we have consts on various VarULE types for common patterns. I do worry that this is a fair amount of unsafe code to get rid of some easily-tested unsafe, though. Are we planning on using this const pattern a lot? An alternative potential route might be to define a macro that can define VarULE consts from unchecked bytes, where it also generates a test. This macro becomes unsafe to use, BUT its safety invariant is mostly "if the test passes this macro usage is fine". Not saying that's better, but making sure we have considered the design space. |
|
I honestly think for that call site this would be a regression. The current code is unsafe, but it's a single expression, similar to hundreds that we have in baked data, and there's a test that it's correct. This PR adds a lot of weird APIs and unsafe code just to make that one expression safe. |
|
Overall I've found the defaults story in VarULE to be unsatisfying, so "more consts" is definitely an interesting proposition. BUT I'm not fully convinced we should do it this way. Unsure. |
|
Note: the use case for this type of helper will only grow, because the composition stuff (VarTuple) is fairly new and we are using it more and more. |
It's not just about that one expression. It's about that class of expressions, which I claim is a class that will get bigger over time. |
|
I also just want to point out: pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };This code requires subject matter expertise in 3 different components: pattern (to know that 1 means "single placeholder pattern with a single placeholder"), plurals (to know that 0 means "singleton plural elements"), and compact (to know that this combination of pattern and plurals is suitable for pub const PLURAL_PATTERN_0: /* some type */ =
PluralElementsPackedULE::new_singleton(SinglePlaceholderPattern::PASS_THROUGH_ULE);(That line of code does two things that we don't have yet: |
robertbastian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't agree with this, but leaving a technical review
components/plurals/src/provider.rs
Outdated
| /// ``` | ||
| /// | ||
| /// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs | ||
| pub const fn new_singleton_mn<const M: usize, const N: usize>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"singleton" is the wrong term, we don't use that with plurals. this method is equivalent to PluralElements::new, so it should just be new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "singleton" is already used in this file multiple times, including in test function names.
The difference between singleton and non-singleton matters a great deal to PluralElementsPackedULE, because it results in two different encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the caller shouldn't have to know encoding details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also doesn't need to be qualified as _mn; it's an unstable type so we can just change the generics if generic_const_exprs ever gets stabilised (I wouldn't hold my breath). so just new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the _mn because it's a pain to debug cross-crate function naming issues when the function mostly stays the same but const parameters change a little. But I will remove singleton.
| input: ConstStackVarULE<M, V>, | ||
| ) -> ConstStackVarULE<N, PluralElementsPackedULE<V>> { | ||
| #[allow(clippy::panic)] // for safety, and documented | ||
| if N != M + 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this should be a const assertion, not a runtime panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please suggest code that compiles with a const assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you can't do it then we shouldn't have this yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm? We have lots of const code with runtime assertions because there isn't a good way to write const assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even the Rust standard library does runtime panics on const parameters. For example:
https://doc.rust-lang.org/std/primitive.slice.html#method.as_chunks
slice::as_chunks panics if the const parameter N is zero. There was a lengthy discussion on the proposal thread, starting here: rust-lang/rust#74985 (comment)
components/plurals/src/provider.rs
Outdated
| remainder[i] = input.as_bytes()[i]; | ||
| i += 1; | ||
| } | ||
| // First byte = 0 for a singleton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not "for a singleton" but "for a other case"
and this 0 represents a PluralCategoryAndMetadata, so you should use that type to construct it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for a singleton. That is the term from the byte encoding definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually to be fully correct we should add in the FourBitMetadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes a PackedPluralElementsULE<T> corresponds to a PluralElements<(FourBitMetadata, T)>, so the constructor should take that tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. (not as a tuple)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all other APIs use (FourBitMetadata, T), please at least change the order
| /// ConstStackVarULE::<20, str>::try_from_encodeable("hello, world!").unwrap_err(); | ||
| /// ``` | ||
| #[derive(Copy, Clone, PartialEq, Eq)] | ||
| pub struct ConstStackVarULE<const N: usize, V: VarULE + ?Sized> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this is not generally on the stack. the point is that it's Sized so it can be on the stack. SizedVarULE sounds like a trait, maybe SizedVarULEBytes? EncodedSizedVarULE? EncodedVarULEBytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like SizedVarULEBytes
| /// | ||
| /// Returns an error if the byte length in the container is not the correct length | ||
| /// for the encodeable object. | ||
| pub fn try_from_encodeable(input: impl EncodeAsVarULE<V>) -> Result<Self, UleError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub fn try_from_encodeable(input: impl EncodeAsVarULE<V>) -> Result<Self, UleError> { | |
| pub fn try_from_encodeable(input: &impl EncodeAsVarULE<V>) -> Result<Self, UleError> { |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the recommend style here. It doesn't matter if the trait is implemented on a reference. @Manishearth ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is style choice. you only test this with a &str input, but say you have a String, you don't want to accidentally pass that by value.
No it doesn't. I wrote that code and don't know how these things are encoded. I just wrote the test and then put the bytes there that make the test pass. I think that's a valid and very readable approach.
Yes! I would love that. But it's not feasible with current Rust/zerovec. |
components/plurals/src/provider.rs
Outdated
| /// let plural_ule = PluralElementsPackedULE::new_singleton_mn::<13, 14>(inner_ule, metadata); | ||
| /// let rules = PluralRules::try_new(locale!("en").into(), Default::default()).unwrap(); | ||
| /// | ||
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: merge the assertions
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!"); | |
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules), ("hello, world!", metadata)); |
components/plurals/src/provider.rs
Outdated
| /// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> = | ||
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or better just integrate into the example above, this is very repetitive
| /// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> = | |
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()); | |
| /// let plural_ule = const { | |
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()) }; |
components/plurals/src/provider.rs
Outdated
| /// ``` | ||
| /// | ||
| /// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs | ||
| pub const fn new_singleton_mn<const M: usize, const N: usize>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also doesn't need to be qualified as _mn; it's an unstable type so we can just change the generics if generic_const_exprs ever gets stabilised (I wouldn't hold my breath). so just new
I am not yet convinced that this is the case, but the use case seems sufficiently well outlined and I agree it is valid.. I would be in favor of us having more consts like this. As I said before, the default construction problem in varule is something I've wanted to see improvement on. Regarding the MN stuff I'm overall okay with landing suboptimal API we know that we can clean up in future Rust. Though I don't know if we have a timeline for const expressions in generics? |
Well, it works in nightly, and rust-lang/rust#76560 has a lot of upvotes. It seems like there are concrete next steps but no one is currently actively funding it. |
|
OK I added an ergonomic pub const PLURAL_PATTERN_0: SizedVarULEBytes<
2,
PluralElementsPackedULE<SinglePlaceholderPattern>,
> = PluralElementsPackedULE::new_mn(
FourBitMetadata::zero(),
to_sized_varule_bytes!(SinglePlaceholderPattern::PASS_THROUGH),
);Const variables always require a type, or else the type could be elided, except for the number 2. |
|
@sffc Thanks! I think if we're open to using a macro we don't need the SizedVarULEBytes type at all. The macro can construct a const/static array, and then construct a second const/static that references it. This is what I was referring to earlier:
I can write more about this later. |
|
If we can't remove the unsafe, then a macro of that shape isn't worth doing in my opinion. |
|
Sorry, I should have been more explicit: I think using the design of your macro so far combined with my model, this can be done without callsite unsafe. |
See #7391
Depends on #7399
I was making a PR that got a bit bigger than I wanted, so I split this out into a standalone change.
We finally have
VarZeroCow, a heap-or-reference container for a VarULE. Another reasonable container to want is what I'm proposing in this PR, aFixedLengthVarULE, which stores the VarULE on the stack with a fixed number of bytes.This can be used to compose ULEs and VarULEs in const contexts, as I show with the PluralElementsPackedULE example.