-
Notifications
You must be signed in to change notification settings - Fork 255
Double ended iterator #7419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Double ended iterator #7419
Conversation
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work; we've been needing this for a long time. We need to be super careful that this produces correct results.
| } | ||
|
|
||
| impl<Y: RuleBreakType> DoubleEndedIterator for RuleBreakIterator<'_, '_, Y> { | ||
| fn next_back(&mut self) -> Option<Self::Item> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to consider more than 1 code point back. See .next()
| /// The iterator over characters. | ||
| type IterAttr<'s>: Iterator<Item = (usize, Self::CharType)> + Clone + core::fmt::Debug; | ||
| type IterAttr<'s>: Iterator<Item = (usize, Self::CharType)> | ||
| + DoubleEndedIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more test coverage; .next() and .next_back() should produce identical breakpoints. Just take a bunch of real test data (which we should already have) and run it both ways and check that the vectors are the same (just reversed).
|
thanks for the review, started working on it |
|
Hi @sffc, What I added :-
icu4x> cargo test --test spec_test
running 12 tests
test run_grapheme_break_extra_test ... ok
test run_sentence_break_extra_test ... FAILED
test run_sentence_break_test ... FAILED
test run_word_break_extra_test ... ok
test run_grapheme_break_test ... ok
test run_line_break_extra_test ... ok
test run_word_break_test ... ok
test run_sentence_break_random_test ... FAILED
test run_line_break_test ... ok
test run_line_break_random_test ... ok
test run_word_break_random_test ... FAILED
test run_grapheme_break_random_test ... ok
failures:
run_sentence_break_extra_test
run_sentence_break_random_test
run_sentence_break_test
run_word_break_random_test
test result: FAILED. 8 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.81sIts very complex to mimic the logic used in the forward iteration, can you help me out reviewing my current code and give me some insight. The |
Addresses #6996
Summary :-
Adding another field right_boundary_property to iterate from the end, and trying to implement
DoubleEndedIteratorforUtf16IndicesandLatin1Indicesright_boundary_propertytographeme.rs,word.rsandsentence.rs.next_back(), allowing wrapper segmenters to delegate reverse iteration initerator_helper.rs.next_back()function forUtf16Indicesto allow construction of Unicode code points from the end of the string backwards while correctly merging surrogate pairs. (inindices.rs)next_back()function forLatin1Indiceswhich yields byte-index and<u8>character pairs starting from the end. (inindices.rs)DoubleEndedIteratorforRuleBreakIteratorimplementation, it enables finding segmentation boundaries in reverse, iterating from the end of the text using rule-based logic.