Rework TextEdit arrow navigation to handle Unicode graphemes #5812

Merged: 8 commits, Apr 22, 2025

Conversation

MStarha
Contributor

@MStarha MStarha commented Mar 16, 2025

  • I have followed the instructions in the PR template

Previously, navigating text in TextEdit with Ctrl + left/right arrow would jump inside words that contained combining characters (e.g. diacritics). This PR introduces a new dependency on unicode-segmentation to handle grapheme segmentation. The new implementation skips over whitespace and other separators such as - (dash) between words, but respects _ (underscore).
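
To illustrate the failure mode described above, here is a minimal std-only sketch. It is not the code from this PR or from egui: `is_ascii_word_char` and `naive_next_word_end` are hypothetical names, and the ASCII-only rule is an assumption about how a per-char word jump might be written.

```rust
/// Hypothetical ASCII-only word-character test (an assumption, not egui's code).
fn is_ascii_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical per-char "jump to end of next word".
/// Starting at char index `start`, skips separators, then skips word chars,
/// and returns the char index where the cursor would land.
fn naive_next_word_end(text: &str, start: usize) -> usize {
    let mut chars = text.chars().enumerate().skip(start).peekable();
    let mut idx = start;

    // Skip anything that is not a word character (whitespace, dashes, ...).
    while let Some(&(i, c)) = chars.peek() {
        if is_ascii_word_char(c) {
            break;
        }
        idx = i + 1;
        chars.next();
    }

    // Skip the word itself.
    while let Some(&(i, c)) = chars.peek() {
        if !is_ascii_word_char(c) {
            break;
        }
        idx = i + 1;
        chars.next();
    }

    idx
}
```

With the decomposed string `"cafe\u{301} au lait"` (an `e` followed by a combining acute accent), `naive_next_word_end(text, 0)` returns 4, so the cursor lands between the `e` and its accent, inside a single grapheme. Segmenting by grapheme clusters and Unicode word boundaries, which is what the unicode-segmentation crate provides, avoids splitting there.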

Preview available at https://egui-pr-preview.github.io/pr/5812-unicode-grapheme-navigation
Note that it might take a couple seconds for the update to show up after the preview_build workflow has completed.

@emilk
Owner

emilk commented Mar 20, 2025

I did a quick check, and this increases the .wasm size by ~50 kB, which I think is acceptable (it's because of the tables here: https://github.com/unicode-rs/unicode-segmentation/blob/master/src/tables.rs)

Owner

@emilk emilk left a comment

Thank you for working on this!

Does this fully close #62 ?

Please add some unit tests for this feature so that we know it works and that it won't break again 🙏

@MStarha
Contributor Author

MStarha commented Mar 20, 2025

I think it does indeed solve #62; I just did not find it (I searched for 'unicode' and 'utf'; the term 'grapheme' did not occur to me). It certainly does nothing for #2432, I doubt it has much effect on #246, and it is only a part of #56.

@MStarha
Contributor Author

MStarha commented Mar 20, 2025

I just reworked the word splitting because I found out it completely fell apart around emojis.

Then I saw the is_word_char() function and got an idea: keep the previous implementation, but use char::is_alphanumeric() instead of char::is_ascii_alphanumeric(). This behaves the same way as the new implementation around 'normal words', but slightly differently around emojis. Emojis are a completely different category whose handling is not thoroughly consistent across editors and browsers, so I would not stress much about them.
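
A rough sketch of the difference between the two classifiers, using only the standard library. The function names and the underscore rule are assumptions made for illustration; the actual is_word_char() in egui may look different.

```rust
/// ASCII-only classifier (hypothetical stand-in for the previous behaviour).
fn is_word_char_ascii(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Unicode-aware classifier (hypothetical stand-in for the proposed change).
fn is_word_char_unicode(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Number of chars in the leading run of word characters,
/// according to the given classifier.
fn leading_word_len(text: &str, is_word_char: fn(char) -> bool) -> usize {
    text.chars().take_while(|&c| is_word_char(c)).count()
}
```

For the precomposed Czech word `"p\u{159}\u{ed}li\u{161}"` ("příliš"), the ASCII classifier stops after the first letter (length 1), while the Unicode classifier covers the whole word (length 6). Both classifiers treat an emoji such as `'\u{1F600}'` as a non-word character.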

The new Unicode implementation may be useful if it is used at a larger scale in the future (not just for word splitting in TextEdit). But currently the dependency only has a local effect, which may not be worth its cost compared to simply allowing non-ASCII characters in the existing implementation.

@valadaptive
Contributor

This may end up covering the same ground as #5784.

@MStarha
Contributor Author

MStarha commented Mar 21, 2025

That's true, though I think this can be finalized and merged and later replaced by #5784.

@emilk
Owner

emilk commented Apr 1, 2025

@valadaptive do you think merging this PR will help or hinder your parley work?

@valadaptive
Contributor

I think I'm going to need to redo it from scratch anyway, so go ahead and merge this.

@lucasmerlin lucasmerlin self-assigned this Apr 22, 2025
# Conflicts:
#	crates/egui/src/text_selection/text_cursor_state.rs
#	crates/egui/src/widgets/text_edit/text_buffer.rs
@lucasmerlin lucasmerlin added feature New feature or request egui labels Apr 22, 2025
@lucasmerlin lucasmerlin merged commit 69b9f0e into emilk:master Apr 22, 2025
47 of 48 checks passed
@lucasmerlin lucasmerlin removed their assignment Apr 22, 2025
Labels
egui feature New feature or request
4 participants