Add configurable word boundary characters for text selection #9335
Add new `selection-word-chars` config option to customize which characters
mark word boundaries during text selection operations (double-click, word
selection, etc.). Similar to zsh's WORDCHARS environment variable, but
specifies boundary characters rather than word characters.
Default boundaries: ` \t'"│`|:;,()[]{}<>$`
Users can now customize word selection behavior, such as treating
semicolons as part of words or excluding periods from boundaries:
selection-word-chars = " \t'\"│`|:,()[]{}<>$"
Changes:
- Add selection-word-chars config field with comprehensive documentation
- Modify selectWord() and selectWordBetween() to accept boundary_chars parameter
- Parse UTF-8 boundary string to u32 codepoints at runtime
- Update all call sites in Surface.zig and embedded.zig
- Update all test cases to pass boundary characters
I'm conceptually fine with this but I would use a slightly different approach, as noted in the comment.
src/config/Config.zig
Outdated
/// selection-word-chars = " \t'\"│`|:,()[]{}<>$"
///
/// Available since: 1.2.0
@"selection-word-chars": []const u8 = " \t'\"│`|:;,()[]{}<>$",
Instead of making this a []const u8, I'd recommend making a new type here that automatically expands these into a list of codepoints.
This way we don't need an arbitrary max, we can limit it by the allocator (or put a really high limit), and we can allocate, in general!
It also limits the runtime cost when we actually do selection since the boundary characters are already built up.
@mitchellh, I made the change, thanks for your feedback.
Also, I ran `zig build run` and did a quick test with "some-hyphenated-words", and it worked.
Refactor the selection-word-chars implementation to parse UTF-8 boundary characters once during config initialization instead of on every selection operation.
Changes:
- Add SelectionWordChars type that stores pre-parsed []const u32 codepoints
- Parse UTF-8 to codepoints in parseCLI() during config load
- Remove UTF-8 parsing logic from selectWord() hot path (27 lines removed)
- Remove arbitrary 64-character buffer limit
- Update selectWord() and selectWordBetween() to accept []const u32
- Update DerivedConfig to store codepoints directly
- Update all tests to use codepoint arrays
Benefits:
- No runtime UTF-8 parsing overhead on every selection
- No arbitrary character limit (uses allocator instead)
- Cleaner separation of concerns (config handles parsing, selection uses data)
- Better performance in selection hot path
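To illustrate why pre-parsed codepoints keep the selection path cheap, here is a minimal sketch of a boundary lookup over a codepoint slice. It is not the PR's code: `isBoundary`, `wordRange`, and `Range` are hypothetical names, the line is modeled as a plain slice rather than Ghostty's screen structures, and the sketch assumes that clicking on a boundary character selects the run of adjacent boundary characters. It uses `u21`, the type adopted later in this review; this revision of the PR used `u32`.

```zig
const std = @import("std");

/// Hypothetical half-open range [start, end) into a line of codepoints.
const Range = struct { start: usize, end: usize };

/// Hypothetical helper: true if `cp` is one of the configured boundaries.
/// No UTF-8 decoding happens here; the list is already codepoints.
fn isBoundary(boundary: []const u21, cp: u21) bool {
    return std.mem.indexOfScalar(u21, boundary, cp) != null;
}

/// Hypothetical sketch: expand from `idx` in both directions while the
/// neighbouring codepoints are in the same class (boundary or non-boundary)
/// as the codepoint at `idx`.
fn wordRange(line: []const u21, idx: usize, boundary: []const u21) Range {
    std.debug.assert(idx < line.len);
    const on_boundary = isBoundary(boundary, line[idx]);

    var start = idx;
    while (start > 0 and isBoundary(boundary, line[start - 1]) == on_boundary) {
        start -= 1;
    }

    var end = idx + 1;
    while (end < line.len and isBoundary(boundary, line[end]) == on_boundary) {
        end += 1;
    }

    return .{ .start = start, .end = end };
}
```

With `;` in the boundary list, selecting at the first character of `a;b c` stops before the semicolon; with `;` removed from the list, the selection extends through `a;b`.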
There are a lot of CI errors. Have you tried running the tests yourself first? You also have to run `zig fmt` to clean up the code.
src/config/Config.zig
Outdated
const value = input orelse return error.ValueRequired;

// Parse UTF-8 string into codepoints
var list = std.ArrayList(u32).init(alloc);
In Zig 0.15 collection types are unmanaged by default - you need to pass the allocator into every use of the list that may (de-)allocate memory
- var list = std.ArrayList(u32).init(alloc);
+ var list: std.ArrayList(u32) = .empty;
src/config/Config.zig
Outdated
};

/// The parsed codepoints. Always includes null (U+0000) at index 0.
codepoints: []const u32 = &default_codepoints,
Unicode codepoints are expressed as u21s in the Zig standard library, so we should do the same here and avoid the @intCast below
@mitchellh, sorry about this. I was using Zig 0.15.1 since 0.15.2 is not available in Homebrew just yet, so I built 0.15.2 from source.
- Change all codepoint types from u32 to u21 to align with Zig stdlib
- Update ArrayList to use Zig 0.15 unmanaged pattern (.empty)
- Remove unnecessary @intCast when encoding UTF-8
- Fix formatEntry to use stack-allocated buffer
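As a rough illustration of the two review points addressed here (`u21` codepoints and the unmanaged `ArrayList` pattern), the following sketch decodes a UTF-8 string into an owned codepoint slice. `parseCodepoints` is a hypothetical name, not the PR's actual function, and the sketch assumes Zig 0.15, where `std.ArrayList` takes the allocator on each call that may allocate or free.

```zig
const std = @import("std");

/// Hypothetical helper (not the PR's code): decode a UTF-8 string into an
/// owned slice of codepoints. The caller frees the result with `alloc`.
fn parseCodepoints(alloc: std.mem.Allocator, input: []const u8) ![]u21 {
    // Unmanaged list: starts empty and holds no allocator of its own.
    var list: std.ArrayList(u21) = .empty;
    errdefer list.deinit(alloc);

    // Utf8View.init validates the input and fails on invalid UTF-8.
    const view = try std.unicode.Utf8View.init(input);
    var it = view.iterator();

    // nextCodepoint() already yields ?u21, so no @intCast is needed.
    while (it.nextCodepoint()) |cp| {
        try list.append(alloc, cp);
    }

    return list.toOwnedSlice(alloc);
}
```

Because the iterator yields `u21` directly, storing the list as `[]const u21` avoids any integer casts between decoding and lookup.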
Summary
This PR adds a new `selection-word-chars` configuration option that allows users to customize which characters mark word boundaries during text selection operations (double-click, word selection, etc.).
Motivation
This has been on my wishlist for a while. Inspired by #9069, which added the semicolon as a hardcoded word boundary, this PR takes the concept further by making word boundaries fully configurable. Different workflows and use cases benefit from different boundary characters: SQL developers might want semicolons as boundaries, while others working with file paths or URLs might prefer different settings.
This approach is similar to zsh's `WORDCHARS` environment variable, giving users fine-grained control over text selection behavior.
Changes
- New config option `selection-word-chars` with default value ` \t'"│`|:;,()[]{}<>$`
- `selectWord()` and `selectWordBetween()` now accept boundary characters as parameters
Usage
Users can now customize word boundaries in their config, for example to treat semicolons as part of words:
selection-word-chars = " \t'\"│`|:,()[]{}<>$"
Implementation Details
- Stored in `DerivedConfig` and passed through to selection functions
AI Assistance Disclosure
With gratitude for the team and respect for the Contributing Guidelines, I want to disclose that this PR was written with AI assistance (Claude Code). I have reviewed all the code, and to the extent of my understanding, I'm prepared to answer any questions about the changes.
Related