Add configurable word boundary characters for text selection #9335

mauroporras · 2025-10-24T20:38:18Z

Summary

This PR adds a new selection-word-chars configuration option that allows users to customize which characters mark word boundaries during text selection operations (double-click, word selection, etc.).

Motivation

This has been on my wishlist for a while. Inspired by #9069, which added semicolon as a hardcoded word boundary, this PR takes the concept further by making word boundaries fully configurable. Different workflows and use cases benefit from different boundary characters - SQL developers might want semicolons as boundaries, while others working with file paths or URLs might prefer different settings.

This approach is similar to zsh's WORDCHARS environment variable, giving users fine-grained control over text selection behavior.

Changes

- New config option: selection-word-chars with default value ` \t'"│`|:;,()[]{}<>$`
- Runtime UTF-8 parsing: boundary characters are parsed from a UTF-8 string into u32 codepoints
- Updated function signatures: selectWord() and selectWordBetween() now accept boundary characters as parameters
- All call sites updated: Surface.zig, embedded.zig, and all test cases

Usage

Users can now customize word boundaries in their config:

# Remove semicolon from boundaries (treat as part of words)
selection-word-chars = " \t'\"│`|:,()[]{}<>$"

# The default value written out explicitly; it contains no period, so periods are already treated as part of words
selection-word-chars = " \t'\"│`|:;,()[]{}<>$"

Implementation Details

- Boundary characters are stored in DerivedConfig and passed through to the selection functions
- UTF-8 parsing happens at runtime, with graceful fallback for invalid input
- The null character (U+0000) is always included as a boundary automatically
- Multi-byte UTF-8 characters are fully supported
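
The boundary rules listed above can be illustrated with a small sketch. It is not the PR's selectWord(): the real function works on terminal screen positions, while this one scans a flat slice of codepoints. The names isBoundary, Range, and wordRange are hypothetical, and the u21 codepoint type is an assumption.

```zig
const std = @import("std");

/// Returns true if `cp` is a word boundary. Null (U+0000) always counts,
/// even when it is missing from `boundaries`.
fn isBoundary(boundaries: []const u21, cp: u21) bool {
    if (cp == 0) return true;
    return std.mem.indexOfScalar(u21, boundaries, cp) != null;
}

/// Half-open range [start, end) of a word within a line.
const Range = struct { start: usize, end: usize };

/// Hypothetical helper: expands from `idx` to the enclosing word in `line`.
/// Returns null when `idx` is out of range or points at a boundary.
fn wordRange(line: []const u21, boundaries: []const u21, idx: usize) ?Range {
    if (idx >= line.len or isBoundary(boundaries, line[idx])) return null;

    // Walk left until the previous cell is a boundary.
    var start = idx;
    while (start > 0 and !isBoundary(boundaries, line[start - 1])) start -= 1;

    // Walk right until the next cell is a boundary.
    var end = idx + 1;
    while (end < line.len and !isBoundary(boundaries, line[end])) end += 1;

    return .{ .start = start, .end = end };
}
```

With `;` in the boundary set, a click inside `bar` in `foo;bar` selects only `bar`; with `;` removed from the set, the same click selects the whole `foo;bar`.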

AI Assistance Disclosure

With gratitude for the team and respect for the Contributing Guidelines, I want to disclose that this PR was written with AI assistance (Claude Code). I have reviewed all the code, and to the extent of my understanding, I'm prepared to answer any questions about the changes.

Add new `selection-word-chars` config option to customize which characters mark word boundaries during text selection operations (double-click, word selection, etc.). Similar to zsh's WORDCHARS environment variable, but specifies boundary characters rather than word characters.

Default boundaries: ` \t'"│`|:;,()[]{}<>$`

Users can now customize word selection behavior, such as treating semicolons as part of words or excluding periods from boundaries:

selection-word-chars = " \t'\"│`|:,()[]{}<>$"

Changes:
- Add selection-word-chars config field with comprehensive documentation
- Modify selectWord() and selectWordBetween() to accept boundary_chars parameter
- Parse UTF-8 boundary string to u32 codepoints at runtime
- Update all call sites in Surface.zig and embedded.zig
- Update all test cases to pass boundary characters

mitchellh

I'm conceptually fine with this but I would use a slightly different approach, as noted in the comment.

mitchellh · 2025-10-26T03:13:43Z

src/config/Config.zig

+///     selection-word-chars = " \t'\"│`|:,()[]{}<>$"
+///
+/// Available since: 1.2.0
+@"selection-word-chars": []const u8 = " \t'\"│`|:;,()[]{}<>$",


Instead of making this a []const u8, I'd recommend making a new type here that automatically expands these into a list of codepoints.

This way we don't need an arbitrary max, we can limit it by the allocator (or put a really high limit), and we can allocate, in general!

It also limits the runtime cost when we actually do selection since the boundary characters are already built up.

@mitchellh, I made the change, thanks for your feedback.

Also, I ran zig build run and did a quick test with "some-hyphenated-words" and it worked.

Refactor the selection-word-chars implementation to parse UTF-8 boundary characters once during config initialization instead of on every selection operation.

Changes:
- Add SelectionWordChars type that stores pre-parsed []const u32 codepoints
- Parse UTF-8 to codepoints in parseCLI() during config load
- Remove UTF-8 parsing logic from selectWord() hot path (27 lines removed)
- Remove arbitrary 64-character buffer limit
- Update selectWord() and selectWordBetween() to accept []const u32
- Update DerivedConfig to store codepoints directly
- Update all tests to use codepoint arrays

Benefits:
- No runtime UTF-8 parsing overhead on every selection
- No arbitrary character limit (uses allocator instead)
- Cleaner separation of concerns (config handles parsing, selection uses data)
- Better performance in selection hot path
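
As a rough illustration of the "parse once at config load" idea, here is a minimal sketch of a type that expands a UTF-8 string into codepoints up front. It is not the PR's code: the parse and deinit functions are hypothetical stand-ins for the real parseCLI() path, and the sketch assumes the Zig 0.15 unmanaged ArrayList and u21 codepoints that the review asks for.

```zig
const std = @import("std");

/// Illustrative stand-in for a pre-parsed boundary list.
const SelectionWordChars = struct {
    /// Parsed codepoints; index 0 is always null (U+0000).
    codepoints: []const u21,

    /// Decodes `input` once so that selection never has to parse UTF-8.
    /// Returns error.InvalidUtf8 for malformed input. The caller releases
    /// the result with `deinit`, using the same allocator.
    fn parse(alloc: std.mem.Allocator, input: []const u8) !SelectionWordChars {
        // Unmanaged list: the allocator is passed to every allocating call.
        var list: std.ArrayList(u21) = .empty;
        errdefer list.deinit(alloc);

        // Null is always a boundary, so it goes in first.
        try list.append(alloc, 0);

        const view = try std.unicode.Utf8View.init(input);
        var it = view.iterator();
        while (it.nextCodepoint()) |cp| try list.append(alloc, cp);

        return .{ .codepoints = try list.toOwnedSlice(alloc) };
    }

    fn deinit(self: SelectionWordChars, alloc: std.mem.Allocator) void {
        alloc.free(self.codepoints);
    }
};
```

The list grows with the allocator, so no fixed-size buffer is needed, and selection code only ever sees a ready-made slice. The sketch does not interpret escape sequences such as `\t`; how the real config parser handles them is not shown in this thread.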

pluiedev

There are a lot of CI errors. Have you tried running the tests yourself first? You also have to run zig fmt to clean up the code.

pluiedev · 2025-10-26T11:46:40Z

src/config/Config.zig

+        const value = input orelse return error.ValueRequired;
+
+        // Parse UTF-8 string into codepoints
+        var list = std.ArrayList(u32).init(alloc);


In Zig 0.15, collection types are unmanaged by default: you need to pass the allocator into every call on the list that may allocate or free memory.

Suggested change

-var list = std.ArrayList(u32).init(alloc);
+var list: std.ArrayList(u32) = .empty;

pluiedev · 2025-10-26T11:47:33Z

src/config/Config.zig

+    };
+
+    /// The parsed codepoints. Always includes null (U+0000) at index 0.
+    codepoints: []const u32 = &default_codepoints,


Unicode codepoints are expressed as u21s in the Zig standard library, so we should do the same here and avoid the @intCast below

mauroporras · 2025-10-26T16:00:58Z

> There are a lot of CI errors. Have you tried running the tests yourself first? You also have to run zig fmt to clean up the code.

@mitchellh, sorry about this. I was using Zig 0.15.1 since 0.15.2 is not available in Homebrew just yet, so I built 0.15.2 from source.
~~I converted the PR to draft while I address your comments. Thanks.~~
Done, thanks.


- Change all codepoint types from u32 to u21 to align with Zig stdlib
- Update ArrayList to use Zig 0.15 unmanaged pattern (.empty)
- Remove unnecessary @intCast when encoding UTF-8
- Fix formatEntry to use stack-allocated buffer
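
To show why u21 removes the cast, here is a minimal sketch of turning stored codepoints back into a UTF-8 string with a stack-allocated buffer. The helper name encodeCodepoints is hypothetical and this is not the PR's formatEntry; the sketch relies only on std.unicode.utf8Encode, which takes a u21 directly.

```zig
const std = @import("std");

/// Hypothetical helper: writes `codepoints` as UTF-8 into `out`, skipping
/// the implicit null entry, and returns the written portion of `out`.
fn encodeCodepoints(codepoints: []const u21, out: []u8) ![]const u8 {
    var len: usize = 0;
    for (codepoints) |cp| {
        if (cp == 0) continue;

        // A single codepoint needs at most 4 bytes, so a stack buffer suffices.
        var buf: [4]u8 = undefined;
        // utf8Encode accepts u21, so no @intCast is needed here.
        const n = try std.unicode.utf8Encode(cp, &buf);

        if (len + n > out.len) return error.NoSpaceLeft;
        @memcpy(out[len..][0..n], buf[0..n]);
        len += n;
    }
    return out[0..len];
}
```

If the codepoints were stored as u32, each call to utf8Encode would need a narrowing cast, which is the cast the review comment suggests avoiding.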

mauroporras requested review from a team as code owners October 24, 2025 20:38

mauroporras marked this pull request as draft October 24, 2025 20:39

mauroporras marked this pull request as ready for review October 24, 2025 20:47

mitchellh requested changes Oct 26, 2025


mauroporras added 2 commits October 26, 2025 06:23

refactor: clean up selection-word-chars documentation and formatting

c864643

pluiedev requested changes Oct 26, 2025


mauroporras marked this pull request as draft October 26, 2025 15:57

mauroporras marked this pull request as ready for review October 26, 2025 16:04
