Add character count `Intl.Segmenter` support with customisable count function by colinrotherham · Pull Request #6995 · alphagov/govuk-frontend

colinrotherham · 2026-04-28T17:28:45Z

This PR updates the character count component to (optionally) use Intl.Segmenter

It's a non-breaking change and requires a new countType option to be set:

countType: "length" (default) continues to count code points
countType: "characters" counts graphemes (user-perceived characters)
countType: "words" counts words regardless of punctuation
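
To illustrate how the three count types can differ, here is a minimal sketch. It assumes "length" corresponds to String.prototype.length (which reports UTF-16 code units) and that the other two types use Intl.Segmenter with "grapheme" and "word" granularity. The helper names and the 'en' locale are illustrative and not taken from the PR.

```javascript
// Hypothetical helpers, not the component's actual implementation.

// "length": String.prototype.length, measured in UTF-16 code units
function countLength(text) {
  return text.length
}

// "characters": one per grapheme cluster (user-perceived character)
function countCharacters(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })
  return Array.from(segmenter.segment(text)).length
}

// "words": only segments flagged as word-like, so punctuation is ignored
function countWords(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' })
  return Array.from(segmenter.segment(text)).filter(
    (segment) => segment.isWordLike
  ).length
}

const accented = 'cafe\u0301' // "café" written with a combining acute accent

console.log(countLength(accented)) // 5
console.log(countCharacters(accented)) // 4
console.log(countWords('Hello, world!')) // 2
```

The combining accent is why the first two numbers differ: it adds to the string length but is not a separate character from the user's point of view.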

Closes #1104, #1364 and partly #2888

Test coverage

I've held off on writing tests until you're happy with the proposal (and comments) in:

Character count's character/word count functions should be customisable #1364 (comment)

These changes are lifted from NHS.UK frontend in:

With some related configuration changes in:

[v10] Add support for functions in component configs nhsuk/nhsuk-frontend#1896

colinrotherham · 2026-04-29T07:16:37Z

+  /**
+   * @private
+   * @type {Intl.Segmenter | null}
+   */
+  segmenter = null


Not sure whether this should be @private?

For example this.segmenter can be accessed via the custom count function:

createAll(CharacterCount, {
  countFunction(text) {
    // this.segmenter
  }
})

36degrees

I haven't had a chance to do a full review, but my gut feeling is that we should avoid scenarios where the character count can give a different result depending on what browser you're using.

That being the case, in browsers that do not support Intl.Segmenter I think we should fall back to the no-JS behaviour, rather than using a regex.

colinrotherham · 2026-05-06T10:36:30Z

Thanks @36degrees

Did you have any thoughts on the new countType Nunjucks option?

I haven't had a chance to do a full review, but my gut feeling is that we should avoid scenarios where the character count can give a different result depending on what browser you're using.

That makes sense, and means we can drop the fallback regexes too

For balance, there are some examples where browser differences are expected:

Internet Explorer 8 counted new lines twice (not supported in v5.x)
Plural rules are not supported in Safari 10.3–12.5 etc

But for the latter issue polyfill weight was involved

That being the case, in browsers that do not support Intl.Segmenter I think we should fall back to the no-JS behaviour, rather than using a regex.

We're happy with this though and I can update the PR

colinrotherham · 2026-05-11T14:19:53Z

Pushed an update to do this:

That being the case, in browsers that do not support Intl.Segmenter I think we should fall back to the no-JS behaviour, rather than using a regex.

Current options maxlength and maxwords work as usual for backwards compatibility
New options countType: "characters" or countType: "words" use Intl.Segmenter

romaricpascal

Cheers for proposing this @colinrotherham and the care around avoiding a breaking change 🙌🏻

Besides the comments about technical implementation, I'm concerned about two things:

keeping maxlength as the source of the maximum for both characters and words may be a bit confusing, both to users used to it applying only to characters and when switching between codebases using different versions of GOV.UK Frontend. I'd be keen to use a completely new option (say maximum) that'll be associated with the new countType to avoid confusion
only offering to count words the way Intl.Segmenter does with the countType option that the component is moving towards. I think we should check which way backends count words to make sure our default matches. It might be that we need two ways of counting: one counting like Intl.Segmenter and another only considering whitespace as we do now, even if it does not match the Unicode definition of a word.

Let me know what you think 😊

romaricpascal · 2026-05-12T17:43:09Z

-        this.count = text.match(/\S+/g)?.length ?? 0
-        break
-    }
+    this.count = this.countFunctions[countType].call(this, text)


suggestion: Rather than executing the function as if it were a method of the CharacterCount, passing the segmenter as a second argument, inside an options object, makes the boundary between the component and the count function clearer and lets us control what's exposed to the count function.

Suggested change

- this.count = this.countFunctions[countType].call(this, text)
+ this.count = this.countFunctions[countType](text, { segmenter: this.segmenter })

Maybe we could use a getter to only instantiate the segmenter if the function accesses it, but that's more an optimisation than anything.
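
As a rough illustration of the getter idea, the sketch below builds a context object whose segmenter is only created the first time a count function reads it. All names here (createCountContext, countByLength, countWordLikeSegments) are hypothetical and not part of the PR; the factory parameter is an assumption made so the creation step is easy to observe.

```javascript
// Hypothetical sketch: none of these names come from the PR.

// Builds a context whose segmenter is only created on first access.
function createCountContext(createSegmenter) {
  let segmenter = null

  return {
    get segmenter() {
      if (segmenter === null) {
        segmenter = createSegmenter()
      }
      return segmenter
    }
  }
}

// A count function that never reads context.segmenter, so none is created.
function countByLength(text, context) {
  return text.length
}

// A count function that reads context.segmenter, triggering its creation.
function countWordLikeSegments(text, context) {
  const segments = Array.from(context.segmenter.segment(text))
  return segments.filter((segment) => segment.isWordLike).length
}

const lazyContext = createCountContext(
  () => new Intl.Segmenter('en', { granularity: 'word' })
)

console.log(countByLength('Hello, world!', lazyContext)) // 13
console.log(countWordLikeSegments('Hello, world!', lazyContext)) // 2
```

Whether the saving is worth the extra indirection is a judgment call, which matches the "more an optimisation than anything" caveat above.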

Sadly that means the custom countFunction would lose access to:

this.separator to split words yourself

~~this.segmenter to filter the segments yourself~~

this.$textarea to get the value yourself (e.g. trim, normalised line endings, row count etc)

Appreciate that all of these things are accessible anyway as @private isn't really private

Let me know if you'd like me to do anything

I think I'm fine with the countFunction not having access to much at the start. We can always expand what we provide to the function in minor releases, but we can only restrict what the function receives in breaking releases if we went too far at the start.

Using an object as a second parameter would also clarify what this represents in the component's count functions (where you may think it's the countFunctions object where the functions are defined if you miss the typings).

Overall, if that's OK, I'd prefer we:

pass a second argument to the count function rather than use this (should have flagged that as an 'issue' rather than a 'suggestion')

restrict what the function receives to only the segmenter for now and expand as demand grows, keeping the separator in the function for counting words (thinking that long term, if we want people to manipulate texts before counting 'like the component does', we'd be better off exposing the countFunctions themselves rather than granular details of their implementation).

Hope that makes sense 😊

Instead of a second parameter I've set a custom (restricted) this and updated the types:

- this.count = countFunction.call(this, text)
+ this.count = countFunction.call({ segmenter: this.segmenter }, text)

Have a look at the diff for my last push to see this.separator has been removed too

To avoid recreating the count function context object every time, it could be persisted?

Either using .call() as this

  // Limit access via `this` when calling the count function to prevent
  // unintended access to internal properties and methods
- this.count = countFunction.call(
-   {
-     config: this.config,
-     segmenter: this.segmenter
-   },
-   text
- )
+ this.count = countFunction.call(this.countFunctionContext, text)

Or as a 2nd param as you prefer:

  // Limit access via `this` when calling the count function to prevent
  // unintended access to internal properties and methods
- this.count = countFunction.call(
-   {
-     config: this.config,
-     segmenter: this.segmenter
-   },
-   text
- )
+ this.count = countFunction(text, this.countFunctionContext)

Would definitely prefer a context as a parameter so that things are explicit rather than accessed through this, thanks. Feels more natural as second parameter. Not against caching it on the instance if you think that's an issue for performance to re-create it at each call.

Another thought that occurred to me is that if this.config.maxwords is set, there is no this.segmenter, right? This means that the words function could branch on whether this.segmenter is defined rather than on the value in the config. That would allow the public API for the context to be narrower.

Potentially, the characters function could work the same way for consistency. That would also clearly split which part of the component are responsible for what:

constructor decides whether to create a segmenter or not based on the config

count functions decide how to count based on whether they have a segmenter or not

How does that sound?

Sounds good on the 2nd parameter

I'm a little bit lost on the rest 😆

Another thought that occurred to me is that if this.config.maxwords is set, there is no this.segmenter, right? This means that the words function could branch on whether this.segmenter is defined rather than on the value in the config. That would allow the public API for the context to be narrower.

So this is what I did originally—I think?

Where branching on this.segmenter for countType: "words" gave different results by browser

But it differs to the feelings Ollie set in an earlier comment where he said:

I haven't had a chance to do a full review, but my gut feeling is that we should avoid scenarios where the character count can give a different result depending on what browser you're using.

That being the case, in browsers that do not support Intl.Segmenter I think we should fall back to the no-JS behaviour, rather than using a regex.

So from this we've determined:

Users that set maxwords (deprecated) should get the existing regex word count

Users that set countType: "words" should get segmenter word counting where supported

Users that set countType: "words" should get the no-JS behaviour where NOT supported

i.e. If you opt-in to use Intl.Segmenter then that's what you get (or the no-JS behaviour)
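
The three rules above can be sketched as a small decision function. This is only an illustration: the function name, the returned strategy labels and the segmenterSupported parameter are assumptions, and the PR itself may structure this differently (for example by throwing a support error during initialisation).

```javascript
// Hypothetical sketch of the three rules above; not the PR's code.
// Returns a label describing how words would be counted for a given config.
function resolveWordCountStrategy(
  config,
  segmenterSupported = typeof Intl !== 'undefined' &&
    typeof Intl.Segmenter === 'function'
) {
  // Rule 1: the deprecated maxwords option keeps the existing regex count
  if (config.maxwords) {
    return 'regex'
  }

  if (config.countType === 'words') {
    // Rule 2: opted in and supported, so use Intl.Segmenter
    // Rule 3: opted in but unsupported, so fall back to the no-JS behaviour
    return segmenterSupported ? 'segmenter' : 'no-js'
  }

  // Not counting words at all
  return null
}

console.log(resolveWordCountStrategy({ maxwords: 50 }, false)) // 'regex'
console.log(resolveWordCountStrategy({ countType: 'words' }, true)) // 'segmenter'
console.log(resolveWordCountStrategy({ countType: 'words' }, false)) // 'no-js'
```

Note that no branch mixes the regex and the segmenter for the same config, which is what keeps results consistent across browsers.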

Hope that's still alright?

Regarding browser support

Knowing that Intl.Segmenter is in Baseline 2024 compare the following queries:

Audience coverage: 94.9 % for supports es6-module, versus

Audience coverage: 9.1 % for supports es6-module and not baseline 2024

Note: There sadly isn't a feature query intl-segmenter like there is for intl-pluralrules

Would definitely prefer a context as a parameter so that things are explicit rather than accessed through this, thanks. Feels more natural as second parameter. Not against caching it on the instance if you think that's an issue for performance to re-create it at each call.

✅ Done (pushed)

So this is what I did originally—I think?

Where branching on this.segmenter for countType: "words" gave different results by browser

But it differs to the feelings Ollie set in an earlier comment where he said:

I think I didn't explain well. Found it easier to attach a comment to the countFunctionContext to explain 😊

colinrotherham · 2026-05-12T19:12:37Z

Thanks @romaricpascal

You might have missed that word counting retains the current approach if maxwords is set 🙌

Regarding maximum versus using maxlength, wouldn't the latter mean zero changes are necessary should segmenter become the default in a future major release?

✅ GOV.UK Frontend v2.2.0+

The maxlength option always works

{{ govukCharacterCount({
  label: {
    text: "Always works"
  },
  name: "example",
  maxlength: 200
}) }}

colinrotherham · 2026-05-12T19:17:52Z

Do think about a future opt-out though, like segmenter: false?

Or having the word separator regex as an option? If set, bypassing the segmenter

Keen to lock in the API so we can release this on NHS.UK frontend


romaricpascal · 2026-05-14T12:07:33Z

Regarding maximum versus using maxlength, wouldn't the latter mean zero changes are necessary should segmenter become the default in a future major release?

That's a great point, hadn't thought of that. 🙌🏻

You might have missed that word counting retains the current approach if maxwords is set 🙌

My worry was for after we remove maxwords in the next major release (as it's being rightly deprecated in this PR).
Both your propositions of a separator and a segmenter: false opt-out would be a way to work around that, so I think that decision can be delayed until v7.0.0. Both may be useful as well:

separator to offer arbitrary splitting
segmenter: false to avoid creating segmenters unnecessarily if your countFunction does not need one

Keen to lock in the API so we can release this on NHS.UK frontend

Appreciate that'd reduce the divergence between both our Design Systems. However, we can't guarantee our responsiveness when looking at a topic we're not currently focusing on (like what happened for this PR), so please don't stay stuck because of us.


romaricpascal · 2026-05-21T16:44:43Z

+    this.countFunctionContext = {
+      config: this.config,
+      segmenter: this.segmenter
+    }


Having that context in place makes it easier to explain what I was on about with using the segmenter when counting words.

The idea is still to throw on line 148 if the Segmenter API is not available, not to fall back on the other way of counting when the API is not there.

Because we know the component will only keep initialising when a segmenter is needed if the Segmenter API is available, we can reduce the context to the following, keeping the initial public API narrower (less risk of breaking change in the future) and keeping all config related computations internal to the constructor.

Suggested change

- this.countFunctionContext = {
-   config: this.config,
-   segmenter: this.segmenter
- }
+ this.countFunctionContext = {
+   segmenter: this.segmenter
+ }

Then words can check if (this.segmenter) instead of if (this.config.maxwords).
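
A minimal sketch of what that branching might look like, assuming the context is passed as a second parameter (so the check reads context.segmenter here). The regex is the one from the removed line in the earlier diff; the function body is otherwise an assumption, not the PR's implementation.

```javascript
// Hypothetical sketch: a words count function that branches on whether the
// context carries a segmenter, rather than reading the config.
function words(text, context) {
  if (context.segmenter) {
    // Segmenter present: count only word-like segments
    const segments = Array.from(context.segmenter.segment(text))
    return segments.filter((segment) => segment.isWordLike).length
  }

  // No segmenter (for example when maxwords is set): whitespace-based count
  return text.match(/\S+/g)?.length ?? 0
}

const wordSegmenter = new Intl.Segmenter('en', { granularity: 'word' })

console.log(words('one / two', { segmenter: null })) // 3
console.log(words('one / two', { segmenter: wordSegmenter })) // 2
```

The two results differ because the whitespace-based count treats the standalone "/" as a word, while the segmenter does not consider it word-like.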

Hope that makes more sense, aim is to control how much of a public API we offer at the start to avoid having to roll back on it down the line 😊

Thanks, glad that made sense

Hmm you might want to hold off hiding the config for a major breaking release though?

Users that provide countFunction will use the config to:

Determine whether the (deprecated) maxwords option is used

Determine whether they're counting "length", "characters" or "words"

Provide their own non-segmenter fallback based on config.countType

Especially when passing JavaScript configuration via initAll() or createAll() because a single application-wide countFunction will at least need to know the config.countType?

Without a config they can't provide their own fallbacks should the non-JS fallback be unsuitable
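
To make the use case concrete, here is a sketch of a single application-wide count function that relies on the config. It assumes the function receives the text plus a context shaped like the countFunctionContext above ({ config, segmenter }); the function name and the trimming step are illustrative choices, not behaviour from the PR.

```javascript
// Hypothetical application-wide count function, not taken from the PR.
// Assumes the context shape { config, segmenter } discussed above.
function appCountFunction(text, { config, segmenter }) {
  const trimmed = text.trim()

  if (config.maxwords || config.countType === 'words') {
    if (segmenter) {
      // Assumes the provided segmenter uses word granularity
      const segments = Array.from(segmenter.segment(trimmed))
      return segments.filter((segment) => segment.isWordLike).length
    }

    // Application's own non-segmenter fallback
    return trimmed.match(/\S+/g)?.length ?? 0
  }

  // Any other count type: plain string length in this sketch
  return trimmed.length
}

console.log(
  appCountFunction('  one two  ', {
    config: { countType: 'words' },
    segmenter: null
  })
) // 2

console.log(
  appCountFunction('  one two  ', {
    config: { countType: 'length' },
    segmenter: null
  })
) // 7
```

Such a function would presumably be registered once, for example through the countFunction option passed to createAll() or initAll(), which is why it needs config.countType to tell the instances apart.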

colinrotherham commented Apr 29, 2026


36degrees reviewed May 6, 2026


colinrotherham force-pushed the character-count-segmenter branch from 66b3afb to 405410f Compare May 11, 2026 14:11

This was referenced May 11, 2026

[v10] Add character count countFunction option nhsuk/nhsuk-frontend#1897

Merged

[v10] Add character count countType: "words" option using Intl.Segmenter nhsuk/nhsuk-frontend#1899

Open

romaricpascal reviewed May 12, 2026


colinrotherham added 9 commits May 13, 2026 10:35

Make sure unsupported config property types are dropped

00e87e6

Add support for functions in component configs

1906816

Update initAll() config types to use ComponentConfig generic

d1ab61a

Update character count config override types to show they’re partial …

6080d7f

…only

Fix missing this.config in config override function

0e28909

Fix typo

75ce0ef

Default to characters translation key prefix

06ce5bd

We must only use `characters` or `words` as the translation key prefix, regardless of the `countType` value

Deprecate character count maxwords and add countType option

7fcbbfb

Update GOV.UK Frontend support error to accept custom message

3f9a2c0

colinrotherham force-pushed the character-count-segmenter branch from 405410f to 579f202 Compare May 13, 2026 10:32

Add character count support for countType: "characters" using Intl.…

d014d6d

…Segmenter

colinrotherham force-pushed the character-count-segmenter branch from 579f202 to 29c44bf Compare May 14, 2026 11:42

colinrotherham added 2 commits May 14, 2026 13:37

Expose segmenter for countType: "words"

4848496

Unless the (deprecated) `maxwords` option is used

Add character count support for countType: "words" using Intl.Segme…

b18ddd1

…nter Unless the (deprecated) `maxwords` option is used

colinrotherham force-pushed the character-count-segmenter branch from 29c44bf to b18ddd1 Compare May 14, 2026 13:31

colinrotherham added 4 commits May 14, 2026 14:32

Expose character count countFunctions

ad6bf20

Add character count support for countFunction option

1692a88

Expose segmenter for countFunction option

20db8e0

Set up 2nd parameter for count function context

647f7a1

colinrotherham added 2 commits May 21, 2026 15:46

Determine the count function to use in constructor

4e3da12

Cache the count function context in constructor

c24ac3d

romaricpascal reviewed May 21, 2026

3 participants
