[v10] Add character count countFunction option #1897

Merged: colinrotherham merged 3 commits into support/10.x from character-count-custom-function on May 12, 2026
Conversation


@colinrotherham colinrotherham commented Apr 20, 2026

Description

This PR adds a new countFunction option to character counts to cater for server-side differences in:

  • Line length differences due to \n versus \r\n
  • Word counts that vary based on whitespace and punctuation
  • Trimming empty space before counting
  • Native multi-byte string support

For example, services might already count multi-byte strings server-side (e.g. len() in Python), resulting in client-side count mismatches, yet upcoming support for the grapheme countType is blocked by a third-party library integration.

Support for a custom countFunction means teams can close this support gap
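
As a rough illustration of the gap described above, a custom count function might normalise text the way a server would before counting. This is only a sketch: the name countLikeServer is hypothetical, the server behaviour (trim, "\n" line endings, code point counting) is an assumption, and it uses the fn(text) => number shape discussed in review rather than a confirmed API.

```javascript
// Hypothetical count function: mirrors an assumed server that trims input,
// stores line breaks as "\n" and counts Unicode code points
// (as Python's len() does for str).
function countLikeServer(text) {
  const normalised = text.trim().replace(/\r\n/g, '\n')
  // Spreading a string iterates by code point, not UTF-16 code unit
  return [...normalised].length
}

countLikeServer('  line one\r\nline two  ') // 17
countLikeServer('👍') // 1, whereas '👍'.length is 2
```

How the function is wired into the component via the countFunction option is left out here, as the exact configuration shape is not shown in this description.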

@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 20, 2026 11:42 Inactive
@colinrotherham colinrotherham changed the title [v10] Add character count support for countFunction option [v10] Add character count countFunction option Apr 20, 2026
@colinrotherham colinrotherham force-pushed the component-config-functions branch from 5562f44 to c3c6fc7 Compare April 20, 2026 12:42
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 272d552 to 551bf35 Compare April 20, 2026 12:42
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 20, 2026 12:43 Inactive
* @satisfies {Record<string, (text: string, segmenter?: Intl.Segmenter | null) => number>}
*/
static countFunctions = Object.freeze({
characters(text, segmenter) {

@MatMoore MatMoore Apr 20, 2026

Some thoughts on the interface here...

Currently there are two things that influence the choice of algorithm: the selection of countFunction, and whether segmenter is set or not. Wondering if it would be simpler to have explicit functions for graphemes as well as characters and words? In that case passing countType: "graphemes" would be syntactic sugar for passing countFunction: countFunctions.graphemes and the same goes for "characters" and "words".

It also seems slightly inconsistent that if you pass countType: "words" with a custom function, we pass through a segmenter that can do word segmentation, but if you don't pass the custom function, it doesn't use the segmenter.

Maybe if the interface is simplified to fn(text) the segmenter could be an internal implementation detail?
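
The "syntactic sugar" idea above could look roughly like the following. This is a sketch under assumptions, not the merged implementation: resolveCountFunction and the bodies of the built-ins are hypothetical, and a graphemes entry is omitted for brevity.

```javascript
// Hypothetical built-ins sharing one fn(text) => number interface
const countFunctions = Object.freeze({
  characters: (text) => text.length,
  words: (text) => (text.match(/\S+/g) ?? []).length
})

// Hypothetical resolver: an explicit countFunction wins,
// otherwise countType selects one of the built-ins
function resolveCountFunction({ countType = 'characters', countFunction } = {}) {
  if (typeof countFunction === 'function') return countFunction
  if (!Object.prototype.hasOwnProperty.call(countFunctions, countType)) {
    throw new TypeError(`Unknown countType: ${countType}`)
  }
  return countFunctions[countType]
}

resolveCountFunction({ countType: 'words' })('one two  three') // 3
```

With this shape, the choice of algorithm depends on a single resolved function rather than on both the option and whether a segmenter was passed.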

@colinrotherham colinrotherham Apr 20, 2026

Yeah, I partly did this because Intl.Segmenter also supports granularity: "word" 😮

So instead of countType: "words" we could keep maxwords but set granularity: "word"

I'll see what feedback comes from the GOV.UK Design System team

> Maybe if the interface is simplified to fn(text) the segmenter could be an internal implementation detail?

Yeah, I'm glad you said. I reckon for now let's ditch the static property and keep it all internal, especially with a potential performance cost to creating a new segmenter on every keystroke (hence injecting it in)
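
One way to keep the segmenter internal while avoiding a new instance on every keystroke is to create it once and close over it. A minimal sketch, assuming a hypothetical createGraphemeCounter factory and an assumed code point fallback for browsers without Intl.Segmenter:

```javascript
// Hypothetical factory: the Intl.Segmenter is created once and reused,
// so callers only ever see fn(text) => number
function createGraphemeCounter(locale) {
  // Assumed fallback: count code points where Intl.Segmenter is unavailable
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return (text) => [...text].length
  }

  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })

  return (text) => {
    let count = 0
    for (const _segment of segmenter.segment(text)) count++
    return count
  }
}

// Created once, then called on each input event
const countGraphemes = createGraphemeCounter('en')
countGraphemes('e\u0301') // 1 where Intl.Segmenter exists; 'e\u0301'.length is 2
```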

@colinrotherham colinrotherham Apr 21, 2026

Feedback applied, thanks @MatMoore

Regarding word counts via the segmenter, I've messaged @romaricpascal and added a comment, as the Unicode Default Word Boundary Specification differs quite significantly from /\S+/g

For example, consider this phrase:

My mother-in-law—Wait, what?

It matches only 3 words currently:

["My", "mother-in-law—Wait,", "what?"]

Yet using the segmenter with granularity: "word" it matches 6 words:

["My", "mother", "in", "law", "Wait", "what"]
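
The comparison above can be reproduced with a short script. It is a sketch only: the 'en' locale and the isWordLike filter are assumptions about how the segments would be counted, not necessarily what the component does.

```javascript
const phrase = 'My mother-in-law—Wait, what?'

// Whitespace-delimited matching
const byWhitespace = phrase.match(/\S+/g) ?? []

// Unicode word boundaries; isWordLike drops whitespace and punctuation segments
const segmenter = new Intl.Segmenter('en', { granularity: 'word' })
const bySegmenter = [...segmenter.segment(phrase)]
  .filter((segment) => segment.isWordLike)
  .map((segment) => segment.segment)

console.log(byWhitespace.length, byWhitespace)
console.log(bySegmenter.length, bySegmenter)
```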

@MatMoore MatMoore Apr 21, 2026

> Yeah, I'm glad you said. I reckon for now let's ditch the static property and keep it all internal, especially with a potential performance cost to creating a new segmenter on every keystroke (hence injecting it in)

Yeah I figured we wouldn't want to instantiate one each call. The way you've done it is what I was imagining.

That Unicode word boundary spec seems like a cleaner way to do word counting if we were implementing from scratch, but I guess it's a potentially breaking change, so maybe it's not worth it if we're keeping maxwords. Are we going to wait for the GOV.UK implementation before we merge this?

@colinrotherham colinrotherham Apr 21, 2026

Thanks. Yeah it'll be discussed at their next dev catch up

What we know is that they've proposed for maxwords to be deprecated

We could take the countType option as an opt-in to use the segmenter. But, as this comment mentions, that might mean upgrading /\S+/g to a better word boundary regex for older browsers that lack support

Contributor Author

They've not had chance to discuss anything character count related other than this comment:

But that applies to #1899 primarily

For this PR all count functions now use the same interface so it's ready for another review

@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 551bf35 to 921cd37 Compare April 21, 2026 07:36
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 21, 2026 07:37 Inactive
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 921cd37 to 6b80700 Compare April 21, 2026 07:40
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 21, 2026 07:40 Inactive
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 6b80700 to 07676d2 Compare April 22, 2026 13:49
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 22, 2026 13:49 Inactive
@colinrotherham colinrotherham force-pushed the component-config-functions branch from c3c6fc7 to e1a7744 Compare April 22, 2026 13:53
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 07676d2 to 649b198 Compare April 22, 2026 13:54
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 22, 2026 13:54 Inactive
@colinrotherham colinrotherham force-pushed the component-config-functions branch from e1a7744 to 9ac9e9b Compare April 27, 2026 08:37
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 649b198 to e76c2e7 Compare April 27, 2026 08:37
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 27, 2026 08:38 Inactive
@colinrotherham colinrotherham force-pushed the component-config-functions branch from 9ac9e9b to 93515cf Compare April 28, 2026 11:30
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from e76c2e7 to 6c80311 Compare April 28, 2026 11:32
@colinrotherham colinrotherham had a problem deploying to nhsuk-frontend-pr-1897 April 28, 2026 11:32 Failure
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 6c80311 to 3290d27 Compare April 28, 2026 11:47
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 28, 2026 11:47 Inactive
@colinrotherham colinrotherham force-pushed the component-config-functions branch from 93515cf to 906201e Compare April 28, 2026 12:01
@colinrotherham colinrotherham changed the base branch from component-config-functions to character-count-grapheme April 28, 2026 12:05
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 3290d27 to 9492e12 Compare April 28, 2026 12:06
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 28, 2026 12:06 Inactive
@colinrotherham colinrotherham force-pushed the character-count-grapheme branch 2 times, most recently from eb6e5ba to 4365f61 Compare April 28, 2026 14:13
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 9492e12 to e93b9da Compare April 28, 2026 14:15
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 April 28, 2026 14:15 Inactive
@colinrotherham colinrotherham linked an issue Apr 28, 2026 that may be closed by this pull request
@colinrotherham colinrotherham force-pushed the character-count-grapheme branch 3 times, most recently from c7b7bbe to 1da2682 Compare May 8, 2026 16:18
Base automatically changed from character-count-grapheme to support/10.x May 8, 2026 16:25
@colinrotherham colinrotherham added this to the v10.5.0 milestone May 8, 2026
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from e93b9da to 078f07a Compare May 11, 2026 09:47
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 May 11, 2026 09:48 Inactive
@colinrotherham colinrotherham force-pushed the character-count-custom-function branch from 078f07a to 7b1a233 Compare May 11, 2026 13:49
@colinrotherham colinrotherham temporarily deployed to nhsuk-frontend-pr-1897 May 11, 2026 13:49 Inactive
@colinrotherham colinrotherham merged commit 0533db3 into support/10.x May 12, 2026
24 checks passed
@colinrotherham colinrotherham deleted the character-count-custom-function branch May 12, 2026 10:15
@github-project-automation github-project-automation Bot moved this from Needs review to Ready to release in Service Manual Sprint Board May 12, 2026
Projects

Status: Ready to release

Development

Successfully merging this pull request may close these issues.

Character count component counts code points, not characters

3 participants