
@counter-style pad descriptor incorrectly counts code points instead of grapheme clusters #1624

@u1f992

Description


Describe the bug
The pad descriptor in @counter-style counts Unicode code points instead of grapheme clusters when calculating the padding length. This causes incorrect padding when a composite emoji (such as a family emoji, a flag emoji, or an emoji with a skin tone modifier) is used as the pad symbol.

According to the CSS Counter Styles Level 3 specification, the lengths used by the pad descriptor are measured in grapheme clusters:
https://drafts.csswg.org/css-counter-styles/#counter-style-pad

The issue is in packages/core/src/vivliostyle/counter-style.ts, specifically in the #applyPadding method. The code currently measures lengths with the spread operator ([...str].length), which counts code points:

const negativeLength = usesNegative
  ? [...negPrefix].length + (negSuffix ? [...negSuffix].length : 0)
  : 0;

const diff = minLength - [...initialRep].length - negativeLength;
// ...
const padLength = [...padSymbol].length;
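As a rough sketch of what the fix could look like, the three computations quoted above can be rewritten with a grapheme-counting helper. graphemeLength and paddingLengths are hypothetical names used only for illustration, not part of Vivliostyle, and the elided part of #applyPadding is not reproduced. The code assumes Intl.Segmenter is available at runtime and that ES2022.Intl is in the TypeScript lib array.

```typescript
// Hypothetical sketch: the length computations from #applyPadding with
// code point counting ([...s].length) replaced by grapheme cluster counting.
// Assumes Intl.Segmenter exists and "ES2022.Intl" is in the tsconfig lib.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

/** Number of extended grapheme clusters (UAX #29) in `text`. */
function graphemeLength(text: string): number {
  return Array.from(graphemeSegmenter.segment(text)).length;
}

// Hypothetical standalone wrapper; parameter names follow the excerpt above.
function paddingLengths(
  initialRep: string,
  minLength: number,
  padSymbol: string,
  usesNegative: boolean,
  negPrefix: string,
  negSuffix?: string,
): { diff: number; padLength: number } {
  // Grapheme clusters taken up by the negative sign prefix/suffix, if used.
  const negativeLength = usesNegative
    ? graphemeLength(negPrefix) + (negSuffix ? graphemeLength(negSuffix) : 0)
    : 0;
  // How many grapheme clusters are still missing to reach minLength.
  const diff = minLength - graphemeLength(initialRep) - negativeLength;
  // A ZWJ sequence such as the family emoji counts as 1, not 7.
  const padLength = graphemeLength(padSymbol);
  return { diff, padLength };
}
```

With the family emoji as padSymbol, a minLength of 5, and the single-digit representation "1", this yields diff = 4 and padLength = 1, which matches the Chrome rendering described below.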

To Reproduce

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Grapheme Cluster Issue</title>
  <style>
    @counter-style family-pad {
      system: numeric;
      symbols: "0" "1" "2" "3" "4" "5" "6" "7" "8" "9";
      pad: 5 "👨‍👩‍👧‍👦";
    }

    ol {
      padding-left: 6em;
      list-style: family-pad;
    }
  </style>
</head>
<body>
  <ol>
    <li>First</li>
    <li>Second</li>
    <li>Third</li>
  </ol>
</body>
</html>

Expected behavior
Each marker is padded to 5 grapheme clusters: 4 family emoji followed by 1 digit (tested on Chrome 143).

Actual behavior

The family emoji 👨‍👩‍👧‍👦 is 1 grapheme cluster but 7 code points (4 person emoji joined by 3 ZWJ characters). The current implementation uses [...string].length, which counts code points, so padLength becomes 7 instead of the correct value of 1.
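To make the difference concrete, the following sketch compares the UTF-16 length, the code point count (what [...string].length returns), and the grapheme cluster count for the family emoji and two other composite emoji. It assumes a runtime with Intl.Segmenter (for example, a recent Node.js or browser) and ES2022.Intl in the TypeScript lib; the sample strings are written as escapes so they survive copy and paste.

```typescript
// Compare three ways of measuring "length" for composite emoji.
// Assumes Intl.Segmenter is available and "ES2022.Intl" is in the tsconfig lib.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function countGraphemes(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

const samples = {
  // 👨‍👩‍👧‍👦 man, woman, girl, boy joined by ZERO WIDTH JOINER (U+200D)
  family: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}",
  // 🇯🇵 a pair of regional indicator symbols
  flag: "\u{1F1EF}\u{1F1F5}",
  // 👍🏽 thumbs up followed by a skin tone modifier
  thumbsUpMediumSkin: "\u{1F44D}\u{1F3FD}",
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, {
    utf16: text.length, // UTF-16 code units
    codePoints: Array.from(text).length, // same as [...text].length
    graphemes: countGraphemes(text), // user-perceived characters
  });
}
```

For the family emoji this reports 11 UTF-16 code units, 7 code points, and 1 grapheme cluster; the flag and skin tone samples are 2 code points each but still 1 grapheme cluster.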

Additional context
The fix would require using Intl.Segmenter with granularity: 'grapheme' to correctly count grapheme clusters. However, Intl.Segmenter was added in ES2022, and the current tsconfig for packages/core specifies:

"target": "ES2018",
"lib": ["es2018", "dom", "dom.iterable"]

To use Intl.Segmenter, one of the following changes would be required:

  • Add "ES2022.Intl" to the lib array, or
  • Update target to "ES2022" or higher

A runtime fallback to the current implementation (code point counting) could also be considered for environments where Intl.Segmenter is not available.
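One possible shape for that fallback, sketched under the assumption that the lib setting stays at es2018: Intl.Segmenter is feature-detected at runtime and described with a minimal local type (GraphemeSegmenterLike, a hypothetical name) instead of the ES2022.Intl declarations, and countLength (also hypothetical) falls back to code point counting when the API is missing.

```typescript
// Sketch of a runtime fallback that compiles with
// "lib": ["es2018", "dom", "dom.iterable"], i.e. without ES2022.Intl types.
// GraphemeSegmenterLike and countLength are hypothetical names.
interface GraphemeSegmenterLike {
  segment(input: string): Iterable<unknown>;
}

type GraphemeSegmenterCtor = new (
  locales?: string | string[],
  options?: { granularity: "grapheme" },
) => GraphemeSegmenterLike;

// Feature-detect Intl.Segmenter once; undefined on runtimes that lack it.
const SegmenterCtor = (Intl as unknown as { Segmenter?: GraphemeSegmenterCtor })
  .Segmenter;

// No locale is passed: grapheme segmentation does not depend on it.
const graphemeSegmenter: GraphemeSegmenterLike | null = SegmenterCtor
  ? new SegmenterCtor(undefined, { granularity: "grapheme" })
  : null;

/**
 * Counts grapheme clusters when Intl.Segmenter exists; otherwise keeps the
 * current behavior of counting code points (equivalent to [...text].length).
 */
function countLength(text: string): number {
  if (graphemeSegmenter) {
    return Array.from(graphemeSegmenter.segment(text)).length;
  }
  return Array.from(text).length;
}
```

Detecting once at module load keeps the per-call cost to a single branch. The trade-off is that environments without Intl.Segmenter keep the current, incorrect padding for composite emoji.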

Note: counter-style.ts contains a comment suggesting that CounterStyle.format may need lang as an argument to pass to Intl.Segmenter. However, extended grapheme cluster segmentation (UAX #29) is locale-independent, both at the specification level (the GB rules have no locale-specific branches) and in the ICU4C implementation used by major browsers. Adding a lang argument is therefore not required.

@MurakamiShinyu I would like to ask for your opinion on whether updating the target/lib is acceptable for this fix.
