Parser regards a half of surrogate pair as “emoji” #8

@yk1979

Description

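If the target text contains U+FE0F outside a recognized emoji sequence, the parser reports it as a separate entity of type "emoji". (Strictly speaking, U+FE0F is variation selector-16, not half of a surrogate pair.)
Version: twemoji-parser v12.1.3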

Actual behavior

import { parse } from 'twemoji-parser';

const str = 'Twemoji✨️Parser';
// This text has an invisible character (U+FE0F) right after the sparkles emoji.
escape(str); // 'Twemoji%u2728%uFE0FParser'

const entities = parse(str);

/*
entities = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [ 7, 8 ],
    text: '✨',
    type: 'emoji'
  },
  // url is empty, but type is still 'emoji'
  { url: '', indices: [ 8, 9 ], text: '️', type: 'emoji' }
]
*/
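
To double-check what the string really contains, here is a small sketch that lists its code points. The helper names `codePoints` and `isSurrogate` are made up for this illustration and are not part of twemoji-parser. It shows that the second entity is U+FE0F, which lies outside the surrogate range D800 to DFFF.

```javascript
// Same string as in the report, written with escapes so U+FE0F is visible.
const sample = 'Twemoji\u2728\uFE0FParser';

// Hypothetical helper: list each code point of the text as U+XXXX.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Hypothetical helper: true if the UTF-16 code unit is a surrogate half.
function isSurrogate(codeUnit) {
  return codeUnit >= 0xd800 && codeUnit <= 0xdfff;
}

console.log(codePoints(sample).slice(7, 9)); // [ 'U+2728', 'U+FE0F' ]
console.log(isSurrogate(sample.charCodeAt(8))); // false
```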

Expected behavior

I'd be grateful if the result looked like this:

/*
entities = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [ 7, 8 ],
    text: '✨',
    type: 'emoji'
  }
]
*/

The cause (I think)

When text containing an invisible character is matched against emojiRegex, the invisible character comes back as a separate match.
https://github.com/twitter/twemoji-parser/blob/master/src/index.js#L34

const str = 'Twemoji✨️Parser';
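// emojiRegex is the regex that parse() runs in src/index.js (linked above)
const res1 = emojiRegex.exec(str);
res1[0]; // "✨"

// The second exec() call resumes after the first match and returns U+FE0F alone
const res2 = emojiRegex.exec(str);
res2[0]; // "\uFE0F" (invisible)
escape(res2[0]); // "%uFE0F"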
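
Until this is fixed in the parser, one possible workaround on the caller's side is to drop entities whose url is empty. This is only a sketch: `dropUnresolvedEntities` is a hypothetical helper, not a twemoji-parser API, and the input below is copied from the "Actual behavior" output rather than produced by calling `parse`.

```javascript
// Hypothetical helper: keep only entities that resolved to an image URL.
function dropUnresolvedEntities(entities) {
  return entities.filter((entity) => entity.url !== '');
}

// Entities as reported above; the second one is the bare U+FE0F match.
const reported = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [7, 8],
    text: '\u2728',
    type: 'emoji',
  },
  { url: '', indices: [8, 9], text: '\uFE0F', type: 'emoji' },
];

const cleaned = dropUnresolvedEntities(reported);
console.log(cleaned.length); // 1
console.log(cleaned[0].indices); // [ 7, 8 ]
```

This only hides the symptom. The regex still matches U+FE0F on its own, so the real fix belongs in the parser.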
