Description
If the target text contains a standalone U+FE0F (VARIATION SELECTOR-16, an invisible character) after an emoji, the parser returns it as a separate entity of type "emoji" with an empty url.
twemoji-parser v12.1.3
Actual behavior
```javascript
import { parse } from 'twemoji-parser';

// U+FE0F (invisible) follows the sparkles emoji; it is written as an escape here
const str = 'Twemoji\u2728\uFE0FParser';
escape(str); // 'Twemoji%u2728%uFE0FParser' (legacy escape() used only to reveal the hidden character)

const entities = parse(str);
/*
entities = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [ 7, 8 ],
    text: '✨',
    type: 'emoji'
  },
  // url is empty, but type is still 'emoji'; text is the invisible U+FE0F
  { url: '', indices: [ 8, 9 ], text: '\uFE0F', type: 'emoji' }
]
*/
```
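The stray entity at indices [8, 9] is not half of a surrogate pair. UTF-16 surrogate halves are the code units U+D800 to U+DFFF, while U+FE0F (VARIATION SELECTOR-16) is an ordinary BMP character that requests emoji presentation for the preceding character. As a quick check, the sketch below labels every UTF-16 code unit of the same string; `classifyCodeUnits` is a hypothetical helper written for this illustration, not part of twemoji-parser.

```javascript
// Hypothetical helper (not part of twemoji-parser): label each UTF-16 code unit.
function classifyCodeUnits(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    let kind = 'other';
    if (unit >= 0xd800 && unit <= 0xdbff) kind = 'high-surrogate';
    else if (unit >= 0xdc00 && unit <= 0xdfff) kind = 'low-surrogate';
    else if (unit === 0xfe0f) kind = 'variation-selector-16';
    const hex = unit.toString(16).toUpperCase().padStart(4, '0');
    result.push({ index: i, hex, kind });
  }
  return result;
}

// Same string as in the report, with U+FE0F escaped so it is visible.
const sample = 'Twemoji\u2728\uFE0FParser';
const units = classifyCodeUnits(sample);

// Index 7 is U+2728 ('other'), index 8 is U+FE0F ('variation-selector-16').
console.log(units[7], units[8]);
// No code unit in the string is a surrogate half.
console.log(units.some((u) => u.kind.endsWith('surrogate'))); // false
```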
Expected behavior
I would expect the stray U+FE0F not to be returned as an entity:
```javascript
/*
entities = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [ 7, 8 ],
    text: '✨',
    type: 'emoji'
  }
]
*/
```
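Until the parser handles this itself, one possible caller-side workaround is to drop entities whose text consists only of U+FE0F. This is a sketch assuming the `{ url, indices, text, type }` entity shape shown above; `dropStrayVariationSelectors` is a hypothetical helper, not part of twemoji-parser, and it takes the entity array directly so it runs without the library installed.

```javascript
// Hypothetical caller-side workaround, not part of twemoji-parser.
// Assumes entities have the { url, indices, text, type } shape shown above.
const ONLY_VS16 = /^\uFE0F+$/;

function dropStrayVariationSelectors(entities) {
  // Keep every entity except those made up solely of U+FE0F.
  return entities.filter((entity) => !ONLY_VS16.test(entity.text));
}

// The "actual" result from the report, with the invisible character escaped.
const actual = [
  {
    url: 'https://twemoji.maxcdn.com/v/latest/svg/2728.svg',
    indices: [7, 8],
    text: '\u2728',
    type: 'emoji',
  },
  { url: '', indices: [8, 9], text: '\uFE0F', type: 'emoji' },
];

const cleaned = dropStrayVariationSelectors(actual);
console.log(cleaned.length); // 1
console.log(cleaned[0].text === '\u2728'); // true
```

Filtering on `text` is narrower than filtering on an empty `url`: it removes only the stray selector and leaves every other entity untouched.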
The cause (I think)
When a string containing the invisible character is scanned with emojiRegex, the regex matches that character on its own, as a separate result.
https://github.com/twitter/twemoji-parser/blob/master/src/index.js#L34
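```javascript
// emojiRegex is the regex used internally by src/index.js (linked above).
// Successive exec() calls return successive matches, so it is presumably a global (/g) regex.
const str = 'Twemoji\u2728\uFE0FParser';
const res1 = emojiRegex.exec(str);
res1[0]; // '✨'
const res2 = emojiRegex.exec(str);
res2[0]; // '\uFE0F' (matched on its own)
escape(res2[0]); // '%uFE0F'
```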
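The symptom can be reproduced with a much smaller pattern. The sketch below uses two toy regexes as hypothetical stand-ins for emojiRegex (the real pattern is far larger and is not reproduced here): `buggy` can match U+FE0F as an alternative of its own, while `fixed` treats the selector as an optional suffix of the preceding emoji. `matchesWithIndices` is likewise a hypothetical helper that mimics the `indices` reported by `parse()`.

```javascript
// Toy stand-ins for emojiRegex (hypothetical; the real pattern is far larger).
// "buggy" matches the sparkles emoji and, separately, a bare U+FE0F.
const buggy = /\u2728|\uFE0F/g;
// "fixed" treats U+FE0F as an optional suffix of the emoji it follows.
const fixed = /\u2728\uFE0F?/g;

const str = 'Twemoji\u2728\uFE0FParser';

// Collect every match with [start, end) indices, similar to parse() output.
function matchesWithIndices(regex, text) {
  regex.lastIndex = 0; // reset state left over from earlier exec() calls
  const out = [];
  let m;
  while ((m = regex.exec(text)) !== null) {
    out.push({ text: m[0], indices: [m.index, m.index + m[0].length] });
  }
  return out;
}

console.log(JSON.stringify(matchesWithIndices(buggy, str).map((x) => x.indices))); // [[7,8],[8,9]]
console.log(JSON.stringify(matchesWithIndices(fixed, str).map((x) => x.indices))); // [[7,9]]
```

With the `fixed` pattern the single match covers indices [7, 9], so the selector is folded into the emoji's range rather than dropped as in the expected output above. Which of the two is preferable is a choice for the parser; this sketch only illustrates the idea and does not claim to be the project's fix.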