EntityParser can't handle encoded emoji #67

Open
@aduth

Description

While the tokenizer will gracefully decode most encoded characters:

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&amp;' )[ 0 ].chars === '&'
true

It doesn't handle characters whose code points don't fit in 16 bits, i.e. anything above U+FFFF (e.g. emoji):

⇒ node
> var Tokenizer = require( 'simple-html-tokenizer' );
undefined
> Tokenizer.tokenize( '&#x1F605;' )[ 0 ].chars === '😅'
false

It may be that EntityParser should use String.fromCodePoint (or an equivalent polyfill) in place of String.fromCharCode?
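To illustrate the difference, here is a minimal sketch of the idea. It is not the library's actual EntityParser code: `fromCodePointFallback` and `decodeNumericEntity` are hypothetical names made up for this example, and the sketch only covers numeric character references, not named entities.

```javascript
// String.fromCharCode truncates its argument to 16 bits, so U+1F605
// silently becomes U+F605 rather than the intended emoji.
var truncated = String.fromCharCode( 0x1F605 ); // same as '\uF605'

// Hypothetical fallback for environments without String.fromCodePoint:
// code points above U+FFFF are split into a UTF-16 surrogate pair.
function fromCodePointFallback( codePoint ) {
	if ( codePoint <= 0xFFFF ) {
		return String.fromCharCode( codePoint );
	}
	var offset = codePoint - 0x10000;
	return String.fromCharCode(
		0xD800 + ( offset >> 10 ), // high surrogate
		0xDC00 + ( offset & 0x3FF ) // low surrogate
	);
}

// Hypothetical decoder for numeric references such as '&#128517;' or
// '&#x1F605;'. Anything it doesn't recognize is returned unchanged.
function decodeNumericEntity( entity ) {
	var match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec( entity );
	if ( ! match ) {
		return entity;
	}
	var codePoint = match[ 1 ] ?
		parseInt( match[ 1 ], 16 ) :
		parseInt( match[ 2 ], 10 );
	if ( codePoint > 0x10FFFF ) {
		// Outside the Unicode range; String.fromCodePoint would throw.
		return entity;
	}
	return typeof String.fromCodePoint === 'function' ?
		String.fromCodePoint( codePoint ) :
		fromCodePointFallback( codePoint );
}
```

With something along these lines, both the decimal and hexadecimal forms of the emoji reference would decode to the same two-code-unit string, while characters at or below U+FFFF would behave as they do today.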

Related:
