Incorrect UTF-8 decoding #92

Open
@swansontec

Description

The utf8.fromBytes routine does not handle 4-byte character sequences.

Demo

$ echo -n 𠜎 | hexdump
0000000 f0 a0 9c 8e

The character 𠜎 has a 4-byte encoding, so let's try putting that into fromBytes:

const aesjs = require('aes-js')

// UTF-8 encoding of 𠜎 (U+2070E), as shown by hexdump above
const bytes = [0xf0, 0xa0, 0x9c, 0x8e]
const string = aesjs.utils.utf8.fromBytes(bytes)
console.log(string)

Nothing prints. Doing the same with Node's Buffer works as expected:

console.log(Buffer.from(bytes).toString()) // Prints 𠜎
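
For illustration, here is a minimal sketch of a decoder that also covers the 4-byte case. The name utf8FromBytes is hypothetical and this is not the aes-js implementation; it only shows the extra branch a lead byte in the 0xf0..0xf7 range needs. It assumes reasonably well-formed input: continuation bytes are not validated and overlong encodings are not rejected.

```javascript
// Hypothetical helper (not part of aes-js): decode an array of UTF-8 bytes
// into a string, including 4-byte sequences for code points above U+FFFF.
function utf8FromBytes(bytes) {
  let result = ''
  let i = 0
  while (i < bytes.length) {
    const c = bytes[i]
    let codePoint
    if (c < 0x80) {
      // 1 byte: 0xxxxxxx
      codePoint = c
      i += 1
    } else if (c >= 0xc0 && c < 0xe0) {
      // 2 bytes: 110xxxxx 10xxxxxx
      codePoint = ((c & 0x1f) << 6) | (bytes[i + 1] & 0x3f)
      i += 2
    } else if (c >= 0xe0 && c < 0xf0) {
      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      codePoint = ((c & 0x0f) << 12) |
        ((bytes[i + 1] & 0x3f) << 6) |
        (bytes[i + 2] & 0x3f)
      i += 3
    } else if (c >= 0xf0 && c < 0xf8) {
      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      codePoint = ((c & 0x07) << 18) |
        ((bytes[i + 1] & 0x3f) << 12) |
        ((bytes[i + 2] & 0x3f) << 6) |
        (bytes[i + 3] & 0x3f)
      i += 4
    } else {
      // Stray continuation byte or invalid lead byte: emit U+FFFD and move on
      codePoint = 0xfffd
      i += 1
    }
    // Anything beyond the Unicode range also becomes U+FFFD
    if (codePoint > 0x10ffff) codePoint = 0xfffd
    // fromCodePoint emits a surrogate pair for code points above U+FFFF,
    // which String.fromCharCode would not do
    result += String.fromCodePoint(codePoint)
  }
  return result
}

console.log(utf8FromBytes([0xf0, 0xa0, 0x9c, 0x8e]))
```

The key difference from a decoder limited to 3-byte sequences is the fourth branch together with String.fromCodePoint, since a code point such as U+2070E does not fit in a single UTF-16 code unit.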
