and might not be suitable for e.g. hashing
i.e.:
- there exist non-equal JS strings being encoded into equal Uint8Array buffers
- there exist non-equal Uint8Array buffers being decoded into equal JS strings
Demo:
```ts
import { utf8, hex } from './index.ts'
import { sha256 } from '@noble/hashes/sha2.js'

const h = hex.encode

// Unpaired surrogates
{
  const s0 = 'what\ud800ever'
  const s1 = 'what\ud820ever'
  const u0 = utf8.decode(s0)
  const u1 = utf8.decode(s1)
  console.log(`1. Strings equal: ${s0 === s1}, u8 equal: ${h(u0) === h(u1)}`) // expect false or throw
  console.log(` Bonus: hashes equal: ${h(sha256(u0)) === h(sha256(u1))}`)
}

// Invalid utf-8
{
  const u0 = Uint8Array.of(0x80)
  const u1 = Uint8Array.of(0x81)
  const s0 = utf8.encode(u0)
  const s1 = utf8.encode(u1)
  console.log(`2. u8 equal: ${h(u0) === h(u1)}, strings equal: ${s0 === s1}`) // expect false or throw
}

// BOM
{
  const s0 = '\uFEFFHello, world!'
  const u0 = utf8.decode(s0)
  const s1 = utf8.encode(u0)
  const u1 = utf8.decode(s1)
  console.log(`3. Strings equal: ${s0 === s1}, u8 equal: ${h(u0) === h(u1)}`) // expect true
}
```

To build a strict impl:
- Use `new TextDecoder('utf8', { ignoreBOM: true, fatal: true })` to preserve BOM and throw on errors
- For `TextEncoder`, there is no `fatal` option. Use `string.isWellFormed()` when available. When not - check for presence of `EFBFBD` in the output, and if it's present recheck that the output decodes back to the same string.
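A minimal sketch of the two points above, using only the standard `TextEncoder` / `TextDecoder` globals. The names `strictUtf8ToBytes` and `strictUtf8ToString` are hypothetical and chosen for illustration; they are not part of the library's API. It also assumes a runtime where both globals are available.

```typescript
// Hypothetical helper names; a sketch, not the library's implementation.
const encoder = new TextEncoder()
// fatal: throw a TypeError on malformed UTF-8 instead of emitting U+FFFD.
// ignoreBOM: keep a leading U+FEFF in the decoded string instead of dropping it.
const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true })

// True if the byte sequence EF BF BD (UTF-8 for U+FFFD) occurs anywhere.
function hasReplacementBytes(bytes: Uint8Array): boolean {
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0xef && bytes[i + 1] === 0xbf && bytes[i + 2] === 0xbd) return true
  }
  return false
}

// String -> bytes, throwing on unpaired surrogates.
function strictUtf8ToBytes(str: string): Uint8Array {
  // isWellFormed() may be missing on older runtimes, so feature-detect it.
  const s = str as unknown as { isWellFormed?: () => boolean }
  if (typeof s.isWellFormed === 'function') {
    if (!s.isWellFormed()) throw new TypeError('string contains unpaired surrogates')
    return encoder.encode(str)
  }
  // Fallback: TextEncoder replaces lone surrogates with U+FFFD (EF BF BD).
  // If those bytes are present, make sure they came from a real U+FFFD
  // by checking that the output decodes back to the same string.
  const bytes = encoder.encode(str)
  if (hasReplacementBytes(bytes) && decoder.decode(bytes) !== str) {
    throw new TypeError('string contains unpaired surrogates')
  }
  return bytes
}

// Bytes -> string, throwing on invalid UTF-8 and preserving a leading BOM.
function strictUtf8ToString(bytes: Uint8Array): string {
  return decoder.decode(bytes)
}
```

With these helpers, cases 1 and 2 from the demo would throw, and the BOM string from case 3 would round-trip unchanged. A string that legitimately contains U+FFFD still encodes, because the fallback only rejects input that does not decode back to itself.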
Replacing the current impl with a strict one will be a breaking change.
An alternative is to export it under a different name.