Hi!

Friendly ping from @exodus/bytes, which also provides an implementation (fast, consistent, spec-compliant, and tested to produce matching output on all platforms).

When cross-testing, I found a number of issues. To avoid spam, I'm filing them as a single list instead of individually (it can later be split into subtasks).
UTF-8:

BOM handling is inconsistent. On some platforms, `textDecode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x42))` is `'B'`, on others, it's `'\uFEFFB'`.
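For reference, the WHATWG `TextDecoder` strips a leading UTF-8 BOM by default and keeps it only when `ignoreBOM: true` is passed, so a wrapper can pin one policy explicitly instead of inheriting whatever the current backend does. A minimal sketch using only the native API; `decodeUtf8` and its `keepBOM` option are hypothetical names, not part of @borewit/text-codec:

```javascript
// Hypothetical helper (not part of @borewit/text-codec): make the BOM policy
// explicit instead of depending on whichever backend happens to be used.
function decodeUtf8(bytes, { keepBOM = false } = {}) {
  // In TextDecoder, ignoreBOM: true means "do not strip the BOM".
  return new TextDecoder('utf-8', { ignoreBOM: keepBOM }).decode(bytes)
}

const withBOM = Uint8Array.of(0xef, 0xbb, 0xbf, 0x42)
console.log(escape(decodeUtf8(withBOM)))                    // B
console.log(escape(decodeUtf8(withBOM, { keepBOM: true })))  // %uFEFFB
```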
`textDecode` fallback fails to polyfill U+FFFD replacement (or fatal mode) and produces garbage output:

```js
import { textDecode } from '@borewit/text-codec'

console.log(escape(textDecode(Uint8Array.of(0, 254, 255))))
console.log(escape(textDecode(Uint8Array.of(0x80))))
console.log(escape(textDecode(Uint8Array.of(0xf0, 0x90, 0x80))))
console.log(escape(textDecode(Uint8Array.of(0xf0, 0x80, 0x80))))
```

Should be:

```
%00%uFFFD%uFFFD
%uFFFD
%uFFFD
%uFFFD%uFFFD%uFFFD
```

But the polyfill outputs:

```
%00%uDABC%uDC00
%00
%uD800%uDC00
%uDBC0%uDC00
```

This behavior is platform-dependent, so results are not consistent across platforms.
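A fallback that matches the expected output above can follow the WHATWG Encoding Standard UTF-8 decoder step by step. The sketch below is a hypothetical `utf8Decode`, not the code of @borewit/text-codec or @exodus/bytes, and it favors readability over speed. It emits one U+FFFD per maximal invalid subsequence, throws a `TypeError` in fatal mode, and strips a leading BOM unless `ignoreBOM` is set, mirroring `TextDecoder` options:

```javascript
// Hypothetical fallback (not the library's code): UTF-8 decoding per the
// WHATWG Encoding Standard, so invalid input yields U+FFFD (or throws a
// TypeError in fatal mode) the same way TextDecoder does.
function utf8Decode(bytes, { fatal = false, ignoreBOM = false } = {}) {
  let out = ''
  let cp = 0       // code point being assembled
  let needed = 0   // continuation bytes expected
  let seen = 0     // continuation bytes consumed so far
  let lower = 0x80 // allowed range for the next continuation byte
  let upper = 0xbf

  const fail = () => {
    if (fatal) throw new TypeError('The encoded data was not valid utf-8')
    out += '\ufffd'
  }

  // Strip a leading BOM unless asked to keep it (TextDecoder default).
  let i = 0
  if (!ignoreBOM && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) i = 3

  for (; i < bytes.length; i++) {
    const byte = bytes[i]
    if (needed === 0) {
      if (byte <= 0x7f) {
        out += String.fromCharCode(byte)
      } else if (byte >= 0xc2 && byte <= 0xdf) {
        needed = 1
        cp = byte & 0x1f
      } else if (byte >= 0xe0 && byte <= 0xef) {
        if (byte === 0xe0) lower = 0xa0 // reject overlong forms
        if (byte === 0xed) upper = 0x9f // reject encoded surrogates
        needed = 2
        cp = byte & 0xf
      } else if (byte >= 0xf0 && byte <= 0xf4) {
        if (byte === 0xf0) lower = 0x90 // reject overlong forms
        if (byte === 0xf4) upper = 0x8f // reject code points above U+10FFFF
        needed = 3
        cp = byte & 0x7
      } else {
        fail() // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence
      }
      continue
    }
    if (byte < lower || byte > upper) {
      // Broken sequence: emit one U+FFFD, then reprocess this byte.
      cp = needed = seen = 0
      lower = 0x80
      upper = 0xbf
      fail()
      i--
      continue
    }
    lower = 0x80
    upper = 0xbf
    cp = (cp << 6) | (byte & 0x3f)
    if (++seen === needed) {
      out += String.fromCodePoint(cp)
      cp = needed = seen = 0
    }
  }
  if (needed !== 0) fail() // truncated sequence at end of input
  return out
}

for (const input of [
  Uint8Array.of(0, 254, 255),
  Uint8Array.of(0x80),
  Uint8Array.of(0xf0, 0x90, 0x80),
  Uint8Array.of(0xf0, 0x80, 0x80),
]) {
  console.log(escape(utf8Decode(input)))
}
// %00%uFFFD%uFFFD
// %uFFFD
// %uFFFD
// %uFFFD%uFFFD%uFFFD
```

The byte-range bookkeeping (`lower`/`upper`) is what makes `F0 90 80` produce a single U+FFFD while `F0 80 80` produces three: the first is a valid but truncated prefix, the second breaks at its second byte.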
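The same applies to `textEncode`: `textEncode('\ud800')` is `Uint8Array(3) [ 239, 191, 189 ]` (U+FFFD) in native but `Uint8Array(3) [ 237, 160, 128 ]` in fallback (the lone surrogate encoded as if it were a code point, which is not valid UTF-8).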
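The native behavior comes from `TextEncoder` treating its input as a USVString: unpaired surrogates are replaced with U+FFFD before encoding. A fallback can do the same by pairing surrogates itself. A minimal sketch; `utf8Encode` is a hypothetical name, not the library's function:

```javascript
// Hypothetical fallback (not the library's code): encode a JS string to UTF-8
// like TextEncoder does, replacing lone surrogates with U+FFFD (EF BF BD)
// instead of emitting them as 3-byte sequences such as ED A0 80.
function utf8Encode(str) {
  const out = []
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i)
    if (cp >= 0xd800 && cp <= 0xdfff) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0
      if (cp <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
        // Valid high + low surrogate pair: combine into one code point.
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00)
        i++
      } else {
        cp = 0xfffd // lone surrogate
      }
    }
    if (cp < 0x80) {
      out.push(cp)
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f))
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f))
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f))
    }
  }
  return Uint8Array.from(out)
}

console.log(utf8Encode('\ud800')) // Uint8Array(3) [ 239, 191, 189 ]
```

On runtimes that support it, `str.toWellFormed()` performs the same lone-surrogate replacement at the string level before any encoding step.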
The same applies to UTF-16: output is wrong on non-well-formed input.
Moreover, the utf-16le decoder can return non-well-formed strings (containing lone surrogates), which can have security impact in some setups due to how hashing and signatures behave on such strings.
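In replacement mode, the WHATWG UTF-16LE decoder never returns a non-well-formed string: unpaired surrogates become U+FFFD, and a dangling high surrogate and/or odd trailing byte at the end collapse into a single U+FFFD. A minimal sketch of that behavior; `utf16leDecode` is a hypothetical name, and this is not the library's implementation:

```javascript
// Hypothetical fallback (not the library's code): UTF-16LE decoding in
// replacement mode per the WHATWG Encoding Standard. Unpaired surrogates
// become U+FFFD, so the returned string is always well-formed.
function utf16leDecode(bytes) {
  let out = ''
  const units = bytes.length >> 1 // number of complete 16-bit code units
  const unitAt = (k) => bytes[2 * k] | (bytes[2 * k + 1] << 8)
  let danglingHigh = false

  for (let i = 0; i < units; i++) {
    const cu = unitAt(i)
    if (cu >= 0xd800 && cu <= 0xdbff) {
      if (i + 1 === units) {
        danglingHigh = true // high surrogate at the end of input
        break
      }
      const next = unitAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += String.fromCharCode(cu, next) // valid surrogate pair
        i++
      } else {
        out += '\ufffd' // `next` is then handled on its own by the next iteration
      }
    } else if (cu >= 0xdc00 && cu <= 0xdfff) {
      out += '\ufffd' // low surrogate without a preceding high surrogate
    } else {
      out += String.fromCharCode(cu)
    }
  }
  // A dangling high surrogate and/or an odd trailing byte yield one U+FFFD.
  if (danglingHigh || bytes.length % 2 !== 0) out += '\ufffd'
  return out
}

console.log(escape(utf16leDecode(Uint8Array.of(0x00, 0xd8, 0x41, 0x00)))) // %uFFFDA
console.log(utf16leDecode(Uint8Array.of(0x3d, 0xd8, 0x00, 0xde)) === '\u{1F600}') // true
```

Where available, `String.prototype.isWellFormed()` can be used in tests to assert that a decoder's output never contains lone surrogates.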
The fallback is slow overall:

- On Node.js, it is ~20x slower on utf-16le, ~50x slower on iso-8859-1, and 15x to 190x slower on windows-1252.
- This lib is documented as a polyfill for React Native, but a performant polyfill for RN is ~5x-10x faster.
Documentation is off. It says `windows-1252` is implemented in Node.js, as is UTF-16 in Node.js small-icu builds. In fact, in most Node.js versions released in 2025, `windows-1252` was broken and returned wrong output, and so was UTF-16 in without-intl Node.js builds. The docs also read as if this lib adds something on top of those native implementations, but in fact it doesn't use native implementations for anything except utf-8.
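The bytes 0x80..0x9F are where windows-1252 differs from latin1, so they are the natural place to check a native decoder before trusting it. A minimal sketch, assuming a probe on 0x80 (expected U+20AC) and 0x93 (expected U+201C) is enough to reject an implementation that, for example, decodes the label as plain latin1 (which would return U+0080 and U+0093). `windows1252Decode` and `nativeWindows1252Works` are hypothetical names; the table is the WHATWG index-windows-1252 for that byte range:

```javascript
// WHATWG index-windows-1252 for bytes 0x80..0x9F. Every other byte maps to
// the code point with the same value.
const WINDOWS_1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
]

// Hypothetical table-based fallback decoder (not the library's code).
function windows1252Decode(bytes) {
  let out = ''
  for (const b of bytes) {
    out += String.fromCharCode(b >= 0x80 && b <= 0x9f ? WINDOWS_1252_80_9F[b - 0x80] : b)
  }
  return out
}

// Hypothetical probe: trust the native decoder only if the label is supported
// and the 0x80..0x9F range decodes correctly.
function nativeWindows1252Works() {
  try {
    return new TextDecoder('windows-1252').decode(Uint8Array.of(0x80, 0x93)) === '\u20ac\u201c'
  } catch {
    return false // label not supported by this runtime
  }
}

const decodeWindows1252 = nativeWindows1252Works()
  ? (bytes) => new TextDecoder('windows-1252').decode(bytes)
  : windows1252Decode

console.log(decodeWindows1252(Uint8Array.of(0x80, 0x41))) // €A
```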
`token-types` internal documentation is off.
https://github.com/Borewit/token-types/blob/1692c1cca988da14e28fc19fde24653e36f3db90/lib/index.ts#L434 says:

> Supports all encodings supported by TextDecoder, plus 'windows-1252'.

But it doesn't: only a few encodings are supported, only utf-8 goes through `TextDecoder`, and there is no transparent fallback.
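For illustration, a "transparent fallback" in the sense of that doc comment could try the native `TextDecoder` for any label and fall back to a JS decoder only when the runtime rejects the label (the constructor throws a `RangeError` for unsupported labels). A minimal sketch; `decodeWithFallback` and the `fallbacks` map are hypothetical, not token-types API:

```javascript
// Hypothetical sketch (not token-types API): use the native TextDecoder for
// any label it supports, and only fall back to a JS decoder when the runtime
// rejects the label. `fallbacks` maps lowercase labels to decode functions.
function decodeWithFallback(bytes, encoding, fallbacks = {}) {
  let decoder
  try {
    decoder = new TextDecoder(encoding) // throws RangeError for unknown labels
  } catch (err) {
    const fallback = fallbacks[encoding.toLowerCase()]
    if (!(err instanceof RangeError) || !fallback) throw err
    return fallback(bytes)
  }
  return decoder.decode(bytes)
}

console.log(decodeWithFallback(Uint8Array.of(0x68, 0x69), 'utf-8')) // hi
```

This only covers labels the runtime rejects outright; a native decoder that accepts a label but returns wrong output (the windows-1252 case described above) still needs a separate correctness check.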
Alternatively, just reuse `@exodus/bytes` or copy the implementation from it 🙃 (I can help point to the imports you'll likely want).