ExodusOSS/bytes

@exodus/bytes

Uint8Array conversion to and from base64, base32, base58, hex, utf8, utf16, bech32 and wif

And a TextEncoder / TextDecoder polyfill

Strict

Performs proper input validation, ensures no garbage-in-garbage-out
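
As an illustration of what garbage-in-garbage-out looks like in practice, the sketch below contrasts the lenient hex parsing of Node.js `Buffer` with a strict decoder. `strictFromHex` is a hypothetical helper written for this example, not the library's `fromHex`, and the exact errors thrown by the library may differ.

```javascript
// Hypothetical helper for illustration only (not part of @exodus/bytes):
// rejects anything that is not an even-length string of hex digits.
function strictFromHex(str) {
  if (typeof str !== 'string') throw new TypeError('Input is not a string')
  if (str.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(str)) {
    throw new SyntaxError('Input is not a valid hex string')
  }
  const out = new Uint8Array(str.length / 2)
  for (let i = 0; i < out.length; i++) {
    out[i] = Number.parseInt(str.slice(i * 2, i * 2 + 2), 16)
  }
  return out
}

// Node.js Buffer silently truncates at the first invalid character:
// 'abzz' yields a single byte 0xab and no error.
const lenient = Buffer.from('abzz', 'hex')

// The strict decoder refuses the same input instead of guessing.
let rejected = false
try {
  strictFromHex('abzz')
} catch {
  rejected = true
}
```

With the lenient parser, a corrupted input and its valid prefix become indistinguishable after decoding, which is the failure mode strict validation is meant to rule out.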

Tested on Node.js, Deno, Bun, browsers (including Servo), Hermes, QuickJS and barebone engines in CI (how?)

Fast

  • 10-20x faster than Buffer polyfill
  • 2-10x faster than iconv-lite

The figures above are for the JS fallback.

The speedup is up to 100x when a native implementation is available,
e.g. in utf8fromString on Hermes / React Native or fromHex in Chrome

Also:

  • 3-8x faster than bs58
  • 10-30x faster than @scure/base (or >100x on Node.js <25)
  • Faster in utf8toString / utf8fromString than Buffer or TextDecoder / TextEncoder on Node.js

See Performance for more info

TextEncoder / TextDecoder polyfill

import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding.js'

Less than half the bundle size of text-encoding, whatwg-encoding or iconv-lite (gzipped or not), and is much faster. See also lite version.

Spec compliant, passing WPT and covered with extra tests.

Moreover, tests for this library uncovered bugs in all major implementations.

Faster than Node.js native implementation on Node.js.

Caveat: TextDecoder / TextEncoder APIs are lossy by default per spec

These are provided only as a compatibility layer; prefer the hardened APIs in new code.

  • TextDecoder can (and should) be used with { fatal: true } option for all purposes demanding correctness / lossless transforms

  • TextEncoder does not support a fatal mode per spec, it always performs replacement.

    That is not suitable for hashing, cryptography or consensus applications.
    Otherwise there would be non-equal strings with equal signatures and hashes — the collision is caused by the lossy transform of a JS string to bytes. Those also survive e.g. JSON.stringify/JSON.parse or being sent over network.

    Use strict APIs in new applications, see utf8fromString / utf16fromString below.
    Those throw on non-well-formed strings by default.
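
The caveat above can be reproduced with the platform's own TextEncoder / TextDecoder globals, which follow the same spec. The sketch below is an illustration under that assumption; `strictUtf8Encode` is a hypothetical helper that approximates a throwing encoder with a round-trip check, and it is not how utf8fromString is implemented.

```javascript
const encoder = new TextEncoder()

// Two different JS strings, each ending in a different lone surrogate.
const a = 'id-\ud800'
const b = 'id-\udfff'

// TextEncoder replaces every lone surrogate with U+FFFD (bytes EF BF BD),
// so both strings produce identical bytes.
const bytesA = encoder.encode(a)
const bytesB = encoder.encode(b)
const sameBytes =
  bytesA.length === bytesB.length && bytesA.every((x, i) => x === bytesB[i])

// Lossy decoding has the mirror-image problem: invalid bytes become U+FFFD.
const lossy = new TextDecoder().decode(Uint8Array.of(0xff))

// With { fatal: true } the same input throws instead.
let fatalThrew = false
try {
  new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.of(0xff))
} catch {
  fatalThrew = true
}

// Hypothetical strict encoder: encode, then verify that a fatal decode
// reproduces the input exactly. ignoreBOM keeps a leading U+FEFF intact.
function strictUtf8Encode(str) {
  const bytes = encoder.encode(str)
  const back = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true }).decode(bytes)
  if (back !== str) throw new TypeError('Input is not a well-formed string')
  return bytes
}
```

Here `a !== b` while `sameBytes` is true, so any hash or signature computed over the encoded bytes would be equal for the two non-equal strings. The round-trip check costs an extra decode and is used here only because it is short.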

Lite version

If you don't need support for legacy multi-byte encodings, you can use the lite import:

import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding-lite.js'

This reduces the bundle size 10x:
from 90 KiB gzipped for @exodus/bytes/encoding.js to 9 KiB gzipped for @exodus/bytes/encoding-lite.js.
(For comparison, text-encoding module is 190 KiB gzipped, and iconv-lite is 194 KiB gzipped).

It still supports utf-8, utf-16le, utf-16be and all single-byte encodings specified by the spec; the only difference is that legacy multi-byte encodings are not supported.

See the list of encodings.

API

@exodus/bytes/utf8.js

utf8fromString(str, format = 'uint8')
utf8fromStringLoose(str, format = 'uint8')
utf8toString(arr)
utf8toStringLoose(arr)

@exodus/bytes/utf16.js

utf16fromString(str, format = 'uint16')
utf16fromStringLoose(str, format = 'uint16')
utf16toString(arr, 'uint16')
utf16toStringLoose(arr, 'uint16')

@exodus/bytes/single-byte.js

createSinglebyteDecoder(encoding, loose = false)

Create a decoder for a supported one-byte encoding.

Returns a function decode(arr) that decodes bytes to a string.

@exodus/bytes/multi-byte.js

createMultibyteDecoder(encoding, loose = false)

Create a decoder for a supported legacy multi-byte encoding.

Returns a function decode(arr, stream = false) that decodes bytes to a string.

The returned function keeps state between calls while stream = true is used.
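
To show what that state means, the sketch below uses the platform TextDecoder with utf-8 as a stand-in, since the streaming idea is the same: bytes of an incomplete character are buffered until the rest arrives. Note that the native API takes `{ stream: true }` as an options object, whereas the decoder described here takes `stream` as a second positional argument.

```javascript
// '€' is three bytes in UTF-8; split it across two chunks.
const euro = Uint8Array.of(0xe2, 0x82, 0xac)
const chunk1 = euro.subarray(0, 1)
const chunk2 = euro.subarray(1)

// Streaming: the decoder remembers the pending lead byte between calls.
const streaming = new TextDecoder('utf-8')
const first = streaming.decode(chunk1, { stream: true }) // nothing complete yet
const second = streaming.decode(chunk2, { stream: true }) // character completes
const flushed = streaming.decode() // end of stream, nothing pending

// Without streaming, each chunk is treated as a complete input,
// so the truncated sequence is replaced.
const isolated = new TextDecoder('utf-8').decode(chunk1)
```

Because of the buffered bytes, a streaming decoder instance should not be shared between unrelated inputs until it has been flushed.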

windows1252toString(arr)

Decode windows-1252 bytes to a string.

Also supports ascii and latin-1 as those are strict subsets of windows-1252.

There is no loose variant for this encoding, all bytes can be decoded.

Same as windows1252toString = createSinglebyteDecoder('windows-1252').

@exodus/bytes/bigint.js

fromBigInt(bigint, { length, format = 'uint8' })
toBigInt(arr)

@exodus/bytes/hex.js

toHex(arr)
fromHex(string)

@exodus/bytes/base64.js

toBase64(arr, { padding = true })
toBase64url(arr, { padding = false })
fromBase64(str, { format = 'uint8', padding = 'both' })
fromBase64url(str, { format = 'uint8', padding = false })
fromBase64any(str, { format = 'uint8', padding = 'both' })

@exodus/bytes/base32.js

toBase32(arr, { padding = false })
toBase32hex(arr, { padding = false })
fromBase32(str, { format = 'uint8', padding = 'both' })
fromBase32hex(str, { format = 'uint8', padding = 'both' })

@exodus/bytes/bech32.js

getPrefix(str, limit = 90)
toBech32(prefix, bytes, limit = 90)
fromBech32(str, limit = 90)
toBech32m(prefix, bytes, limit = 90)
fromBech32m(str, limit = 90)

@exodus/bytes/base58.js

toBase58(arr)
fromBase58(str, format = 'uint8')
toBase58xrp(arr)
fromBase58xrp(str, format = 'uint8')

@exodus/bytes/base58check.js

async toBase58check(arr)
toBase58checkSync(arr)
async fromBase58check(str, format = 'uint8')
fromBase58checkSync(str, format = 'uint8')
makeBase58check(hashAlgo, hashAlgoSync)

@exodus/bytes/wif.js

async fromWifString(string, version)
fromWifStringSync(string, version)
async toWifString({ version, privateKey, compressed })
toWifStringSync({ version, privateKey, compressed })

@exodus/bytes/encoding.js

Implements the Encoding standard: TextDecoder, TextEncoder, some hooks (see below).

import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding.js'

// Hooks for standards
import { getBOMEncoding, legacyHookDecode, normalizeEncoding } from '@exodus/bytes/encoding.js'

new TextDecoder(label = 'utf-8', { fatal = false, ignoreBOM = false })

TextDecoder implementation/polyfill.

new TextEncoder()

TextEncoder implementation/polyfill.

normalizeEncoding(label)

Implements the "get an encoding" algorithm of the Encoding standard, which resolves a string label to an encoding.

Converts an encoding label to its name, as an ASCII-lowercased string.

If an encoding with that label does not exist, returns null.

This is the same as the decoder.encoding getter of a TextDecoder constructed with that label, except that it:

  1. Supports replacement encoding and its labels
  2. Does not throw for invalid labels and instead returns null

All encoding names are also valid labels for corresponding encodings.
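
The relationship to the decoder.encoding getter can be sketched with the platform TextDecoder. `labelToName` below is a hypothetical helper for illustration; unlike normalizeEncoding as described above, it cannot resolve the replacement encoding, because TextDecoder rejects its labels, and which labels resolve depends on what the runtime supports.

```javascript
// Hypothetical approximation of the described contract using native APIs:
// resolve a label to a lowercased encoding name, or null if unknown.
function labelToName(label) {
  try {
    return new TextDecoder(label).encoding
  } catch {
    // TextDecoder throws for unknown labels; map that to null instead.
    return null
  }
}

const fromLabel = labelToName('UTF8')
const fromName = labelToName('utf-8') // names are valid labels too
const unknown = labelToName('no-such-encoding')
```

Constructing a decoder just to read its name is wasteful, which is one reason a dedicated lookup function is preferable.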

getBOMEncoding(input)

Implements the BOM sniff legacy hook.

Given a TypedArray or an ArrayBuffer instance input, returns one of:

  • 'utf-8', if input starts with UTF-8 byte order mark.
  • 'utf-16le', if input starts with UTF-16LE byte order mark.
  • 'utf-16be', if input starts with UTF-16BE byte order mark.
  • null otherwise.
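
The return values listed above correspond to a check of the first two or three bytes. A minimal sketch of that check follows; `sniffBOM` is a hypothetical name used for this example, and the library's own implementation may differ.

```javascript
// Hypothetical re-implementation of the BOM check, for illustration only.
function sniffBOM(input) {
  // Accept any TypedArray / DataView (respecting its offset) or an ArrayBuffer.
  const bytes = ArrayBuffer.isView(input)
    ? new Uint8Array(input.buffer, input.byteOffset, input.byteLength)
    : new Uint8Array(input)

  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 'utf-8'
  }
  if (bytes.length >= 2) {
    if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le' // FF FE
    if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be' // FE FF
  }
  return null
}

const utf8 = sniffBOM(Uint8Array.of(0xef, 0xbb, 0xbf, 0x68, 0x69))
const le = sniffBOM(Uint8Array.of(0xff, 0xfe, 0x68, 0x00))
const be = sniffBOM(Uint8Array.of(0xfe, 0xff, 0x00, 0x68).buffer)
const none = sniffBOM(Uint8Array.of(0x68, 0x69))
```

Only the leading bytes are inspected, so plain text that happens to start with these byte values is reported as having a BOM as well.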

legacyHookDecode(input, fallbackEncoding = 'utf-8')

Implements the decode legacy hook.

Given a TypedArray or an ArrayBuffer instance input and an optional fallbackEncoding normalized encoding name, sniffs encoding from BOM with fallbackEncoding fallback and then decodes the input using that encoding, skipping BOM if it was present.

Notes:

  • BOM-sniffed encoding takes precedence over fallbackEncoding option per spec. Use with care.
  • fallbackEncoding must be ASCII-lowercased encoding name, e.g. a result of normalizeEncoding(label) call.
  • Always operates in non-fatal mode, aka replacement. It can convert different byte sequences to equal strings.

This method is similar to the following code, except that it doesn't support encoding labels and only expects lowercased encoding name:

new TextDecoder(getBOMEncoding(input) ?? fallbackEncoding ?? 'utf-8').decode(input)

@exodus/bytes/encoding-lite.js

import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding-lite.js'

// Hooks for standards
import { getBOMEncoding, legacyHookDecode, normalizeEncoding } from '@exodus/bytes/encoding-lite.js'

The exact same exports as @exodus/bytes/encoding.js are also exported as @exodus/bytes/encoding-lite.js, with the difference that the lite version does not load multi-byte TextDecoder encodings by default to reduce bundle size 10x.

The only affected encodings are: gbk, gb18030, big5, euc-jp, iso-2022-jp, shift_jis and their labels when used with TextDecoder.

Legacy single-byte encodings are loaded by default in both cases.

TextEncoder and hooks for standards (including normalizeEncoding) do not have any behavior differences in the lite version and support the full range of inputs.

To avoid inconsistencies, the exported classes and methods are exactly the same objects.

> lite = require('@exodus/bytes/encoding-lite.js')
[Module: null prototype] {
  TextDecoder: [class TextDecoder],
  TextEncoder: [class TextEncoder],
  getBOMEncoding: [Function: getBOMEncoding],
  legacyHookDecode: [Function: legacyHookDecode],
  normalizeEncoding: [Function: normalizeEncoding]
}
> new lite.TextDecoder('big5').decode(Uint8Array.of(0x25))
Uncaught:
Error: Legacy multi-byte encodings are disabled in /encoding-lite.js, use /encoding.js for full encodings range support

> full = require('@exodus/bytes/encoding.js')
[Module: null prototype] {
  TextDecoder: [class TextDecoder],
  TextEncoder: [class TextEncoder],
  getBOMEncoding: [Function: getBOMEncoding],
  legacyHookDecode: [Function: legacyHookDecode],
  normalizeEncoding: [Function: normalizeEncoding]
}
> full.TextDecoder === lite.TextDecoder
true
> new full.TextDecoder('big5').decode(Uint8Array.of(0x25))
'%'
> new lite.TextDecoder('big5').decode(Uint8Array.of(0x25))
'%'

License

MIT