Skip to content

perf(encoding): improve base32 encode/decode performance #6479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 21, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
222 changes: 0 additions & 222 deletions encoding/_base32_common.ts

This file was deleted.

68 changes: 44 additions & 24 deletions encoding/base32.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,6 @@
// Copyright (c) 2014 Jameson Little. MIT License.
// This module is browser compatible.

import type { Uint8Array_ } from "./_types.ts";
export type { Uint8Array_ };

/**
* Utilities for
* {@link https://www.rfc-editor.org/rfc/rfc4648.html#section-6 | base32}
Expand All @@ -26,51 +23,74 @@
*
* @module
*/
import { decode, encode } from "./_base32_common.ts";

const lookup: string[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".split("");
const revLookup: number[] = [];
lookup.forEach((c, i) => (revLookup[c.charCodeAt(0)] = i));
import { calcMax, decode, encode } from "./_common32.ts";
import { detach } from "./_common_detach.ts";
import type { Uint8Array_ } from "./_types.ts";
export type { Uint8Array_ };

const padding = "=".charCodeAt(0);
const alphabet = new TextEncoder()
.encode("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567");
const rAlphabet = new Uint8Array(128).fill(32); //alphabet.length
alphabet.forEach((byte, i) => rAlphabet[byte] = i);

/**
* Decodes a base32-encoded string.
* Converts data into a base32-encoded string.
*
* @see {@link https://www.rfc-editor.org/rfc/rfc4648.html#section-6}
*
* @param b32 The base32-encoded string to decode.
* @returns The decoded data.
* @param data The data to encode.
* @returns The base32-encoded string.
*
* @example Usage
* ```ts
* import { decodeBase32 } from "@std/encoding/base32";
* import { encodeBase32 } from "@std/encoding/base32";
* import { assertEquals } from "@std/assert";
*
* assertEquals(
* decodeBase32("GZRTMMDDGA======"),
* new TextEncoder().encode("6c60c0"),
* );
* assertEquals(encodeBase32("6c60c0"), "GZRTMMDDGA======");
* ```
*/
export function decodeBase32(b32: string): Uint8Array_ {
return decode(b32, lookup);
export function encodeBase32(data: ArrayBuffer | Uint8Array | string): string {
if (typeof data === "string") {
data = new TextEncoder().encode(data) as Uint8Array_;
} else if (data instanceof ArrayBuffer) data = new Uint8Array(data).slice();
else data = data.slice();

Check warning on line 58 in encoding/base32.ts

View check run for this annotation

Codecov / codecov/patch

encoding/base32.ts#L58

Added line #L58 was not covered by tests
const [output, i] = detach(
data as Uint8Array_,
calcMax((data as Uint8Array_).length),
);
encode(output, i, 0, alphabet, padding);
return new TextDecoder().decode(output);
}

/**
* Converts data into a base32-encoded string.
* Decodes a base32-encoded string.
*
* @see {@link https://www.rfc-editor.org/rfc/rfc4648.html#section-6}
*
* @param data The data to encode.
* @returns The base32-encoded string.
* @param b32 The base32-encoded string to decode.
* @returns The decoded data.
*
* @example Usage
* ```ts
* import { encodeBase32 } from "@std/encoding/base32";
* import { decodeBase32 } from "@std/encoding/base32";
* import { assertEquals } from "@std/assert";
*
* assertEquals(encodeBase32("6c60c0"), "GZRTMMDDGA======");
* assertEquals(
* decodeBase32("GZRTMMDDGA======"),
* new TextEncoder().encode("6c60c0"),
* );
* ```
*/
export function encodeBase32(data: ArrayBuffer | Uint8Array | string): string {
return encode(data, lookup);
export function decodeBase32(b32: string): Uint8Array_ {
const output = new TextEncoder().encode(b32) as Uint8Array_;
if (output.length % 8) {
throw new TypeError(
`Invalid base32 string: length (${output.length}) must be a multiple of 8`,
);
}
// deno-lint-ignore no-explicit-any
return new Uint8Array((output.buffer as any)
.transfer(decode(output, 0, 0, rAlphabet, padding)));
}
6 changes: 3 additions & 3 deletions encoding/base32_test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ Deno.test({
assertThrows(
() => decodeBase32("OOOO=="),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of this input seems changed by this PR. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The way the new code detects errors is different where the old input would be valid. The old input had the wrong length based off the encoding + padding but the new code doesn't care about the padding as long as it isn't too much padding. Four encoded characters is valid here while three is invalid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new code doesn't care about the padding as long as it isn't too much padding.

I'm not sure if we should do that change. I checked the standard libraries of some other languages, but they seem checking padding length. Probably we should keep that check?

$ cat enc.py 
import base64
base64.b32decode("OOOO==")
$ python3 enc.py
Traceback (most recent call last):
  File "/Users/kt3k/denoland/std/enc.py", line 2, in <module>
    base64.b32decode("OOOO==")
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 254, in b32decode
    return _b32decode(_b32alphabet, s, casefold, map01)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 210, in _b32decode
    raise binascii.Error('Incorrect padding')
binascii.Error: Incorrect padding
$ cat main.go 
package main

import (
  "encoding/base32"
  "fmt"
)

func main() {
  enc := base32.NewEncoding("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")
  res, e := enc.DecodeString("OOOO==")
  if e != nil {
    fmt.Printf("%s", e)
  } else {
    fmt.Printf("%s", res)
  }
}
$ go run main.go 
illegal base32 data at input byte 6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was more written this way for convince. The code for it is here: https://github.com/denoland/std/blob/main/encoding/_common32.ts#L107-L119 where it really is just seeing how much padding it needs to slice off before doing the actual decoding.

I can understand wanting to be in align with other std's but I also don't see a benefit in being strict here with the padding when decoding when you know what the missing bytes are.

The change to base64 operates in the same manner here. Where base64url may allow omitting padding, but base64 requires it, the decoder doesn't care if padding is short

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of change needs more broader consensus from the community and core team, and I'm not in favor of landing this as part of perf change. Can you add back the simple check of length like the below and create an issue for discussing that?:

  if (len % 8 > 0) {
    throw new Error(
      `Cannot decode base32 string as the length must be a multiple of 8: received length ${len}`,
    );
  }

The change to base64 operates in the same manner here. Where base64url may allow omitting padding, but base64 requires it, the decoder doesn't care if padding is short

Base64 decoder seems ignoring missing padding before the recent rewrite (#6461). That might be a good reason to remove padding length check from decodeBase32. Anyway I think we need to discuss this topic independently somewhere else.

Error,
"Cannot decode base32 string as the length must be a multiple of 8: received length 6",
"Invalid base32 string: length (6) must be a multiple of 8",
);
},
});
Expand All @@ -48,9 +48,9 @@ Deno.test({
name: "decodeBase32() throws on bad padding",
fn() {
assertThrows(
() => decodeBase32("5HXR334AQYAAAA=="),
() => decodeBase32("5HXR334AQYAAAA=========="),
Error,
"Invalid pad length",
"Invalid Character (=)",
);
},
});
Expand Down
Loading