
feat: Implement proposal-arraybuffer-base64 #4630

Open
magic-akari wants to merge 1 commit into boa-dev:main from magic-akari:feat/arraybuffer-base64-data-encoding


@magic-akari
Contributor

@magic-akari magic-akari commented Feb 16, 2026

@github-actions

Test262 conformance changes

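| Test result | main count | PR count | difference |
| :--- | ---: | ---: | ---: |
| Total | 52,862 | 52,862 | 0 |
| Passed | 49,471 | 49,534 | +63 |
| Ignored | 2,249 | 2,187 | -62 |
| Failed | 1,142 | 1,141 | -1 |
| Panics | 0 | 0 | 0 |
| Conformance | 93.59% | 93.70% | +0.12% |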
Fixed tests (63):
test/built-ins/Uint8Array/fromHex/length.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/odd-length-input.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/name.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/results.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/ignores-receiver.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/receiver-not-uint8array.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/omit-padding.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/writes-up-to-error.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/subarray.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/throws-when-string-length-is-odd.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/target-size.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/trailing-garbage-empty.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/whitespace.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/receiver-not-uint8array.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/results.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/whitespace.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/ignores-receiver.js (previously Ignored)
test/staging/Uint8Array/fromBase64/invalid-options.js (previously Ignored)
test/staging/sm/TypedArray/prototype-constructor-identity.js (previously Failed)

@magic-akari
Contributor Author

Note

data-encoding gives us first-error boundaries (read/written/position), but stop-before-partial requires the last spec-acceptable full-chunk boundary, which is not always the same.

For setFromBase64, the spec requires early return once maxLength is reached (and trailing garbage after that must be ignored), while data-encoding has no output-cap-aware early-stop decode API.

Also, simple 4-character truncation is insufficient: whitespace affects read, padding legality depends on exact = placement, and strict/loose/stop-before-partial have different tail rules.

At the moment, we do not support stop-before-partial semantics.

If we decide to go with data-encoding (instead of simdutf), one realistic long-term path might be to fork data-encoding and add the API we need.
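A minimal, std-only sketch of the boundary the note refers to (hypothetical helper names; this is neither data-encoding's API nor the code in this PR): it skips ASCII whitespace, stops at the first byte outside the standard alphabet, and remembers the offset just past the last complete 4-symbol chunk. Padding is deliberately not handled.

```rust
/// Returns `true` for bytes of the standard base64 alphabet (padding excluded).
fn is_base64_symbol(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/'
}

/// Hypothetical helper: returns `(read, symbols)` for the position just past
/// the last complete 4-symbol chunk. ASCII whitespace is skipped, but
/// whitespace that precedes the boundary is included in `read`. Scanning
/// stops at the first byte that is neither a symbol nor whitespace.
fn last_full_chunk_boundary(input: &[u8]) -> (usize, usize) {
    let mut symbols = 0;
    let mut boundary = (0, 0);
    for (i, &b) in input.iter().enumerate() {
        if b.is_ascii_whitespace() {
            continue;
        }
        if !is_base64_symbol(b) {
            break; // first error: nothing after this byte is considered
        }
        symbols += 1;
        if symbols % 4 == 0 {
            // A chunk just completed: this is a safe place to stop.
            boundary = (i + 1, symbols);
        }
    }
    boundary
}
```

For the input `QUJD RE!G` the first error is at offset 7, but the last full-chunk boundary is at offset 4, which is why the first-error position alone is not enough for stop-before-partial.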

@codecov

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 2.62097% with 483 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.30%. Comparing base (6ddc2b4) to head (d5ae194).
⚠️ Report is 947 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| :--- | ---: | :--- |
| ...e/engine/src/builtins/typed_array/builtin_uint8.rs | 0.00% | 332 Missing ⚠️ |
| core/engine/src/builtins/typed_array/base64.rs | 0.00% | 125 Missing ⚠️ |
| core/engine/src/builtins/typed_array/hex.rs | 0.00% | 26 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4630       +/-   ##
===========================================
+ Coverage   47.24%   59.30%   +12.06%     
===========================================
  Files         476      592      +116     
  Lines       46892    64160    +17268     
===========================================
+ Hits        22154    38050    +15896     
- Misses      24738    26110     +1372     


@magic-akari magic-akari force-pushed the feat/arraybuffer-base64-data-encoding branch from f2d29eb to 2ff9f60 February 16, 2026 15:15
@magic-akari magic-akari marked this pull request as ready for review February 16, 2026 15:29
@nekevss nekevss requested review from a team and jedel1043 February 17, 2026 22:18
@nekevss nekevss added the A-Enhancement New feature or request label Feb 17, 2026
@jedel1043
Member

jedel1043 commented Feb 20, 2026

Thank you, this looks nice. Just to be clear, what would be the full list of missing things from data-encoding's side?

I ask this because I plan to open a discussion with the data-encoding maintainers to know if they're interested in adding those features.

@magic-akari
Contributor Author

The main missing feature is LastChunkHandlingOptions. Currently, data-encoding only offers two padding modes: None and Some, both of which enforce strict padding validation.

However, the ECMA-262 spec requires three distinct lastChunkHandling modes: "strict" (a partial final chunk must be padded with "="), "loose" (padding with "=" is optional), and "stop-before-partial".

Our current implementation works around this limitation by checking for the presence of = in the input beforehand to toggle between Some("=") and None. Having native loose("=") support in data-encoding would eliminate this hack.
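A sketch of the three option values and of the loose-mode work-around described above, assuming a decoder that can only be configured with no padding or strict "=" padding (the None / Some("=") switch). The type and function names are illustrative, not Boa's.

```rust
/// The `lastChunkHandling` option values of `Uint8Array.fromBase64` and
/// `Uint8Array.prototype.setFromBase64`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LastChunkHandling {
    Loose,
    Strict,
    StopBeforePartial,
}

impl LastChunkHandling {
    /// Maps the option string to a mode; `None` marks an unknown value,
    /// which JS rejects with a TypeError.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "loose" => Some(Self::Loose),
            "strict" => Some(Self::Strict),
            "stop-before-partial" => Some(Self::StopBeforePartial),
            _ => None,
        }
    }
}

/// Hypothetical loose-mode work-around for a decoder that only knows
/// "no padding" or "strict '=' padding": enable padding only when the
/// input actually contains '='.
fn loose_padding(input: &str) -> Option<char> {
    if input.contains('=') {
        Some('=')
    } else {
        None
    }
}
```

The pre-scan costs an extra pass over the input, which is part of why native loose support would be preferable.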

There's another subtle issue: ECMA-262 disallows concatenated padded inputs, but data-encoding with padding enabled accepts them. A proper loose mode shouldn't just be a binary switch—it needs to reject concatenated padded inputs to match the spec.
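A simplified check for the concatenation case, assuming the spec rule that only ASCII whitespace may follow the padding: it flags any byte after the first "=" that is neither "=" nor whitespace. It does not validate how many "=" are allowed for a given chunk length; that is left to the rest of the decoder. The function name is hypothetical.

```rust
/// Returns `true` if non-whitespace data follows the padding, as in the
/// concatenated input "QQ==QQ==". Only '=' and ASCII whitespace are
/// tolerated after the first '='.
fn has_data_after_padding(input: &str) -> bool {
    match input.find('=') {
        None => false, // no padding at all: nothing to check here
        Some(pos) => input[pos..]
            .bytes()
            .any(|b| b != b'=' && !b.is_ascii_whitespace()),
    }
}
```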

As for StopBeforePartial, we should probably pull some test cases from Test262 to ensure our implementation is correct.

@jedel1043
Member

Thanks! I'll open an issue with them to see if they're open to implementing the features.

@jedel1043
Member

ia0/data-encoding#152 (comment)
Things went well! The maintainer plans to implement a "disallow concatenated padded inputs" mode, and they also provided a wrapper function that supports "stop-before-partial", with a plan to integrate that into data-encoding@v3.

Since they seem collaborative and responsive, I'd say we should merge this and try to push for the addition of the features we need.

@jedel1043 jedel1043 added Waiting On Author Waiting on PR changes from the author and removed waiting-on-review labels Mar 13, 2026
@ia0 ia0 mentioned this pull request Mar 17, 2026
@ia0

ia0 commented Mar 17, 2026

Actually I missed those requirements from #4630 (comment):

For setFromBase64, the spec requires early return once maxLength is reached (and trailing garbage after that must be ignored), while data-encoding has no output-cap-aware early-stop decode API.

I hadn't come across this requirement before. I've added it to the wish list for v3.

Also, simple 4-character truncation is insufficient: whitespace affects read, padding legality depends on exact = placement, and strict/loose/stop-before-partial have different tail rules.

I missed the fact that you don't use BASE64 and BASE64URL but custom encodings that ignore whitespace. The work-around I gave only works without ignored characters.
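Roughly what such a custom encoding has to do per input byte, sketched with std only and hypothetical names: map both alphabets ("base64" and "base64url") to 6-bit values, recognize "=", and skip the ASCII whitespace that ECMA-262 ignores (TAB, LF, FF, CR, SPACE). That set matches Rust's `u8::is_ascii_whitespace`, which, like the spec, excludes vertical tab.

```rust
/// How a decoder classifies one input byte (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Class {
    /// An alphabet symbol and its 6-bit value.
    Symbol(u8),
    /// The padding character '='.
    Padding,
    /// ASCII whitespace that the decoder skips.
    Ignored,
    /// Anything else: a SyntaxError in JS.
    Invalid,
}

/// Classifies `b` for the standard alphabet, or for base64url when `url` is set.
fn classify(b: u8, url: bool) -> Class {
    match b {
        b'A'..=b'Z' => Class::Symbol(b - b'A'),
        b'a'..=b'z' => Class::Symbol(b - b'a' + 26),
        b'0'..=b'9' => Class::Symbol(b - b'0' + 52),
        b'+' if !url => Class::Symbol(62),
        b'/' if !url => Class::Symbol(63),
        b'-' if url => Class::Symbol(62),
        b'_' if url => Class::Symbol(63),
        b'=' => Class::Padding,
        // TAB, LF, FF, CR and SPACE only; U+000B is not included.
        _ if b.is_ascii_whitespace() => Class::Ignored,
        _ => Class::Invalid,
    }
}
```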

Is the maxLength from above a hard constraint? If yes, I can try to see if there's a work-around that would fix both issues: short output and ignored characters. If there's a solution, then I can probably add it to v2 because it would be generic enough.

@magic-akari
Contributor Author

Is the maxLength from above a hard constraint?

Yes. The setFromBase64 API decodes in place into an existing Uint8Array rather than returning a new one, so it has to handle the boundary cases where the decoded data does not fit the destination's capacity, including what happens when an error occurs partway through. These requirements are specified in detail in ECMA-262 and checked by the Test262 conformance tests.
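To make the constraint concrete, here is a simplified sketch (hypothetical names, std only, standard alphabet, no padding) of decoding into a fixed-size destination and reporting a result shaped like setFromBase64's { read, written }. It stops as soon as fewer than 3 bytes of space remain and ignores whatever follows, leaves a trailing partial chunk unread (as stop-before-partial would), and on an invalid byte returns the progress made so far. The spec algorithm additionally handles padding and decodes partial final chunks that still fit, which this sketch omits.

```rust
/// Progress of a capped decode (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Progress {
    read: usize,
    written: usize,
}

/// Maps a standard-alphabet byte to its 6-bit value.
fn sextet(b: u8) -> Option<u8> {
    match b {
        b'A'..=b'Z' => Some(b - b'A'),
        b'a'..=b'z' => Some(b - b'a' + 26),
        b'0'..=b'9' => Some(b - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decodes complete 4-symbol chunks into `out`. `Err` carries the progress
/// made before the invalid byte; bytes already written stay in `out`.
fn decode_into(input: &[u8], out: &mut [u8]) -> Result<Progress, Progress> {
    let mut p = Progress { read: 0, written: 0 };
    let mut chunk = [0u8; 4];
    let mut len = 0;
    for (i, &b) in input.iter().enumerate() {
        if out.len() - p.written < 3 {
            return Ok(p); // target full: the rest of the input is ignored
        }
        if b.is_ascii_whitespace() {
            continue;
        }
        let Some(v) = sextet(b) else {
            return Err(p);
        };
        chunk[len] = v;
        len += 1;
        if len == 4 {
            // Pack four 6-bit values into 24 bits and emit them as 3 bytes.
            let n = (u32::from(chunk[0]) << 18)
                | (u32::from(chunk[1]) << 12)
                | (u32::from(chunk[2]) << 6)
                | u32::from(chunk[3]);
            out[p.written..p.written + 3].copy_from_slice(&n.to_be_bytes()[1..]);
            p.written += 3;
            p.read = i + 1; // `read` only advances at chunk boundaries
            len = 0;
        }
    }
    if len == 0 {
        p.read = input.len(); // trailing whitespace counts as read
    }
    Ok(p)
}
```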

@ia0

ia0 commented Mar 17, 2026

Thanks! I guess this is the specification: https://tc39.es/ecma262/#sec-frombase64. I'll take a look when I get time.

@ia0

ia0 commented Mar 29, 2026

Sorry for the long delay. That's the first week-end where I could get a chance to look at this.

You can find here an example implementation of FromBase64 from ECMA-262 using data-encoding. If my interpretation of the specification is correct, then the implementation should be correct (I fuzzed it for 10 hours).

There are a few things to note:

  • The implementation is not optimized to handle cases where max_length is much smaller than 6 * input.len() / 8. This could be improved with an iterative approach if performance matters in those cases (this would probably still require output.len() to be greater than max_length by at least 3 bytes though).
  • Lines 54 to 77 are not ECMA-262 specific but a way to work around the fact that data-encoding doesn't provide a way to decode up to a given output length. I'm not sure how to do this without sacrificing performance yet. I'll continue thinking about it though since it looks like a legitimate use-case and I'd like to have it in v3.
  • Lines 79 to 151 are really the ECMA-262 specific logic and I don't think data-encoding can provide much more help there. But happy to look into ideas you have.
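The bound in the first bullet follows from each base64 symbol carrying 6 bits, so n input characters decode to at most floor(6n / 8) bytes; whitespace and padding only lower the real count. A small sketch with hypothetical names, computed as 3 * (n / 4) + floor(3 * (n % 4) / 4) so that 6 * n cannot overflow:

```rust
/// Upper bound on the number of bytes `input_len` base64 characters can
/// decode to: floor(6 * n / 8), computed without multiplying n by 6.
fn max_decoded_len(input_len: usize) -> usize {
    input_len / 4 * 3 + input_len % 4 * 3 / 4
}

/// Hypothetical check: the output cap only changes the result when it is
/// smaller than the largest possible decoded length.
fn cap_can_bind(input_len: usize, max_length: usize) -> bool {
    max_length < max_decoded_len(input_len)
}
```

When the cap cannot bind, a plain full decode is enough; the iterative approach mentioned above only pays off when max_length is far below this bound.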

If using that implementation on your side is satisfactory, then I can create a PR to update data-encoding with the change helping performance and readability of the implementation.

@magic-akari
Contributor Author

magic-akari commented Apr 8, 2026

If using that implementation on your side is satisfactory, then I can create a PR to update data-encoding with the change helping performance and readability of the implementation.

Thank you for the update. I have completed local testing of the ecma262 branch and confirmed that it passes all Test262 test cases. I believe the implementation is now ready to move forward.

Regarding the ecma262 module - specifically the decode and decode_mut implementations - I have a question about its long-term maintenance. You previously mentioned that this is an "example implementation," but given that it has already reached a high degree of usability, will it be officially published and maintained within data-encoding? Or is the expectation for boa to implement and maintain its own version?

@ia0

ia0 commented Apr 9, 2026

Yes, my idea was for boa to copy and maintain the implementation locally. An alternative would be for me (or us) to publish and maintain (or co-maintain) a base64-ecma262 crate with that implementation. There shouldn't be much maintenance anyway, since the standard shouldn't change often. The implementation might change once data-encoding v3 is stable (in a few years), to benefit from possible performance and code-size improvements and from simplifications made possible by new features. I can take on the maintenance responsibility for this change; that's not a problem. It's just that I don't want code specific to a particular standard in data-encoding, which should remain a generic crate (fast to compile, with no dependencies).

@magic-akari magic-akari force-pushed the feat/arraybuffer-base64-data-encoding branch from 2ff9f60 to d5ae194 April 9, 2026 15:46
@github-actions github-actions bot added C-Dependencies Pull requests that update a dependency file C-Tests Issues and PRs related to the tests. C-Builtins PRs and Issues related to builtins/intrinsics labels Apr 9, 2026
@github-actions

github-actions bot commented Apr 9, 2026

Test262 conformance changes

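| Test result | main count | PR count | difference |
| :--- | ---: | ---: | ---: |
| Total | 53,125 | 53,125 | 0 |
| Passed | 51,049 | 51,119 | +70 |
| Ignored | 1,482 | 1,413 | -69 |
| Failed | 594 | 593 | -1 |
| Panics | 0 | 0 | 0 |
| Conformance | 96.09% | 96.22% | +0.13% |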
Fixed tests (70):
test/staging/sm/TypedArray/prototype-constructor-identity.js (previously Failed)
test/staging/Uint8Array/fromBase64/invalid-options.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/last-chunk-invalid.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/last-chunk-handling.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/ignores-receiver.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/whitespace.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/fromBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/last-chunk-handling.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/trailing-garbage-empty.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/whitespace.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/writes-up-to-error.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/trailing-garbage.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/subarray.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/target-size.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/receiver-not-uint8array.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toHex/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/throws-when-string-length-is-odd.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/writes-up-to-error.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/subarray.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/target-size.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/setFromHex/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/detached-buffer.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/option-coercion.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/omit-padding.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/alphabet.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/results.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/receiver-not-uint8array.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/length.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/name.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/prototype/toBase64/nonconstructor.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/odd-length-input.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/illegal-characters.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/string-coercion.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/ignores-receiver.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/results.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/length.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/name.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/descriptor.js (previously Ignored)
test/built-ins/Uint8Array/fromHex/nonconstructor.js (previously Ignored)

Tested main commit: d6d76d86e9e18fca07f318f1db434cc8abffaf14
Tested PR commit: d5ae1949e1a8b221debdac255476975a3e8ac4f0
Compare commits: d6d76d8...d5ae194

@magic-akari
Contributor Author

I have updated the implementation to use the ecma262 branch of data-encoding as a dependency, which is expected to pass all tests. I will update the dependency again once a new version of data-encoding is officially released.

Regarding the maintenance strategy for the ECMA-262 logic—whether it resides in a separate crate or within Boa—I will leave the final decision to @jedel1043, as I am contributing to this project as an external contributor.

@magic-akari
Contributor Author

There's a few things to note:

  • The implementation is not optimized to handle cases where max_length is much smaller than 6 * input.len() / 8. This could be improved with an iterative approach if performance matters in those cases (this would probably still require output.len() to be greater than max_length by at least 3 bytes though).
  • Lines 54 to 77 are not ECMA-262 specific but a way to work around the fact that data-encoding doesn't provide a way to decode up to a given output length. I'm not sure how to do this without sacrificing performance yet. I'll continue thinking about it though since it looks like a legitimate use-case and I'd like to have it in v3.
  • Lines 79 to 151 are really the ECMA-262 specific logic and I don't think data-encoding can provide much more help there. But happy to look into ideas you have.

Thank you for these technical notes.

@jedel1043
Member

Regarding the maintenance strategy for the ECMA-262 logic—whether it resides in a separate crate or within Boa—I will leave the final decision to @jedel1043, as I am contributing to this project as an external contributor.

Sounds fine to have a small crate for it (it might be useful for other Rust engines such as Nova), but we can leave that for the future. In the meantime, let's just bundle the code in boa_engine.

// Backtrack one symbol.
read -= 1;
extra_input -= 1;
debug_assert!(base.interpret_byte(input[read]).is_symbol());

I've merged ia0/data-encoding#154 and decided that is_symbol() should return an Option instead of a bool (to avoid losing possibly useful information). So you'll need the following change once the Cargo.toml points to the default branch and once I release the change (in about 2 weeks, to let it cool down).

Suggested change
debug_assert!(base.interpret_byte(input[read]).is_symbol());
debug_assert!(base.interpret_byte(input[read]).is_symbol().is_some());


Labels

A-Enhancement New feature or request C-Builtins PRs and Issues related to builtins/intrinsics C-Dependencies Pull requests that update a dependency file C-Tests Issues and PRs related to the tests. Waiting On Author Waiting on PR changes from the author


Development

Successfully merging this pull request may close these issues.

Implement Uint8Array base64/hex encoding methods (fromBase64, fromHex, toBase64, toHex, setFromBase64, setFromHex)

4 participants