
TextDecoder: ERR_ENCODING_INVALID_ENCODED_DATA on very long array buffer #47645

Open
@martian17

Description


Version

v18.14.1

Platform

Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

When I try to decode a long UTF-16LE-encoded buffer, `ERR_ENCODING_INVALID_ENCODED_DATA` is thrown instead of `ERR_STRING_TOO_LONG`:

```js
new TextDecoder("utf-16le").decode(new Uint16Array(2**27).fill(48))
// Uncaught TypeError: The encoded data was not valid for encoding utf-16le
//     at TextDecoder.decode (node:internal/encoding:448:14) {
//   code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
// }
```
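To probe other lengths and encodings without an uncaught exception ending the script, the call above can be wrapped in a small helper that reports either the decoded length or the error `code`. This is a sketch; `tryDecode` is a hypothetical name, not a Node.js API, and the UTF-16LE branch assumes a little-endian host such as the x86_64 machine in this report.

```js
// Hypothetical reproduction helper (not part of Node.js): decode `length` copies of
// the ASCII character "0" (code 48) and report the outcome instead of throwing.
function tryDecode(encoding, length) {
  // For "utf-16le", each Uint16Array element is one code unit. This assumes a
  // little-endian host, where 48 is stored as the bytes 0x30 0x00.
  const input =
    encoding === "utf-16le"
      ? new Uint16Array(length).fill(48)
      : new Uint8Array(length).fill(48); // one byte per "0" in UTF-8
  try {
    const text = new TextDecoder(encoding).decode(input);
    return { ok: true, length: text.length };
  } catch (err) {
    // Node.js attaches a string `code` (e.g. "ERR_STRING_TOO_LONG") to its errors.
    return { ok: false, code: err.code, message: err.message };
  }
}
```

Calling `tryDecode("utf-16le", 2 ** 27)` corresponds to the failing call above and allocates a 256 MiB input buffer.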

The default encoding (UTF-8) works correctly and throws the appropriate error:

```js
new TextDecoder().decode(new Uint8Array(2**29).fill(48))
// Uncaught Error: Cannot create a string longer than 0x1fffffe8 characters
//     at TextDecoder.decode (node:internal/encoding:433:16) {
//   code: 'ERR_STRING_TOO_LONG'
// }
```

I also noticed that `new TextDecoder()` can consume an ArrayBuffer twice as long as `new TextDecoder("utf-16le")` can without throwing an error, and produces a string four times as long.
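The 2× input / 4× output relationship follows from the encoding widths: the test data is the ASCII character "0", which takes one byte in UTF-8 but two bytes in UTF-16LE. For the same number of input bytes UTF-8 therefore yields twice as many characters, and twice the bytes yields four times as many. A quick sketch of that arithmetic, using `Buffer.byteLength` to get the widths; `bytesPerChar`, `charsFromBytes`, and `B` are hypothetical illustration names, and `B` is an arbitrary placeholder rather than a measured limit.

```js
// Bytes needed to encode one "0" character in each encoding:
// 1 for UTF-8, 2 for UTF-16LE.
const bytesPerChar = {
  utf8: Buffer.byteLength("0", "utf8"),
  utf16le: Buffer.byteLength("0", "utf16le"),
};

// Hypothetical helper: characters produced by decoding `byteCount` bytes of "0".
function charsFromBytes(byteCount, encoding) {
  return byteCount / bytesPerChar[encoding];
}

// If the UTF-16LE decoder accepts B bytes and the UTF-8 decoder accepts 2 * B bytes,
// the UTF-8 output is four times as long.
const B = 1024; // arbitrary placeholder byte count, for illustration only
console.log(charsFromBytes(2 * B, "utf8") / charsFromBytes(B, "utf16le")); // 4
```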

How often does it reproduce? Is there a required condition?

I confirmed this bug both when running a script file and in the Node.js REPL.

What is the expected behavior? Why is that the expected behavior?

`new TextDecoder("utf-16le")` should be able to create strings of up to 0x1fffffe8 characters, the same limit the default decoder enforces, and should throw `ERR_STRING_TOO_LONG` only when that length is exceeded.
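The expected rule can be written as a check against the string-length limit Node.js exposes as `buffer.constants.MAX_STRING_LENGTH`. Each UTF-16LE code unit decodes to exactly one JavaScript string character, so the output length equals the `Uint16Array` length. By this rule the 2**27-element input from the reproduction (0x8000000 characters) is well within the limit. This is a sketch of the expected behavior only; `expectedUtf16leOutcome` is a hypothetical helper, not part of Node.js.

```js
const { constants } = require("node:buffer");

// V8's maximum string length as exposed by Node.js; typically 0x1fffffe8 on
// 64-bit builds, matching the limit in the ERR_STRING_TOO_LONG message above.
const MAX = constants.MAX_STRING_LENGTH;

// Hypothetical helper: the outcome the issue expects for a UTF-16LE input of
// `codeUnits` elements (one output character per code unit).
function expectedUtf16leOutcome(codeUnits) {
  return codeUnits > MAX ? "ERR_STRING_TOO_LONG" : "ok";
}

console.log(expectedUtf16leOutcome(2 ** 27)); // "ok"
console.log(expectedUtf16leOutcome(2 ** 29)); // "ERR_STRING_TOO_LONG"
```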

What do you see instead?

`ERR_ENCODING_INVALID_ENCODED_DATA` is thrown when the input `Uint16Array` has a length of 2**27 (2**28 bytes):

```
Uncaught TypeError: The encoded data was not valid for encoding utf-16le
    at TextDecoder.decode (node:internal/encoding:448:14) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}
```

Additional information

No response


Metadata

Assignees: no one assigned

Labels: util (Issues and PRs related to the built-in util module)
