Fix UTF-16 string length computations #130

Open
@CBenoit

Description

Some code in IronRDP computes the byte length of UTF-16 strings incorrectly.

Example:

```rust
// This is NOT the right way to compute the number of bytes of a string encoded in UTF-16.
// It is a time bomb: the result is correct only when the string is pure ASCII;
// for any other string it overestimates.
fn utf16_len(utf8_str: &str) -> usize {
    utf8_str.len() * 2
}
```

Both UTF-8 and UTF-16 are variable-length encodings: a single code point may be encoded as several code units. UTF-8 uses one to four 8-bit code units per code point, while UTF-16 uses one or two 16-bit code units. The UTF-16 encoding of a code point is therefore not, in general, twice the size of its UTF-8 encoding. For example, "é" (U+00E9) takes 2 bytes in both encodings, and "€" (U+20AC) takes 3 bytes in UTF-8 but only 2 bytes in UTF-16.
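
To make the size mismatch concrete, here is a small sketch that uses only the standard library's `char::len_utf8` and `char::len_utf16`, which report how many code units a single `char` needs in each encoding. The helper name `byte_sizes` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: returns (UTF-8 bytes, UTF-16 bytes) for one code point.
fn byte_sizes(c: char) -> (usize, usize) {
    // len_utf8 counts 8-bit code units, which are already bytes.
    // len_utf16 counts 16-bit code units, so multiply by 2 to get bytes.
    (c.len_utf8(), c.len_utf16() * 2)
}

fn main() {
    for c in ['A', 'é', '€', '😀'] {
        let (utf8, utf16) = byte_sizes(c);
        println!(
            "U+{:04X}: {} byte(s) in UTF-8, {} byte(s) in UTF-16",
            c as u32, utf8, utf16
        );
    }
}
```

Only 'A' follows the "twice as big" rule: 'é' and '😀' (a surrogate pair in UTF-16) have the same size in both encodings, and '€' is smaller in UTF-16.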

This erroneous pattern appears in several places; one instance is `ironrdp_pdu::rdp::client_info::string_len`.

Instead, count the UTF-16 code units and multiply by two:

```rust
utf8_str.encode_utf16().count() * 2 // add 2 more bytes if a null terminator (0x0000) must be included
```
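
A slightly fuller sketch wraps that expression in a function with an explicit switch for the null terminator and compares it with the naive formula. The names `utf16_byte_len` and `naive_utf16_len` and the `with_null_terminator` parameter are hypothetical, chosen for illustration; they are not IronRDP's API.

```rust
/// Hypothetical: byte length of `s` once encoded as UTF-16.
fn utf16_byte_len(s: &str, with_null_terminator: bool) -> usize {
    // Count 16-bit code units (a surrogate pair counts as two), then convert to bytes.
    let code_units = s.encode_utf16().count();
    let terminator = if with_null_terminator { 1 } else { 0 };
    (code_units + terminator) * 2
}

/// The buggy pattern described in this issue, kept only for comparison.
fn naive_utf16_len(s: &str) -> usize {
    s.len() * 2
}

fn main() {
    for s in ["hello", "héllo", "€100", "😀"] {
        println!(
            "{:?}: naive = {}, correct = {}, correct with terminator = {}",
            s,
            naive_utf16_len(s),
            utf16_byte_len(s, false),
            utf16_byte_len(s, true),
        );
    }
}
```

The naive formula agrees with the correct one only for the pure-ASCII `"hello"`; for `"héllo"`, `"€100"` and `"😀"` it overestimates (12 vs 10, 12 vs 8, and 8 vs 4 bytes respectively), so any length field computed that way would be wrong.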

Refer to the `ironrdp_pdu::pcb` module for a correct implementation.
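
As a sanity check, a computed size should match the length of the buffer that is actually written. The sketch below assumes strings are serialized as little-endian UTF-16 with an optional 0x0000 terminator; this is an illustrative assumption and does not reproduce `ironrdp_pdu::pcb`. The function name `encode_utf16_le` is hypothetical.

```rust
/// Hypothetical: encode `s` as UTF-16LE bytes, optionally followed by a 0x0000 terminator.
fn encode_utf16_le(s: &str, with_null_terminator: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    // Each UTF-16 code unit (including both halves of a surrogate pair) becomes 2 bytes.
    for unit in s.encode_utf16() {
        buf.extend_from_slice(&unit.to_le_bytes());
    }
    if with_null_terminator {
        buf.extend_from_slice(&0u16.to_le_bytes());
    }
    buf
}

fn main() {
    let s = "€100";
    let bytes = encode_utf16_le(s, true);
    // The buffer length matches encode_utf16().count() * 2 + 2, not s.len() * 2 + 2.
    assert_eq!(bytes.len(), s.encode_utf16().count() * 2 + 2);
    assert_ne!(bytes.len(), s.len() * 2 + 2);
    println!("{} bytes: {:02X?}", bytes.len(), bytes);
}
```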

Metadata

Assignees

No one assigned

    Labels

    A-core (Area: Core tier), A-technical-debt (Area: Internal cleanup work), P-medium (Medium priority), bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests