
utf8proc_map_custom custom_data should be const void * (not void *) and could corrupt memory #249

Open
@liquidaty


Hi,

utf8proc_map_custom takes a void *custom_data parameter. However, if the custom function modifies custom_data, utf8proc_map_custom may not work as expected, and could even corrupt memory. The reason is that utf8proc_map_custom calls utf8proc_decompose_custom twice (once to measure the required buffer length, and once to fill the buffer) and keeps only the results of the second call, yet there is no way to reset the custom data to its initial state between the two calls.

As an example, imagine I use a custom transformation that replaces the first character with 'A' and keeps the rest of the string. Using utf8proc_map_custom would seem easy enough:

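```c
#include <utf8proc.h>

struct ctx {
  char start_of_string;
};

/* utf8proc_custom_func passes custom_data as void *, so cast it back here. */
static utf8proc_int32_t replaceFirstCharWithA(utf8proc_int32_t codepoint, void *data) {
  struct ctx *ctx = (struct ctx *)data;
  if (ctx->start_of_string)
    codepoint = 'A';
  ctx->start_of_string = 0; /* mutates custom_data, so the state differs on the next call */
  return codepoint;
}

void test(void) {
  struct ctx ctx;
  ctx.start_of_string = 1;
  utf8proc_map_custom(..., replaceFirstCharWithA, &ctx);
}
```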

However, this will not work, because utf8proc_map_custom keeps only the results of its second utf8proc_decompose_custom call, and by then the first call has already set ctx->start_of_string to 0.
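To make the failure concrete without depending on utf8proc itself, here is a minimal sketch that models the measure-then-fill sequence on plain ASCII input. All names (`custom_func`, `decompose_pass`, `map_two_pass_ascii`) are hypothetical stand-ins, not utf8proc API, and the model leaves out decomposition, normalization and re-encoding entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as utf8proc_custom_func. */
typedef int32_t (*custom_func)(int32_t codepoint, void *data);

struct ctx {
  char start_of_string;
};

/* The callback from the example above: it mutates its custom data. */
static int32_t replace_first_with_A(int32_t codepoint, void *data) {
  struct ctx *ctx = (struct ctx *)data;
  if (ctx->start_of_string)
    codepoint = 'A';
  ctx->start_of_string = 0;
  return codepoint;
}

/* Hypothetical model of one pass: runs the callback on every code point,
   stores results only while they fit, and returns the count required. */
static size_t decompose_pass(const int32_t *in, size_t n, int32_t *buf,
                             size_t bufsize, custom_func f, void *data) {
  assert(f != NULL);
  size_t wpos = 0;
  for (size_t i = 0; i < n; i++) {
    int32_t cp = f(in[i], data);
    if (wpos < bufsize)
      buf[wpos] = cp;
    wpos++;
  }
  return wpos;
}

/* Models measure (NULL buffer), allocate, then fill, sharing one data
   pointer. Writes a NUL-terminated result to out; returns 0 or -1. */
static int map_two_pass_ascii(const char *in, char *out, size_t outsize,
                              custom_func f, void *data) {
  size_t n = strlen(in);
  int32_t *cps = malloc((n + 1) * sizeof *cps);
  if (!cps)
    return -1;
  for (size_t i = 0; i < n; i++)
    cps[i] = (unsigned char)in[i];

  size_t len = decompose_pass(cps, n, NULL, 0, f, data);  /* pass 1: measure */
  int32_t *buf = malloc((len + 1) * sizeof *buf);
  if (!buf) {
    free(cps);
    return -1;
  }
  size_t got = decompose_pass(cps, n, buf, len, f, data); /* pass 2: fill */
  free(cps);
  if (got > len || got + 1 > outsize) {
    free(buf);
    return -1;
  }
  for (size_t i = 0; i < got; i++)
    out[i] = (char)buf[i];
  out[got] = '\0';
  free(buf);
  return 0;
}
```

Called with a fresh `struct ctx c = {1}` on "hello", `map_two_pass_ascii` returns "hello" unchanged rather than "Aello", because the measuring pass has already cleared `start_of_string` before the filling pass runs.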

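I believe this could also lead to memory corruption whenever the two runs produce different numbers of code points. For example, suppose the above example were run with decomposition enabled (e.g. UTF8PROC_DECOMPOSE) on a string whose first character decomposes into several code points, such as "é" (U+00E9 → U+0065 U+0301). The first run of utf8proc_decompose_custom would size the buffer for a single 'A', but the second run would produce the full decomposition of the original first character, which no longer fits in the buffer that was allocated.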
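A second hypothetical model shows how the two runs can disagree on length. It assumes, as I understand utf8proc's implementation, that the custom function is applied to each code point before that code point is decomposed. `toy_decompose_char` only knows the single canonical decomposition U+00E9 → U+0065 U+0301 and stands in for UTF8PROC_DECOMPOSE; all names here are illustrative, not utf8proc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as utf8proc_custom_func. */
typedef int32_t (*custom_func)(int32_t codepoint, void *data);

struct ctx {
  char start_of_string;
};

/* Same stateful callback as in the example above. */
static int32_t replace_first_with_A(int32_t codepoint, void *data) {
  struct ctx *ctx = (struct ctx *)data;
  if (ctx->start_of_string)
    codepoint = 'A';
  ctx->start_of_string = 0;
  return codepoint;
}

/* Toy decomposition: U+00E9 expands to U+0065 U+0301 (its canonical
   decomposition); every other code point maps to itself. Writes only when
   dst has room; always returns the number of code points required. */
static size_t toy_decompose_char(int32_t cp, int32_t *dst, size_t room) {
  if (cp == 0x00E9) {
    if (dst && room >= 2) {
      dst[0] = 0x0065;
      dst[1] = 0x0301;
    }
    return 2;
  }
  if (dst && room >= 1)
    dst[0] = cp;
  return 1;
}

/* One pass: apply the callback, then decompose (assumed utf8proc order). */
static size_t decompose_pass(const int32_t *in, size_t n, int32_t *buf,
                             size_t bufsize, custom_func f, void *data) {
  size_t wpos = 0;
  for (size_t i = 0; i < n; i++) {
    int32_t cp = f(in[i], data);
    size_t room = bufsize > wpos ? bufsize - wpos : 0;
    wpos += toy_decompose_char(cp, room ? buf + wpos : NULL, room);
  }
  return wpos;
}

/* Runs a measuring pass and a filling pass that share one ctx and reports
   both code point counts. */
static void count_both_passes(const int32_t *in, size_t n,
                              size_t *measured, size_t *needed) {
  struct ctx c = {1};
  int32_t buf[16];
  *measured = decompose_pass(in, n, NULL, 0, replace_first_with_A, &c);
  assert(*measured <= sizeof buf / sizeof buf[0]);
  /* The fill pass may only write *measured slots, as sized by pass 1. */
  *needed = decompose_pass(in, n, buf, *measured, replace_first_with_A, &c);
}
```

For the input U+00E9 'x', the measuring pass counts 2 code points ('A', 'x') while the filling pass needs 3 (U+0065, U+0301, 'x'). The model only reports the mismatch; the concern is that the real function would carry on with the larger count against the smaller allocation.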

It would be nice if there were a way to fix this inside utf8proc_map_custom without changing its signature, but that does not seem possible. At a minimum, though, it would be safer and more accurate for custom_data to be a const void * instead of a void * (arguably, the current void * form is itself a bug).
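For comparison, a sketch of what a const-qualified callback could look like. The `custom_func_const` typedef is hypothetical (utf8proc's actual `utf8proc_custom_func` takes `void *`), and `replace_cfg` and `apply_pass` are illustrative names. Because the data is read-only, the compiler rejects writes through it, so both runs see the same configuration; a stateful callback like `replaceFirstCharWithA` above could no longer be written without casting away `const`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical const-qualified variant of the callback type. */
typedef int32_t (*custom_func_const)(int32_t codepoint, const void *data);

/* Read-only configuration consulted on every call. */
struct replace_cfg {
  int32_t from;
  int32_t to;
};

static int32_t replace_codepoint(int32_t codepoint, const void *data) {
  const struct replace_cfg *cfg = (const struct replace_cfg *)data;
  /* Writing through cfg (e.g. cfg->to = 0;) would be a compile-time error. */
  return codepoint == cfg->from ? cfg->to : codepoint;
}

/* One hypothetical pass over the input. Since the callback only reads its
   data, the measuring and filling passes see the same configuration. */
static void apply_pass(const int32_t *in, int32_t *out, size_t n,
                       custom_func_const f, const void *data) {
  assert(f != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = f(in[i], data);
}
```

Note that `const` only protects the object behind the pointer; a callback could still keep state in globals or cast the qualifier away, so this makes the contract explicit rather than enforcing it completely.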
