Fix UTF8FirstLetterNumBytes to handle malformed UTF-8 correctly by kodareef5 · Pull Request #1379 · bufbuild/protoc-gen-validate

kodareef5 · 2026-03-22T22:41:11Z

UTF8FirstLetterNumBytes in validate/validate.h returns the byte count from OneCharLen without validating that the expected continuation bytes actually follow the leader byte. Malformed UTF-8 causes Utf8Len to undercount characters by 2-4x, bypassing string length validation constraints (min_len, max_len, len).

Example: 20 bytes of \xC0 (bare 2-byte leaders, no continuations) produces Utf8Len=10 instead of 20. A field with max_len = 10 incorrectly accepts this input.
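
To make the arithmetic concrete, here is a minimal sketch of a counter that trusts the leader byte, as described above. `LeaderLen` and `NaiveUtf8Len` are hypothetical names for illustration and are not the actual code in validate/validate.h.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for OneCharLen: the sequence length implied by the
// leader byte alone.
static std::size_t LeaderLen(unsigned char lead) {
  if (lead < 0x80) return 1;            // ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;                             // stray continuation or invalid leader
}

// Illustrative buggy counter: skips ahead by the implied length without
// checking that continuation bytes are actually present.
static std::size_t NaiveUtf8Len(const std::string& s) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size(); ++count) {
    i += LeaderLen(static_cast<unsigned char>(s[i]));
  }
  return count;
}
```

With this sketch, `NaiveUtf8Len(std::string(20, '\xC0'))` steps two bytes at a time and reports 10 characters for 20 bytes of input.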

This is exploitable when C++ protobuf deserialization doesn't enforce UTF-8 validity (the default), allowing malformed strings to reach pgv validation.

Fix:

- Clamp consumed bytes to remaining buffer length (prevents reading past end)
- Validate continuation bytes have the 10xxxxxx pattern
- Return 1 for any invalid byte sequence (count as single character)

Valid UTF-8 counting is unchanged: "hello"=5, "café"=4, "你好"=2, "😀😀"=2.
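
The three steps of the fix can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `ExpectedLen`, `CheckedCharBytes`, and `CheckedUtf8Len` are hypothetical names, and `ExpectedLen` only approximates what OneCharLen is described as doing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for OneCharLen: length implied by the leader byte.
static std::size_t ExpectedLen(unsigned char lead) {
  if (lead < 0x80) return 1;            // ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;                             // stray continuation or invalid leader
}

// Bytes to consume for the character starting at s[0], given `remaining`
// bytes in the buffer. Any invalid sequence is consumed as a single byte.
static std::size_t CheckedCharBytes(const char* s, std::size_t remaining) {
  if (remaining == 0) return 0;
  const std::size_t n = ExpectedLen(static_cast<unsigned char>(s[0]));
  if (n > remaining) return 1;  // truncated sequence: never read past the end
  for (std::size_t i = 1; i < n; ++i) {
    // Continuation bytes must match 10xxxxxx.
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) return 1;
  }
  return n;
}

// Counts characters, treating each invalid byte as one character.
static std::size_t CheckedUtf8Len(const std::string& str) {
  std::size_t count = 0;
  std::size_t i = 0;
  while (i < str.size()) {
    i += CheckedCharBytes(str.data() + i, str.size() - i);
    ++count;
  }
  return count;
}
```

In this sketch, 20 bytes of `\xC0` count as 20 characters because each leader is followed by a byte that is not a continuation, while well-formed input such as "hello" or "café" still counts one per code point. Like the fix as described, the sketch checks only the continuation pattern; it does not reject overlong encodings or surrogates.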

UTF8FirstLetterNumBytes returns the byte count from OneCharLen without validating that the expected continuation bytes actually follow the leader byte. Malformed UTF-8 (e.g., bare leader bytes without continuations) causes Utf8Len to undercount characters by 2-4x, bypassing string length validation constraints.

For example, 20 bytes of 0xC0 (invalid 2-byte leaders) produces Utf8Len=10 instead of 20, allowing a max_len=10 constraint to accept 20 bytes of data.

Fix:

- Clamp consumed bytes to remaining buffer length
- Validate continuation bytes have the 10xxxxxx pattern
- Return 1 for any invalid byte (count as single character)

Valid UTF-8 counting is unchanged.

CLAassistant · 2026-03-22T22:41:20Z

All committers have signed the CLA.

kodareef5 wants to merge 1 commit into bufbuild:main from kodareef5:fix-utf8len-malformed