fix: Encode strings as UTF-8, take 2 #471

Merged
bkeryan merged 15 commits into master from users/bkeryan/utf-8-take-2
Aug 11, 2025

@bkeryan (Collaborator) commented Aug 6, 2025

What does this Pull Request accomplish?

Split string/bytes handling:

  • Add data_utf8Strings feature toggle.
  • Add SetLVBytes and GetLVBytes. These do not do any string conversion/verification.
  • Update SetLVString and GetLVString to convert to/from UTF-8 using ConvertSystemStringToUTF8 and ConvertUTF8StringToSystem.
  • Change SetLVBytes and SetLVString to pass the string as a std::string_view.
  • Change SetLVRTModulePath to pass std::string by const reference.
  • Add LVBytesMessageValue and LVRepeatedBytesMessageValue, which use bytes-specific functionality like WriteBytes, BytesSize, WireFormatLite::TYPE_BYTES, etc.
  • Update ParseBytes methods to use the new classes.
  • Update LVMessageMetadataType::BytesValue cases to use bytes-specific functionality.
  • Update serialized descriptors to use bytes-specific functionality.
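To illustrate the split between the string path and the bytes path, here is a minimal sketch. It assumes the system code page is Latin-1 (ISO-8859-1), which is only one of many possible ANSI code pages. `Latin1ToUtf8` and `CopyBytes` are hypothetical stand-ins, not the actual `ConvertSystemStringToUTF8` or `SetLVBytes` implementations, whose signatures are not shown in this PR description.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a system-code-page-to-UTF-8 conversion.
// Assumption: the system code page is Latin-1, where each byte value
// equals the Unicode code point it represents.
std::string Latin1ToUtf8(std::string_view input)
{
    std::string output;
    output.reserve(input.size());
    for (unsigned char ch : input) {
        if (ch < 0x80) {
            // ASCII is encoded identically in Latin-1 and UTF-8.
            output.push_back(static_cast<char>(ch));
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence.
            output.push_back(static_cast<char>(0xC0 | (ch >> 6)));
            output.push_back(static_cast<char>(0x80 | (ch & 0x3F)));
        }
    }
    return output;
}

// Hypothetical bytes path: the data is copied verbatim, with no
// conversion and no verification.
std::string CopyBytes(std::string_view input)
{
    return std::string(input);
}

// Usage: "café" in Latin-1 is 4 bytes; in UTF-8 it is 5 bytes.
void RunConversionExample()
{
    assert(Latin1ToUtf8("caf\xE9") == "caf\xC3\xA9");
    assert(CopyBytes("caf\xE9") == "caf\xE9");
}
```

Taking `std::string_view` parameters, as the PR does for `SetLVBytes` and `SetLVString`, lets callers pass string literals, `std::string` objects, or buffer slices without an extra copy.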

Verify string encoding:

  • Add data_verifyStringEncoding feature toggle.
  • Add <string_utils.h> containing VerifyAsciiString and VerifyUtf8String functions.
    • These log and return a bool like the protobuf WireFormatLite::VerifyUtf8String function.
    • If the data_verifyStringEncoding feature toggle is disabled, these functions always succeed.
  • Add SetLVAsciiString and GetLVAsciiString. These call VerifyAsciiString and throw an exception if it returns false.
  • Update any_support.cc and proto_parser.cc to call SetLVAsciiString, GetLVAsciiString, or VerifyAsciiString.
  • Update ParseString methods to call VerifyUtf8String and throw std::runtime_error if it fails. This logs a message like:

    [wire_format_lite.cc:603] String field 'name' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.

    TranslateException translates the C++ exception to LabVIEW error -1.

  • Update string Serialize methods to call VerifyUtf8String. This logs but does not return an error.
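The verification behavior described above could look roughly like the following sketch. Every name here (`g_verifyStringEncoding`, `IsAscii`, `IsValidUtf8`, `VerifyUtf8StringSketch`, `ParseStringSketch`) is hypothetical and is not taken from `string_utils.h`; the real functions may differ in signature, logging, and validation details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for the data_verifyStringEncoding feature toggle.
bool g_verifyStringEncoding = true;

// True if every byte is in the 7-bit ASCII range.
bool IsAscii(std::string_view s)
{
    for (unsigned char ch : s) {
        if (ch >= 0x80) return false;
    }
    return true;
}

// Rejects bad lead or continuation bytes, truncated sequences, overlong
// forms, surrogates, and code points above U+10FFFF.
bool IsValidUtf8(std::string_view s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t length = 0;
        std::uint32_t codePoint = 0;
        std::uint32_t minimum = 0;
        if (lead < 0x80) { ++i; continue; }
        else if ((lead & 0xE0) == 0xC0) { length = 2; codePoint = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; codePoint = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; codePoint = lead & 0x07; minimum = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i < length) return false;  // truncated sequence
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char trail = static_cast<unsigned char>(s[i + k]);
            if ((trail & 0xC0) != 0x80) return false;
            codePoint = (codePoint << 6) | (trail & 0x3F);
        }
        if (codePoint < minimum) return false;                          // overlong
        if (codePoint > 0x10FFFF) return false;                         // out of range
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;   // surrogate
        i += length;
    }
    return true;
}

// Logs and returns a bool. Always succeeds when the toggle is disabled.
bool VerifyUtf8StringSketch(std::string_view s, const char* fieldName)
{
    if (!g_verifyStringEncoding || IsValidUtf8(s)) return true;
    std::cerr << "String field '" << fieldName
              << "' contains invalid UTF-8 data.\n";
    return false;
}

// Parse path: a failed verification becomes a C++ exception.
std::string ParseStringSketch(std::string_view wireData, const char* fieldName)
{
    if (!VerifyUtf8StringSketch(wireData, fieldName)) {
        throw std::runtime_error("String field contains invalid UTF-8 data.");
    }
    return std::string(wireData);
}

// Serialize path: the verification result is logged but otherwise ignored.
std::string SerializeStringSketch(std::string_view value, const char* fieldName)
{
    (void)VerifyUtf8StringSketch(value, fieldName);
    return std::string(value);
}
```

The asymmetry between the two paths mirrors the PR description: parsing invalid data is reported as an error, while serializing it only produces a log message.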

Add four test cases:

  • Test_HelloWorld_Internationalization.vi tests the HelloWorld client+server with internationalized strings. Unfortunately, this test still passes without the fix because both the client and server use the same incorrect string encoding.
  • Test_HelloWorld_Utf8Encoding.vi packs a helloworld.HelloRequest to Any, searches for the UTF-8 encoded string, and unpacks it and compares the result. This test fails without the fix.
  • Test_HelloWorld_InvalidAsciiEncoding.vi packs an Any using a non-ASCII message name and verifies that it fails.
  • Test_HelloWorld_InvalidUtf8Encoding.vi packs an Any, replaces an ASCII sequence with Latin-1 (which is invalid UTF-8) and verifies that unpacking it fails.
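The difference between the first two tests can be sketched in a few lines. This is an illustration only: `SerializeWithoutConversion` and `ParseWithoutConversion` are hypothetical models of the pre-fix behavior (bytes copied verbatim in the system encoding, assumed here to be Latin-1), not code from the repository.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// "é" (U+00E9) in two encodings.
constexpr std::string_view kLatin1 = "\xE9";
constexpr std::string_view kUtf8 = "\xC3\xA9";

// Hypothetical model of the pre-fix behavior: the system-encoded
// string goes onto the wire unchanged.
std::string SerializeWithoutConversion(std::string_view systemString)
{
    return std::string(systemString);
}

// Hypothetical model of the pre-fix parser: wire bytes are handed back
// unchanged and interpreted in the system encoding.
std::string ParseWithoutConversion(std::string_view wireData)
{
    return std::string(wireData);
}

// Round-trip check, in the spirit of the internationalization test.
bool RoundTripMatches(std::string_view input)
{
    return ParseWithoutConversion(SerializeWithoutConversion(input)) == input;
}

// Wire-level check, in the spirit of the UTF-8 encoding test: search
// the serialized bytes for the expected UTF-8 sequence.
bool WireContainsUtf8(std::string_view input, std::string_view expectedUtf8)
{
    return SerializeWithoutConversion(input).find(expectedUtf8) != std::string::npos;
}

void RunTestComparison()
{
    // Both sides share the same wrong encoding, so the round trip passes.
    assert(RoundTripMatches(kLatin1));
    // Inspecting the wire bytes exposes the problem.
    assert(!WireContainsUtf8(kLatin1, kUtf8));
}
```

A round-trip test cannot detect a symmetric encoding bug, which is why a test that inspects the serialized bytes is needed to catch it.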

Why should this Pull Request be merged?

Closes #461

What testing has been done?

Default feature toggles:

data_verifyStringEncoding = false:

  • Ran tests/run_tests.py with LV 2019 x86 and release DLL.
  • Test_HelloWorld_InvalidUtf8Encoding.vi fails.
  • Test_HelloWorld_InvalidAsciiEncoding.vi still passes because the invalid ASCII name is not found in the map.

data_utf8Strings = false:

  • Ran tests/run_tests.py with LV 2019 x86 and release DLL.
  • Test_HelloWorld_InvalidUtf8Encoding.vi fails.
  • Test_HelloWorld_Utf8Encoding.vi fails.

data_utf8Strings = false and data_verifyStringEncoding = false:

  • Ran tests/New_ATS/pylib/run_tests.py with LV 2019 x86 and release DLL.

@bkeryan bkeryan requested a review from jasonmreding August 6, 2025 00:22
@jasonmreding (Collaborator) commented

What testing has been done?

data_EfficientMessageCopy = true:

TODO

FYI, the data_EfficientMessageCopy toggle has been broken since #433.

@bkeryan bkeryan mentioned this pull request Aug 6, 2025
@bkeryan bkeryan force-pushed the users/bkeryan/utf-8-take-2 branch from 44778c6 to 202ebe3 Compare August 6, 2025 21:07
@bkeryan bkeryan requested a review from jasonmreding August 8, 2025 21:06
@bkeryan (Collaborator, Author) commented Aug 11, 2025

FYI @jasonmreding Some last-minute changes compared to what you reviewed:

  • Use a C++ exception in ParseBytes instead of returning nullptr. This reports the error without triggering LV's unhandled exception handler.
  • Remove IsDebugDLL.vi and change the expected error code to -1 in Test_HelloWorld_InvalidUtf8Encoding.vi.
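The pattern of throwing inside the parser and translating at the library boundary might be sketched as follows. `ParseBytesSketch`, `ParseBytesAtBoundary`, and `kGenericError` are hypothetical names; the actual `TranslateException` and `ParseBytes` code is not shown in this conversation, so this only illustrates the general technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical constant. The PR states only that the C++ exception is
// translated to LabVIEW error -1.
constexpr std::int32_t kGenericError = -1;

// Hypothetical parser that reports failure by throwing rather than by
// returning nullptr.
std::string ParseBytesSketch(std::string_view wireData, std::size_t declaredLength)
{
    if (declaredLength > wireData.size()) {
        throw std::runtime_error("Bytes field is truncated.");
    }
    return std::string(wireData.substr(0, declaredLength));
}

// Hypothetical exported entry point. No exception may cross this
// boundary, so every failure is converted to an error code.
std::int32_t ParseBytesAtBoundary(
    std::string_view wireData,
    std::size_t declaredLength,
    std::string* result) noexcept
{
    try {
        *result = ParseBytesSketch(wireData, declaredLength);
        return 0;
    } catch (const std::exception&) {
        return kGenericError;
    } catch (...) {
        return kGenericError;
    }
}
```

Catching at the boundary keeps error reporting in one place, so the caller sees an ordinary error code rather than a crash or an unhandled exception.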

@bkeryan bkeryan requested a review from jasonmreding August 11, 2025 21:16
@bkeryan bkeryan merged commit 068f31c into master Aug 11, 2025
5 checks passed
@bkeryan bkeryan deleted the users/bkeryan/utf-8-take-2 branch August 11, 2025 23:03
Linked issue that merging this pull request may close:

grpc-labview serializes strings using the ANSI code page, not UTF-8