UTF-8 invalid characters are not always ignored when dumping with error_handler_t::ignore #4552

Open
@gentooise

Description

According to the documentation (https://json.nlohmann.me/api/basic_json/dump/#parameters), when error_handler_t::ignore is passed to dump(), invalid UTF-8 bytes should be ignored and copied as-is into the output string.

However, I'm debugging the following minimal code:

    std::string test = "test\334\005";
    nlohmann::json node{};
    node["test"] = test;
    auto test_dump = node.dump(-1, ' ', false, nlohmann::json::error_handler_t::ignore);

and the final test_dump string contains test\005 (byte \334 is gone).

Is this expected? Am I missing something?

Reproduction steps

Just try to run/debug the following:

    std::string test = "test\334\005";
    nlohmann::json node{};
    node["test"] = test;
    auto test_dump = node.dump(-1, ' ', false, nlohmann::json::error_handler_t::ignore);

Expected vs. actual results

Actual: test_dump contains test\005
Expected: test_dump contains test\334\005
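
For context on why the first byte is treated as invalid: \334 (0xDC) has the bit pattern of a two-byte UTF-8 lead byte, so it must be followed by a continuation byte in the range 0x80..0xBF, and \005 is not one. The sketch below is a simplified, standalone illustration of a "drop invalid bytes" policy, which matches the actual result reported above. The helper names `utf8_sequence_length` and `drop_invalid_utf8` are hypothetical and are not part of nlohmann/json; this is not the library's implementation, and it checks only lead and continuation byte patterns (overlong encodings and surrogates are not rejected).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlohmann/json): returns the length of the
// UTF-8 sequence starting at s[i], or 0 if the bytes there do not form one.
// Simplified: only lead/continuation bit patterns are checked.
std::size_t utf8_sequence_length(const std::string& s, std::size_t i) {
    const auto lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (lead < 0x80) {
        len = 1;  // ASCII, including control characters such as 0x05
    } else if (lead >= 0xC2 && lead <= 0xDF) {
        len = 2;  // 0xDC (octal 334) falls in this range
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        len = 3;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        len = 4;
    } else {
        return 0;  // stray continuation byte or invalid lead byte
    }
    if (i + len > s.size()) {
        return 0;  // sequence is truncated at the end of the string
    }
    for (std::size_t k = 1; k < len; ++k) {
        const auto cont = static_cast<unsigned char>(s[i + k]);
        if ((cont & 0xC0) != 0x80) {
            return 0;  // continuation bytes must look like 10xxxxxx
        }
    }
    return len;
}

// Hypothetical helper: copies well-formed sequences and skips every byte
// that does not start one (a "drop" policy, assumed for illustration).
std::string drop_invalid_utf8(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = utf8_sequence_length(s, i);
        if (len == 0) {
            ++i;  // skip a single invalid byte and resynchronize
            continue;
        }
        out.append(s, i, len);
        i += len;
    }
    return out;
}
```

With the input from this report, `drop_invalid_utf8("test\334\005")` yields the five bytes test\005, whereas a "copy unchanged" policy would keep all six bytes test\334\005.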

Minimal code example

    #include <nlohmann/json.hpp>
    int main() {
        std::string test = "test\334\005";  // \334 (0xDC) is not followed by a continuation byte
        nlohmann::json node{};
        node["test"] = test;
        auto test_dump = node.dump(-1, ' ', false, nlohmann::json::error_handler_t::ignore);
    }

Error messages

No response

Compiler and operating system

gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924

Library version

3.11.2
