Reading to multiple string types in ygm::io::line_parser #369

steiltre · 2025-08-01T22:08:36Z

This PR adds a template parameter for the string type to use when reading individual lines within the ygm::io::line_parser and its derivatives.

This change allows UTF-8 files containing non-ASCII characters to be read into a type such as std::u32string that can appropriately handle an extended character set. The advantage is that string sizes are reported correctly and operator[] returns the correct characters. The disadvantages are that every character requires 4 bytes of storage (including ASCII characters that could be stored in 1 byte) and that a user must convert the strings before printing or writing them to file (using std::codecvt or whatever replaces it since its deprecation).
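To illustrate the conversion step a user would need before printing, here is a minimal sketch of a hand-written UTF-32 to UTF-8 encoder. The helper name `to_utf8` is hypothetical and is not part of YGM or of this PR; it assumes every element of the input is a valid Unicode scalar value and does no validation.

```cpp
#include <string>

// Hypothetical helper (not part of YGM): encode UTF-32 code points as UTF-8
// so the result can be printed or written to a file as ordinary bytes.
// Assumes each char32_t is a valid Unicode scalar value (no validation).
std::string to_utf8(const std::u32string& in) {
  std::string out;
  for (char32_t c : in) {
    if (c < 0x80) {
      // 1-byte sequence: plain ASCII
      out += static_cast<char>(c);
    } else if (c < 0x800) {
      // 2-byte sequence: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xE0 | (c >> 12));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
      // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xF0 | (c >> 18));
      out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

A line held as a std::u32string could then be written with something like `std::cout << to_utf8(line)`. The storage trade-off is visible here as well: the five code points of "naïve" occupy 20 bytes as UTF-32 but only 6 bytes once encoded as UTF-8.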

Without this change, the ygm::io::line_parser is still able to read all legal UTF-8 characters (from my testing using files here). However, a line with non-ASCII characters that is read into a std::string is treated as a collection of single-byte characters, which gives incorrect sizes and incorrect character access.
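The size and indexing mismatch can be reproduced without YGM. The sketch below decodes UTF-8 bytes into a std::u32string so the two representations can be compared. The helper name `decode_utf8` is hypothetical and is not how the line_parser performs its conversion; it assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of YGM): decode UTF-8 bytes into code points.
// Assumes well-formed UTF-8; malformed input is not detected.
std::u32string decode_utf8(const std::string& in) {
  std::u32string out;
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char lead = static_cast<unsigned char>(in[i]);
    char32_t cp;
    std::size_t len;
    // The lead byte determines the sequence length and its payload bits.
    if (lead < 0x80)      { cp = lead;        len = 1; }
    else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
    else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
    else                  { cp = lead & 0x07; len = 4; }
    // Each continuation byte contributes its low 6 bits.
    for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    out += cp;
    i += len;
  }
  return out;
}
```

For the UTF-8 bytes of "naïve", the std::string reports a size of 6 and its element at index 2 is the byte 0xC3, which is only the first half of "ï". After decoding, the std::u32string reports a size of 5 and its element at index 2 is U+00EF, the whole character.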

steiltre added 6 commits June 18, 2025 08:09

Adds to test_line_parser to check what happens when reading utf-8 input

81768ec

Merge branch 'v0.9-dev' of github.com:LLNL/ygm into feature/utf-8

30ce691

Templates ygm::io::line_parser on a string type to use of reading lines within files.

3a49f74

Commenting out imbue for locale in line_parser

2cb27f5

Adds missing locale header to line_parser.hpp

b73eaad

Putting guard around test of reading 4-byte strings to avoid running on Mac

510555c
