@steiltre (Collaborator) commented Aug 1, 2025

This PR adds a template parameter for the string type to use when reading individual lines within the ygm::io::line_parser and its derivatives.

This change allows UTF-8 files containing non-ASCII characters to be read into a type such as std::u32string, which stores one element per code point and so can handle an extended character set appropriately. The advantage is that size() reports the number of code points rather than bytes, and operator[] returns a whole character instead of a single byte of a multi-byte sequence. The disadvantages are that every character requires 4 bytes of storage (including ASCII characters that could be stored in 1 byte) and that users must convert the strings before printing or writing to file (using std::codecvt or whatever replaces it since its deprecation).
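To illustrate the conversion step mentioned above without relying on the deprecated facilities, here is a minimal sketch of a hand-written UTF-32 to UTF-8 encoder. The helper names `to_utf8` and `to_utf8_example` are hypothetical and are not part of YGM or this PR. Replacing invalid code points with U+FFFD is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of YGM): encode UTF-32 code points as UTF-8.
std::string to_utf8(const std::u32string& in) {
  std::string out;
  out.reserve(in.size());
  for (char32_t cp : in) {
    // Assumption: surrogates and out-of-range values become U+FFFD.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;

    if (cp < 0x80) {  // 1 byte: 0xxxxxxx
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}

// Usage: a line held as std::u32string is converted back before output.
void to_utf8_example() {
  const std::u32string line = U"h\u00E9llo";  // 5 code points
  const std::string bytes = to_utf8(line);    // 6 bytes of UTF-8
  assert(bytes.size() == 6);
}
```

The resulting std::string can be printed or written to file as usual, at the cost of one extra pass over each line.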

Without this change, the ygm::io::line_parser is still able to read all legal UTF-8 characters (from my testing using files here). However, a line with non-ASCII characters read into a std::string is treated as a sequence of single-byte characters, so size() counts bytes rather than characters and operator[] can return a single byte from the middle of a multi-byte character.
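The difference can be seen with standard library types alone. The following sketch compares the same text held as UTF-8 bytes in a std::string and as code points in a std::u32string. The function names `count_code_points` and `compare_sizes` are hypothetical, and the counter assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 byte string by skipping
// continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
  std::size_t n = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) ++n;
  }
  return n;
}

void compare_sizes() {
  // "héllo": the character 'é' (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8.
  const std::string bytes = "h\xC3\xA9llo";
  const std::u32string points = U"h\u00E9llo";

  // std::string sees six single-byte elements; index 1 is only half of 'é'.
  assert(bytes.size() == 6);
  assert(static_cast<unsigned char>(bytes[1]) == 0xC3);

  // std::u32string sees five code points; index 1 is the whole character.
  assert(points.size() == 5);
  assert(points[1] == U'\u00E9');

  // Counting lead bytes recovers the code-point count from the byte string.
  assert(count_code_points(bytes) == points.size());
}
```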
