Compile-time polymorphic string literals for modern C++
utf42 is a small, header-only C++ utility that allows you to define a string
once and obtain it as a std::basic_string_view of any character type
(char, wchar_t, char8_t, char16_t, char32_t) entirely at compile time.
It achieves this by leveraging the compiler's built-in handling of Unicode
string literal prefixes ("", L"", u8"", u"", U"") and selecting the
appropriate encoding through compile-time dispatch (consteval in C++20).
The result is zero runtime overhead, no heap allocation, and no runtime Unicode transcoding.
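The mechanism above can be illustrated with a minimal, self-contained C++17 sketch. It is not utf42's implementation: pick_literal and SKETCH_POLY_ENC are hypothetical names, if constexpr stands in for consteval dispatch, and char8_t is left out so the sketch also builds as C++17. Token pasting turns one literal argument into every prefixed variant, so the compiler does all the encoding, and the function merely returns the variant whose element type matches.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper (not part of utf42): receives every compiler-encoded
// variant of one literal and returns the one matching the requested type C.
template <class C>
constexpr std::basic_string_view<C> pick_literal(std::string_view a,
                                                 std::wstring_view w,
                                                 std::u16string_view u16,
                                                 std::u32string_view u32) noexcept {
    if constexpr (std::is_same_v<C, char>) {
        return a;
    } else if constexpr (std::is_same_v<C, wchar_t>) {
        return w;
    } else if constexpr (std::is_same_v<C, char16_t>) {
        return u16;
    } else {
        static_assert(std::is_same_v<C, char32_t>, "unsupported character type");
        return u32;
    }
}

// Hypothetical macro: token pasting turns "abc" into L"abc", u"abc" and U"abc",
// so the compiler performs every encoding. Only string literals work here.
#define SKETCH_POLY_ENC(C, lit) pick_literal<C>(lit, L##lit, u##lit, U##lit)

// A typedef is only an alias, so char42_t selects the char32_t variant.
using char42_t = char32_t;

// The character type can now be a template parameter.
template <class C>
constexpr std::basic_string_view<C> greeting() noexcept {
    return SKETCH_POLY_ENC(C, "Hello World \U0001F600!");
}
```

For example, greeting<char16_t>() yields a 15-code-unit view (the emoji becomes a surrogate pair), while greeting<char42_t>() resolves to the char32_t variant.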
- Features
- Motivation
- Requirements
- Usage
- Important Limitations
- Design Philosophy
- Inclusion in Your Project
- License
Features
- Single source of truth for string literals
- Zero runtime cost
- No heap allocation
- No UTF transcoding at runtime
- Full support for \uXXXX and \UXXXXXXXX escape sequences
- Works with custom character typedefs
- Header-only
- C++20 compliant
- C++17 compliant (not all features)
- C++14 compliant (not all features; uses a custom wrapper in place of std::basic_string_view)
- C++11 compliant (not all features; uses a custom wrapper in place of std::basic_string_view)
Motivation
C++ does not allow templating string literal prefixes:
make_str<char16_t>("hello"); // impossible
This leads to duplicated literals such as:
"hello"
u"hello"
U"hello"
utf42 solves this by:
- Letting the compiler generate all encoded variants of a literal
- Selecting the correct one at compile time based on the requested character type
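Without such a helper, generic code typically falls back on one specialization per character type. The sketch below (hypothetical names hello_text and hello, C++17) shows the duplication utf42 aims to remove: the same text has to be spelled once per prefix.

```cpp
#include <string_view>

// Hypothetical manual approach: one specialization per character type,
// repeating the same text with a different literal prefix each time.
template <class C>
struct hello_text;  // primary template intentionally left undefined

template <> struct hello_text<char> {
    static constexpr std::string_view value = "hello";
};
template <> struct hello_text<wchar_t> {
    static constexpr std::wstring_view value = L"hello";
};
template <> struct hello_text<char16_t> {
    static constexpr std::u16string_view value = u"hello";
};
template <> struct hello_text<char32_t> {
    static constexpr std::u32string_view value = U"hello";
};

// Generic code can now pick the right spelling, but every new string
// requires another four hand-written copies of the same text.
template <class C>
constexpr std::basic_string_view<C> hello() noexcept {
    return hello_text<C>::value;
}
```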
Requirements
- C++11
  - Uses SFINAE to implement the templates
  - Some features of the custom string view wrapper cannot be made constexpr
- C++14
  - Uses SFINAE to implement the templates
- C++17 or later
  - Uses if constexpr
- C++20 or later (enables extra features when available):
  - char8_t
  - consteval
  - Concepts
- UTF-8 encoded source files (recommended)
- A compiler with proper Unicode literal support (GCC, Clang, MSVC)
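Before C++17 there is no if constexpr, which is why the C++11 and C++14 modes rely on SFINAE. The following sketch shows that technique in isolation. It is not utf42's code: select_literal and SKETCH_SELECT are hypothetical, and it returns raw pointers because std::basic_string_view only exists from C++17. std::enable_if removes every overload whose character type does not match, leaving exactly one viable candidate.

```cpp
#include <type_traits>

// Hypothetical SFINAE-based selection (C++11 compatible). Each overload is
// viable only when C matches the character type it returns.
template <class C>
constexpr typename std::enable_if<std::is_same<C, char>::value, const C*>::type
select_literal(const char* a, const char16_t*, const char32_t*) {
    return a;
}

template <class C>
constexpr typename std::enable_if<std::is_same<C, char16_t>::value, const C*>::type
select_literal(const char*, const char16_t* u16, const char32_t*) {
    return u16;
}

template <class C>
constexpr typename std::enable_if<std::is_same<C, char32_t>::value, const C*>::type
select_literal(const char*, const char16_t*, const char32_t* u32) {
    return u32;
}

// Hypothetical macro feeding every compiler-encoded variant to the overload set.
#define SKETCH_SELECT(C, lit) select_literal<C>(lit, u##lit, U##lit)
```

For instance, SKETCH_SELECT(char16_t, "\U0001F600") yields a pointer to the surrogate pair 0xD83D 0xDE00, computed entirely by the compiler.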
Usage
Approach: single-use string literal.
Use the macro make_poly_enc(CharacterType, lit)
to obtain, at compile time, the string literal
encoded in the desired character type.
This is particularly useful when the
character type is a template parameter.
#include <utf42/utf42.h>
// Typedef of char32_t
using char42_t = char32_t;
// Different encoding views
constexpr std::basic_string_view<char> strv_a = make_poly_enc(char, "Hello World \U0001F600!");
constexpr std::basic_string_view<char8_t> strv_8 = make_poly_enc(char8_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char16_t> strv_16 = make_poly_enc(char16_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char32_t> strv_32 = make_poly_enc(char32_t, "Hello World \U0001F600!");
constexpr std::basic_string_view<char42_t> strv_42 = make_poly_enc(char42_t, "Hello World \U0001F600!");
All variables above refer to the same logical string, encoded differently by the compiler.
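"Encoded differently" has a measurable consequence: the same text occupies a different number of code units in each encoding. The standalone check below counts what the compiler emits for the example literal; code_units is a hypothetical helper, not part of utf42. U+1F600 needs four UTF-8 code units, a UTF-16 surrogate pair, and one UTF-32 code unit, while each ASCII character takes one code unit.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <class C, std::size_t N>
constexpr std::size_t code_units(const C (&)[N]) noexcept {
    return N - 1;
}

// "Hello World " is 12 code units in every encoding; only the emoji differs.
static_assert(code_units(u8"Hello World \U0001F600!") == 17, "12 + 4 + 1");
static_assert(code_units(u"Hello World \U0001F600!") == 15, "12 + 2 + 1");
static_assert(code_units(U"Hello World \U0001F600!") == 14, "12 + 1 + 1");

// The emoji itself: F0 9F 98 80 in UTF-8, D83D DE00 in UTF-16.
static_assert(static_cast<unsigned char>(u8"\U0001F600"[0]) == 0xF0, "UTF-8 lead byte");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00, "surrogate pair");
```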
Converting back to UTF-8 (for display)
std::string str_a (strv_a);
std::string str_8 (char8_to_char(strv_8));
std::string str_16 (utf8::utf16to8(strv_16));
std::string str_32 (utf8::utf32to8(strv_32));
std::string str_42 (utf8::utf32to8(strv_42));
The utf8:: conversion functions used above come from utfcpp (#include <utf8cpp/utf8.h>) and are not part of utf42.
See the utfcpp documentation for more information.
Approach: reusable string literal.
An instance of utf42::poly_enc stores all encoded
variants so that the desired one can be retrieved later.
The macro cons_poly_enc(lit) constructs the object
from a single string literal.
#include <utf42/utf42.h>
// Typedef of char32_t
using char42_t = char32_t;
// Create all encoded string views at once
constexpr utf42::poly_enc oText = cons_poly_enc("Hello World \U0001F600!");
The variable above holds the same logical string in every encoding, each produced by the compiler.
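Conceptually, an object like poly_enc keeps one view per encoding and hands out the matching one on request. The sketch below is hypothetical: poly_sketch, its members, and SKETCH_CONS_POLY_ENC are illustrative names rather than utf42's (its documented members are TXT_CHAR, TXT_CHAR_16, visit(), and so on). It uses C++17 if constexpr and omits char8_t.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for a poly_enc-like type: one view per encoding,
// each pointing at a literal the compiler encoded.
struct poly_sketch {
    std::string_view txt_char;
    std::wstring_view txt_wchar;
    std::u16string_view txt_char16;
    std::u32string_view txt_char32;

    // Return the stored view whose element type matches C.
    template <class C>
    constexpr std::basic_string_view<C> visit() const noexcept {
        if constexpr (std::is_same_v<C, char>) {
            return txt_char;
        } else if constexpr (std::is_same_v<C, wchar_t>) {
            return txt_wchar;
        } else if constexpr (std::is_same_v<C, char16_t>) {
            return txt_char16;
        } else {
            static_assert(std::is_same_v<C, char32_t>, "unsupported character type");
            return txt_char32;
        }
    }
};

// Hypothetical macro: builds the object from a single literal via token pasting.
#define SKETCH_CONS_POLY_ENC(lit) poly_sketch{lit, L##lit, u##lit, U##lit}

constexpr poly_sketch sketch_text = SKETCH_CONS_POLY_ENC("Hello World \U0001F600!");
```

For instance, sketch_text.visit<char16_t>() returns the 15-code-unit UTF-16 view. Because every view refers to a string literal with static storage, the object owns no memory and can be copied freely.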
Converting back to UTF-8 (for display)
// Re-encode everything to utf-8
std::string str_a(oText.TXT_CHAR);
std::string str_8(char8_to_char(oText.TXT_CHAR_8));
std::string str_16(utf8::utf16to8(oText.TXT_CHAR_16));
std::string str_32(utf8::utf32to8(oText.TXT_CHAR_32));
std::string str_42(utf8::utf32to8(oText.visit<char42_t>()));
In template code, use the member function template template<CharType char_t> constexpr std::basic_string_view<char_t> utf42::poly_enc::visit() const noexcept
to retrieve the variant for a given character type.
The utf8:: conversion functions used above come from utfcpp (#include <utf8cpp/utf8.h>) and are not part of utf42.
See the utfcpp documentation for more information.
std::cout << "Original: " << str_a << '\n';
std::cout << "utf-8: " << str_8 << '\n';
std::cout << "utf-16: " << str_16 << '\n';
std::cout << "utf-32: " << str_32 << '\n';
std::cout << "utf-42: " << str_42 << '\n';
Display on the terminal:
Original: Hello World 😀!
utf-8: Hello World 😀!
utf-16: Hello World 😀!
utf-32: Hello World 😀!
utf-42: Hello World 😀!
Your terminal MUST be configured to use UTF-8 for this example to print correctly. Alternatively, re-encode the text for your terminal or change the terminal's configuration.
If the terminal encoding is not UTF-8:
- Unicode characters may appear as ?, □, or mojibake
- This is not a bug in utf42
Common setups:
- Linux / macOS: UTF-8 by default
- Windows:
  - Use Windows Terminal, or
  - Run:
    chcp 65001
Important Limitations
The macro make_poly_enc must be used with string literals only.
Any other input will result in undefined behavior. For example:
make_poly_enc(char16_t, "OK");     // valid
make_poly_enc(char16_t, someVar);  // undefined behavior
Other limitations:
- No runtime strings
- No dynamic encoding conversion
- No grapheme-cluster or text-shaping logic
- This library operates strictly at the code-unit level.
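The restriction to literals follows from how literal prefixes work: a prefix is part of the literal token, not an operator that can be applied to a variable. In a sketch built on adjacent-literal concatenation (hypothetical; utf42 does not document doing this), u"" lit produces the UTF-16 form of lit, because an unprefixed literal adopts the prefix of its neighbour. The same expression doubles as a guard: when lit is a variable, u"" someVar is not valid syntax, so the misuse is rejected at compile time.

```cpp
#include <string_view>

// Hypothetical macro (not part of utf42). Adjacent string literals are
// concatenated by the compiler, and an unprefixed literal takes the prefix
// of its neighbour, so u"" lit is the UTF-16 form of lit. If lit is a
// variable, u"" someVar does not compile.
#define SKETCH_U16(lit) std::u16string_view(u"" lit)

constexpr std::u16string_view ok_text = SKETCH_U16("OK");

// const char* someVar = "OK";
// auto bad = SKETCH_U16(someVar);  // compile error: not a string literal
```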
Design Philosophy
- Header-only library
- Let the compiler handle Unicode parsing
- Prefer compile-time work over runtime work
- Avoid unnecessary abstraction or dependencies
- Keep the API minimal, explicit, and fast
Ideal use cases include:
- Logging systems
- Localization keys
- Cross-platform APIs
- Performance-critical code
- Compile-time configuration strings
Inclusion in Your Project
This section explains how to include this library in your project.
Simply copy the header utf42.h into your project. No additional setup or configuration is required.
Alternatively, with CMake, use the FetchContent module (available since CMake 3.11; FetchContent_MakeAvailable requires CMake 3.14):
include(FetchContent)
For more details, see the FetchContent documentation.
Then, copy the following code into your CMake file to fetch the utf42 library:
# Fetch the utf42 library
FetchContent_Declare(
utf42
GIT_REPOSITORY https://github.com/dante19031999/utf-42
GIT_TAG master
)
FetchContent_MakeAvailable(utf42)
After fetching the library, link it with your target. Use the following line in your CMake configuration, replacing VISIBILITY with PUBLIC, PRIVATE, or INTERFACE as appropriate:
target_link_libraries(mylib VISIBILITY utf42::utf42)
For more details, see the target_link_libraries documentation.
MIT License
Copyright (c) 2025 Dante Doménech Martínez
