Boost.locale makes std::regex not match anything #249

@Lord-Kamina

I had initially posted a comment in #35, but maybe it deserves its own issue instead.
I think it's essentially the same problem, except I'm on macOS 13:

$ clang++ -v
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: x86_64-apple-darwin22.6.0

I'm using boost 1.86.0, built against ICU 74.2

I have seen this behavior before and never found a real solution. I have now run into it again on a project. I spent about two days trying to tune my regex, thinking I must have made a mistake. I kept simplifying the pattern, but the problem persisted.

In the end I wrote a minimal example to test it:

#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <regex>
#include <string>

int main() {
	boost::locale::generator locGen;
	const std::locale loc = locGen("en_US.UTF-8");
// 	std::locale::global(loc);  
	auto pattern = std::regex(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
// 	pattern.imbue(loc);
	const std::string text{"  pozo = mani"};
	std::smatch result;
	std::regex_search(text, result, pattern);
	std::cout << "ready: " << result.ready() << ", size: " << result.size() << std::endl;
	for (size_t i=0; i < result.size(); i++) {
		std::cout << "match[" <<i<<"]: " << result[i] <<std::endl;
	}
	return 0;
}

Which outputs

$ clang++ -o regex_test regex_test.cpp -std=c++17 -I/opt/local/include/ -lboost_locale-mt -lboost_system-mt -L/opt/local/lib && ./regex_test
ready: 1, size: 3
match[0]:   pozo = mani
match[1]: pozo
match[2]: mani

If I uncomment the std::locale::global line (with or without the pattern.imbue), this happens instead:

$ clang++ -o regex_test regex_test.cpp -std=c++17 -I/opt/local/include/ -lboost_locale-mt -lboost_system-mt -L/opt/local/lib && ./regex_test
ready: 1, size: 0
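
A possible workaround to try is keeping the regex off the global locale altogether. Note that `std::basic_regex::imbue` discards the current pattern, so imbuing an already-constructed regex leaves it matching nothing; the locale has to be imbued first and the pattern assigned afterwards. Below is a minimal sketch of that ordering using only the standard library. `make_classic_regex` and `parse_key_value` are hypothetical helper names, the sketch assumes the classic "C" locale's character classes are sufficient for this pattern, and it has not been verified against the Boost.Locale setup above.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: builds a std::regex whose traits use the classic "C"
// locale instead of whatever std::locale::global() currently holds.
inline std::regex make_classic_regex(const std::string& pattern) {
	std::regex re;                     // empty regex; traits start from the global locale
	re.imbue(std::locale::classic());  // imbue() drops any pattern, so call it first
	re.assign(pattern);                // compile the pattern under the imbued locale
	return re;
}

// Hypothetical helper: returns true when `text` starts with a "key = value"
// entry and stores the two captures in `key` and `value`.
inline bool parse_key_value(const std::string& text, std::string& key, std::string& value) {
	static const std::regex pattern =
		make_classic_regex(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
	std::smatch result;
	if (!std::regex_search(text, result, pattern)) {
		return false;
	}
	key = result[1];
	value = result[2];
	return true;
}
```

Checking the return value of `std::regex_search` instead of `result.size()` also makes the "no match" case explicit.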

I tried enabling the locale categories gradually, OR'ing them in one by one, and it always worked until I added std::locale::collate. From that point on, removing all the others and keeping just std::locale::collate still makes the regex fail to match.

#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <regex>
#include <string>

int main() {
	boost::locale::generator locGen;
	const std::locale loc = locGen("en_US.UTF-8");
	std::locale testLoc = std::locale(std::locale::classic(), loc, std::locale::collate);
	std::locale::global(testLoc);
	auto pattern = std::regex(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
// 	pattern.imbue(testLoc);
	const std::string text{"  pozo = mani"};
	std::smatch result;
	std::regex_search(text, result, pattern);
	std::cout << "ready: " << result.ready() << ", size: " << result.size() << std::endl;
	for (size_t i=0; i < result.size(); i++) {
		std::cout << "match[" <<i<<"]: " << result[i] <<std::endl;
	}
	return 0;
}

With only the collate category taken from the Boost.Locale locale, the regex already fails to match.
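
The category-by-category check described above can be automated. The following is a rough sketch, not the exact procedure used for this report: `failing_categories` is a hypothetical helper that, for each standard category, installs `std::locale::classic()` plus that single category from a candidate locale as the global locale, builds the regex, and records whether the search still matches. It restores the previous global locale after each step.

```cpp
#include <locale>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the names of the locale categories that,
// when taken alone from `candidate` and layered on the classic locale,
// make the test regex stop matching.
inline std::vector<std::string> failing_categories(const std::locale& candidate) {
	const std::vector<std::pair<std::locale::category, std::string>> categories = {
		{std::locale::collate, "collate"},   {std::locale::ctype, "ctype"},
		{std::locale::monetary, "monetary"}, {std::locale::numeric, "numeric"},
		{std::locale::time, "time"},         {std::locale::messages, "messages"},
	};
	const std::string text{"  pozo = mani"};
	std::vector<std::string> failed;
	for (const auto& entry : categories) {
		// classic() with exactly one category replaced by the candidate's facets.
		const std::locale mixed(std::locale::classic(), candidate, entry.first);
		const std::locale previous = std::locale::global(mixed);
		bool matched = false;
		try {
			// Constructed after global() so the regex traits pick up `mixed`.
			const std::regex pattern(R"(^(?:\s)*([_[:alnum:].-]+)\s*=\s*([^;#\n\r]+)*)");
			matched = std::regex_search(text, pattern);
		} catch (const std::regex_error&) {
			matched = false;  // a pattern that no longer compiles also counts as a failure
		}
		std::locale::global(previous);  // restore the original global locale
		if (!matched) {
			failed.push_back(entry.second);
		}
	}
	return failed;
}
```

With the setup from this report it would be called as `failing_categories(locGen("en_US.UTF-8"))`; based on the results above, I would expect `collate` to be the only entry returned.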
