Description
Given the following program:
{
  std::function<void()> f;
  std::atomic_bool start = false;
  std::atomic_bool done = false;
  std::jthread jt{[&] {
    start.wait(false);
    f();
    done = true;
    done.notify_all();
  }};
  f = [&] {
    try {
      jt.join();
      assert(false);
    } catch (const std::system_error& err) {
      assert(err.code() == std::errc::resource_deadlock_would_occur);
    }
  };
  start = true;
  start.notify_all();
  done.wait(false);
}
The jt.join() call throws and the exception is caught. Later, at the end of the scope, the destructor of jthread calls join a second time. This is valid code, but TSAN complains:
ThreadSanitizer: CHECK failed: sanitizer_thread_registry.cpp:348 "((t)) != (0)" (0x0, 0x0) (tid=3411214)
#0 __tsan::CheckUnwind() <null> (t.tmp.exe+0xcbd1b) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) <null> (t.tmp.exe+0x45042) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#2 __sanitizer::ThreadRegistry::ConsumeThreadUserId(unsigned long) <null> (t.tmp.exe+0x43e94) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#3 pthread_join <null> (t.tmp.exe+0x604bb) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#4 std::__1::__libcpp_thread_join[abi:v170000](unsigned long*) /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-69f521df8409-1/llvm-project/libcxx-ci/build/generic-tsan/include/c++/v1/__threading_support:398:10 (libc++.so.1+0x73358) (BuildId: 71b8f06279b5e8117756a82c1e642e312c2a0e30)
#5 std::__1::thread::join() /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-69f521df8409-1/llvm-project/libcxx-ci/libcxx/src/thread.cpp:51:14 (libc++.so.1+0x73358)
#6 std::__1::jthread::join[abi:v170000]() /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-69f521df8409-1/llvm-project/libcxx-ci/build/generic-tsan/include/c++/v1/__thread/jthread.h:91:49 (t.tmp.exe+0xed5f9) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#7 std::__1::jthread::~jthread[abi:v170000]() /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-69f521df8409-1/llvm-project/libcxx-ci/build/generic-tsan/include/c++/v1/__thread/jthread.h:60:7 (t.tmp.exe+0xed708) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#8 main /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-69f521df8409-1/llvm-project/libcxx-ci/libcxx/test/std/thread/thread.jthread/join.pass.cpp:108:3 (t.tmp.exe+0xe836d) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
#9 __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 (libc.so.6+0x29d8f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
#10 __libc_start_main csu/../csu/libc-start.c:392:3 (libc.so.6+0x29e3f) (BuildId: 69389d485a9793dbe873f0ea2c93e02efaa9aa3d)
#11 _start <null> (t.tmp.exe+0x30484) (BuildId: d7fd9285e8961410872ba6334aecc0115d0ff192)
What might be happening is that TSAN instruments pthread_join and thinks that the thread has already been joined when join is called the second time in the destructor. In reality, the thread was not joined the first time around because of the system error.
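To isolate this from libc++, the same sequence can be sketched directly with pthreads: a thread tries to join itself, the call fails, and another thread then joins it normally. This is only a minimal sketch; the names `try_self_join`, `JoinResults` and `join_twice` are made up for illustration, and it assumes an implementation such as glibc that reports `EDEADLK` for a self-join (POSIX only says the call may fail this way).

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

namespace {
// Runs on the new thread: attempt to join ourselves and record the error code.
// (Hypothetical helper, not part of libc++ or TSAN.)
void* try_self_join(void* out) {
  *static_cast<int*>(out) = pthread_join(pthread_self(), nullptr);
  return nullptr;
}
}  // namespace

struct JoinResults {
  int self_join;    // result of the failed join issued from inside the thread
  int second_join;  // result of the later join issued from the creating thread
};

// Creates one thread, lets it fail to join itself, then joins it for real.
JoinResults join_twice() {
  JoinResults r{0, 0};
  pthread_t th;
  int rc = pthread_create(&th, nullptr, try_self_join, &r.self_join);
  assert(rc == 0);
  (void)rc;
  // The first join failed, so the thread is still joinable here.
  r.second_join = pthread_join(th, nullptr);
  return r;
}
```

Without TSAN, `join_twice()` is expected to report a failed first join and a successful second one, which is the sequence std::jthread produces in the program above.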
I had a look at TSAN's code:
TSAN_INTERCEPTOR(int, pthread_join, void *th, void **ret) {
  SCOPED_INTERCEPTOR_RAW(pthread_join, th, ret);
  Tid tid = ThreadConsumeTid(thr, pc, (uptr)th);
  ThreadIgnoreBegin(thr, pc);
  int res = BLOCK_REAL(pthread_join)(th, ret);
  ThreadIgnoreEnd(thr);
  if (res == 0) {
    ThreadJoin(thr, pc, tid);
  }
  return res;
}
ThreadConsumeTid calls ConsumeThreadUserId, which looks the thread up by its user id, asserts that it was found, and then removes the entry that was found.
From what I can see, the first time join is called, the user id is removed, and then join throws. The second time join is called, the user id is no longer there, hence the assertion failure.
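The sequence can be illustrated with a toy model of the registry. This is not TSAN's code: `ToyRegistry` and `toy_join_consume_first` are hypothetical stand-ins, and a failed lookup returns `false` where the real registry hits the CHECK failure.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy stand-in for the thread registry: maps a user-visible thread handle
// to an internal tid. Hypothetical, for illustration only.
class ToyRegistry {
public:
  void add(std::uintptr_t user_id, int tid) { ids_[user_id] = tid; }

  // "Find by id, then remove what is found". Returns nullopt where the
  // real code would fail its CHECK.
  std::optional<int> consume(std::uintptr_t user_id) {
    auto it = ids_.find(user_id);
    if (it == ids_.end()) return std::nullopt;
    int tid = it->second;
    ids_.erase(it);
    return tid;
  }

  bool contains(std::uintptr_t user_id) const {
    return ids_.count(user_id) != 0;
  }

private:
  std::unordered_map<std::uintptr_t, int> ids_;
};

// Models an interceptor that consumes the id before it knows whether the
// underlying join succeeds. real_result simulates pthread_join's return value.
// Returns false when the id is no longer registered.
bool toy_join_consume_first(ToyRegistry& reg, std::uintptr_t user_id,
                            int real_result) {
  std::optional<int> tid = reg.consume(user_id);
  if (!tid) return false;  // the real registry asserts here
  // The id is already gone at this point, even if real_result != 0.
  (void)real_result;
  return true;
}
```

In this model, a first call with `real_result` set to `EDEADLK` removes the id even though the join failed, so a second call for the same id cannot find it.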