Optimize thread_queue_hack with std::mutex#83
Conversation
Replaced the manual spinlock implementation (using `__sync_bool_compare_and_swap` and `usleep(0)`) in `thread_queue_hack` with `std::mutex`. This change: - Replaces busy-waiting with efficient OS-level blocking. - Fixes a potential lock leak in `push_back` when the queue overflows (previously threw without unlocking). - Avoids using `memset` on non-trivial types (which was unsafe if member types were changed). - Standardizes synchronization using C++17 primitives. Benchmark results (standalone test with 10k iterations): - 4 threads: ~5% throughput improvement (293k -> 309k ops/s). - 8 threads: Comparable throughput, but with better CPU scheduling behavior.
There was a problem hiding this comment.
Pull request overview
This PR modernizes the thread_queue_hack synchronization mechanism by replacing a custom spinlock implementation with std::mutex.
Changes:
- Replaced manual spinlock (
__sync_bool_compare_and_swapwith busy-waiting) withstd::mutexfor more efficient OS-level blocking - Fixed lock leak in
push_backwhen queue overflows by unlocking before throwing exception - Replaced unsafe
memsetcalls withstd::fillto properly handle the now-non-trivial type containingstd::mutex
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| std::mutex m_mutex; | ||
| value_type m_queue[max_size + 1]; |
There was a problem hiding this comment.
The placement of std::mutex as the first member may affect cacheline alignment strategy. The original code used a simple int m_lock which is 4 bytes, while std::mutex is typically much larger (often 40+ bytes on Linux). This could cause the m_queue array to span additional cachelines, potentially impacting the performance benefits of the lt_cacheline_aligned attribute on this class. Consider placing m_mutex after m_queue to keep the frequently-accessed queue data at the beginning of the cacheline, or verify that this new layout doesn't negatively impact false sharing patterns.
| std::mutex m_mutex; | |
| value_type m_queue[max_size + 1]; | |
| value_type m_queue[max_size + 1]; | |
| std::mutex m_mutex; |
| #include <algorithm> | ||
| #include <cassert> | ||
| #include <cstdlib> | ||
| #include <cstring> |
There was a problem hiding this comment.
The <cstring> header is no longer needed after removing the memset calls. Consider removing this unused include.
| #include <cstring> |
| #include <cstring> | ||
| #include <iostream> | ||
| #include <iterator> | ||
| #include <mutex> |
There was a problem hiding this comment.
The PR description mentions "C++17 primitives", but std::mutex has been available since C++11. This is a minor inaccuracy in the description, though the change is still correct and beneficial.
Replaced the manual spinlock implementation (using
__sync_bool_compare_and_swapandusleep(0)) inthread_queue_hackwithstd::mutex.This change:
push_backwhen the queue overflows (previously threw without unlocking).memseton non-trivial types (which was unsafe if member types were changed).#gsoc