README amended. [no ci]

Maxim Egorushkin · Maxim Egorushkin · commit a59eb288dc3e · 2025-12-19T09:04:45.000Z
diff --git a/README.md b/README.md
@@ -40,7 +40,7 @@ The impact of each of these small design choices on their own is barely measurab
 These design choices are also limitations:
 
 * The maximum queue size must be set at compile time or construction time. The circular buffer side-steps the memory reclamation problem inherent in linked-list based queues for the price of fixed buffer size. See [Effective memory reclamation for lock-free data structures in C++][4] for more details. Fixed buffer size may not be that much of a limitation, since once the queue gets larger than the maximum expected size that indicates a problem that elements aren't consumed fast enough, and if the queue keeps growing it may eventually consume all available memory which may affect the entire system, rather than the problematic process only. The only apparent inconvenience is that one has to do an upfront calculation on what would be the largest expected/acceptable number of unconsumed elements in the queue.
-* There are no OS-blocking push/pop functions. This queue is designed for ultra-low-latency scenarios and using an OS blocking primitive would be sacrificing push-to-pop latency. For lowest possible latency one cannot afford blocking in the OS kernel because the wake-up latency of a blocked thread is about 1-3 microseconds, whereas this queue's round-trip time can be as low as 150 nanoseconds.
+* There are no OS-blocking push/pop functions. This queue is designed for ultra-low-latency scenarios, and using an OS blocking primitive would sacrifice push-to-pop latency. For the lowest possible latency one cannot afford making system calls, let alone blocking in the OS kernel: the wake-up latency of a blocked thread is about 1-3 microseconds, whereas this queue's round-trip time (push a message to another thread and pop its reply) can be below 100 nanoseconds. CPU vulnerability mitigations have made system calls dramatically more expensive, degrading performance even further. In general, handing off spin-waiting to an OS blocking primitive is a problem with no satisfactory low-latency solution.
 
 Ultra-low-latency applications need just that and nothing more. The minimalism pays off, see the [throughput and latency benchmarks][1].
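The two limitations in this diff (a fixed-size circular buffer and no OS-blocking push/pop) can be illustrated with a minimal single-producer/single-consumer ring buffer. This is a sketch under stated assumptions, not this queue's actual implementation or API. The names `SpscRing`, `try_push`, `try_pop`, `spin_pause` and `sum_via_queue` are hypothetical. The sketch assumes exactly one producer thread, exactly one consumer thread, and a power-of-two capacity. The waiting `push`/`pop` only spin in user space and issue a `_mm_pause` CPU hint on x86. They never make a system call.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Spin-wait hint to the CPU (x86 only; a no-op elsewhere). Never enters the kernel.
inline void spin_pause() {
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
    _mm_pause();
#endif
}

// Hypothetical SPSC queue: capacity N is fixed at compile time, no allocation, no reclamation.
template <class T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> buf_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // next index to pop; written by the consumer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // next index to push; written by the producer only
public:
    bool try_push(T const& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;  // full: the buffer never grows
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the element to the consumer
        return true;
    }
    bool try_pop(T& v) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (tail_.load(std::memory_order_acquire) == h)
            return false;  // empty
        v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // hand the slot back to the producer
        return true;
    }
    // "Blocking" only in the busy-wait sense: spin in user space instead of sleeping in the OS.
    void push(T const& v) { while (!try_push(v)) spin_pause(); }
    T pop() { T v{}; while (!try_pop(v)) spin_pause(); return v; }
};

// Usage sketch: a producer thread pushes 0..n-1, the calling thread pops and sums them.
inline std::uint64_t sum_via_queue(std::uint32_t n) {
    SpscRing<std::uint32_t, 1024> q;
    std::thread producer([&q, n] {
        for (std::uint32_t i = 0; i < n; ++i) q.push(i);
    });
    std::uint64_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        std::uint32_t v = q.pop();
        assert(v == i);  // a single producer and a single consumer preserve FIFO order
        sum += v;
    }
    producer.join();
    return sum;
}
```

Spinning keeps the waiting thread on the CPU. This avoids the 1-3 microsecond kernel wake-up latency cited above, but it consumes a core while the queue is empty or full. That is the trade-off the design accepts.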