This repository was archived by the owner on Jul 17, 2025. It is now read-only.

Description
Describe the bug
Nothing currently bounds how many requests can be pending in the TLB WorkQueue, even though the queue itself can fill up. The queue holds both shootdown requests, which must be handled, and advance-replica work requests, which are less important.
Currently, if the queue is full when enqueue is called, the error is ignored: https://github.com/vmware-labs/node-replicated-kernel/blob/fc25186d57ca400c8e4a7cb313deb8eabd21d971/kernel/src/arch/x86_64/tlb.rs#L112
If the check on that line is uncommented, it becomes clear that requests can be dropped whenever the queue is full.
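The failure mode can be illustrated with a small std-only sketch. `WorkItem`, `BoundedQueue`, and their fields are hypothetical stand-ins, not the kernel's actual types; the only assumption carried over from the issue is that enqueueing into a full queue returns an error, and that discarding that error (here with `let _ =`) silently loses the request, even when it is a shootdown.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the two kinds of requests the TLB WorkQueue holds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkItem {
    /// Must never be lost: the issuing core relies on the flush happening.
    Shootdown { vaddr: u64 },
    /// Best-effort: asks the receiving core to advance a replica.
    AdvanceReplica { replica: usize },
}

/// Hypothetical fixed-capacity queue whose `push` fails when full and
/// hands the rejected item back to the caller.
struct BoundedQueue {
    items: VecDeque<WorkItem>,
    capacity: usize,
}

impl BoundedQueue {
    fn new(capacity: usize) -> Self {
        BoundedQueue { items: VecDeque::with_capacity(capacity), capacity }
    }

    fn push(&mut self, item: WorkItem) -> Result<(), WorkItem> {
        if self.items.len() == self.capacity {
            return Err(item); // queue full: the item is returned, not stored
        }
        self.items.push_back(item);
        Ok(())
    }

    fn pop(&mut self) -> Option<WorkItem> {
        self.items.pop_front()
    }
}

fn main() {
    let mut q = BoundedQueue::new(2);
    // Low-priority work fills every slot.
    let _ = q.push(WorkItem::AdvanceReplica { replica: 0 });
    let _ = q.push(WorkItem::AdvanceReplica { replica: 1 });

    // Ignoring the result drops the shootdown without any indication.
    let _ = q.push(WorkItem::Shootdown { vaddr: 0x1000 });

    let mut drained = Vec::new();
    while let Some(item) = q.pop() {
        drained.push(item);
    }
    // Only the two AdvanceReplica items were ever stored.
    assert_eq!(drained.len(), 2);
    assert!(!drained.iter().any(|w| matches!(w, WorkItem::Shootdown { .. })));
    println!("drained {} items, shootdown lost", drained.len());
}
```

Replacing `let _ =` with `.expect("TLB work queue full")` on the shootdown push turns the silent loss into a panic, which is what the reproduction steps rely on.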
Reproduction steps
- Change the line to check for failure to enqueue (use expect to unwrap the result)
- Run the fxmark benchmark with roughly 96 cores
- Most runs then fail at the expect, because the enqueue returned an error.
Expected behavior
We would like the number of pending requests to have a theoretical bound that fits within the queue's capacity, so that enqueue can never fail. This property matters because, above all, shootdowns must never be lost.
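One possible way to obtain such a bound is sketched below; it is not the kernel's design, and every name in it (`CoreWork`, `enqueue_shootdown`, `request_advance_replica`, `drain`) is hypothetical. It rests on two stated assumptions: (1) each core has at most one shootdown in flight, because it waits for all acknowledgements before issuing the next, so one slot per core suffices; and (2) advance-replica requests are coalesced into a per-core flag instead of being queued, so they never consume queue slots.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-core work state with a provable bound on queued items.
struct CoreWork {
    /// Pending shootdowns as (issuing core, virtual address).
    shootdowns: VecDeque<(usize, u64)>,
    /// One slot per core, valid under assumption (1).
    capacity: usize,
    /// Coalesced advance-replica requests (assumption (2)); atomic because
    /// other cores would set it concurrently in a real kernel.
    advance_replica: AtomicBool,
}

impl CoreWork {
    fn new(num_cores: usize) -> Self {
        CoreWork {
            shootdowns: VecDeque::with_capacity(num_cores),
            capacity: num_cores,
            advance_replica: AtomicBool::new(false),
        }
    }

    fn enqueue_shootdown(&mut self, from_core: usize, vaddr: u64) {
        // Cannot fail while assumption (1) holds; failing here means it was violated.
        assert!(self.shootdowns.len() < self.capacity, "shootdown bound violated");
        self.shootdowns.push_back((from_core, vaddr));
    }

    fn request_advance_replica(&self) {
        // Many requests collapse into a single flag: no queue slot is used.
        self.advance_replica.store(true, Ordering::Release);
    }

    /// Handles all pending work and returns
    /// (shootdowns handled, whether a replica advance was requested).
    fn drain(&mut self) -> (usize, bool) {
        let handled = self.shootdowns.len();
        // A real handler would flush each vaddr and acknowledge from_core here.
        self.shootdowns.clear();
        let advance = self.advance_replica.swap(false, Ordering::AcqRel);
        (handled, advance)
    }
}

fn main() {
    let num_cores = 96;
    let mut work = CoreWork::new(num_cores);
    // Worst case: every core has its single in-flight shootdown targeting this core.
    for core in 0..num_cores {
        work.enqueue_shootdown(core, 0x1000 * core as u64);
    }
    // Any number of advance-replica requests still occupies no slots.
    for _ in 0..1000 {
        work.request_advance_replica();
    }
    let (handled, advanced) = work.drain();
    assert_eq!(handled, num_cores);
    assert!(advanced);
    println!("handled {} shootdowns, advance_replica = {}", handled, advanced);
}
```

The design choice is to give the important requests a hard per-core budget and make the unimportant ones idempotent, so that queue capacity only has to cover the former.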
Additional context
No response