Fix wait_on_address_64 on Linux when only upper 32 bits change (use futex2)#590
Open
parth21999 wants to merge 2 commits into
Open
Fix wait_on_address_64 on Linux when only upper 32 bits change (use futex2)#590parth21999 wants to merge 2 commits into
parth21999 wants to merge 2 commits into
Conversation
The Linux implementation of wait_on_address_64 used the legacy SYS_futex
syscall, which fundamentally compares only 32 bits of the value at the
address. When *address differed from compare_value only in the upper
32 bits (e.g. lower 32 bits both zero), the kernel's atomic
check-before-sleep saw "equal" and slept until timeout, violating the
wait_on_address_64 contract (SRS_SYNC_43_002).
In multi-threaded scenarios this also caused lost wakeups in
InterlockedHL_WaitForNotValue64 and the other 64-bit InterlockedHL
waiters: if a writer changed the upper 32 bits of the value between the
function's interlocked_add_64 read and its wait_on_address_64 call, the
kernel's 32-bit check missed the change and the waiter blocked until
timeout. The zrpc multiplexer integration tests hit this race because
SUBSTREAM_IDs are 64-bit values whose lower 32 bits are an index that
starts at 0.
Fix by switching the 64-bit wait/wake helpers to the Linux 6.7+ futex2
syscalls (SYS_futex_wait, SYS_futex_wake) with FUTEX2_SIZE_U64 |
FUTEX2_PRIVATE, which performs a true 64-bit atomic check-before-sleep.
The 32-bit helpers are unchanged. The wait deadline is now an absolute
CLOCK_MONOTONIC time as required by the futex2 ABI; a NULL deadline is
passed when timeout_ms is UINT32_MAX so the waiter sleeps indefinitely.
Add two regression tests:
* sync_int : wait_on_address_64_returns_immediately_when_only_upper_32_bits_differ
- Single-threaded, deterministic. Sets *var = 0x100000000 then calls
wait_on_address_64(&var, 0, 5000) and asserts the call returns
WAIT_ON_ADDRESS_OK promptly.
* interlocked_hl_int : interlocked_hl_wait_for_not_value_64_with_only_upper_32_bits_change
- Simulates the multiplexer race deterministically by replaying the
inner loop of InterlockedHL_WaitForNotValue64: read with
interlocked_add_64, change the upper 32 bits, then call
wait_on_address_64 with the original value as compare_value.
Both tests reproduce the bug on the legacy implementation (timing out
after 5 s) and pass on the new implementation.
Note: requires a Linux kernel with futex2 support (>= 6.7). On older
kernels the syscall returns ENOSYS and wait_on_address_64 returns
WAIT_ON_ADDRESS_ERROR, surfacing the requirement explicitly rather than
silently sleeping incorrectly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
parth21999
commented
May 7, 2026
| #endif | ||
| #ifndef FUTEX2_PRIVATE | ||
| #define FUTEX2_PRIVATE FUTEX_PRIVATE_FLAG /* 128 */ | ||
| #endif |
Member
Author
There was a problem hiding this comment.
Why are these needed? where are they coming from?
parth21999
commented
May 7, 2026
|
|
||
| /* Codes_SRS_SYNC_LINUX_05_002: [ wait_on_address_64 shall call syscall to wait on value at address to change to a value different than the one provided in compare_value. ] */ | ||
| int syscall_result = syscall(SYS_futex, address, FUTEX_WAIT_PRIVATE, compare_value, &timeout, NULL, 0); | ||
| /* Codes_SRS_SYNC_LINUX_05_002: [ wait_on_address_64 shall call syscall(SYS_futex_wait) with FUTEX2_SIZE_U64 | FUTEX2_PRIVATE and a CLOCK_MONOTONIC absolute deadline, performing a true 64-bit atomic check-before-sleep. ] */ |
Member
Author
parth21999
commented
May 7, 2026
| // Concrete example from the multiplexer integration tests: a SUBSTREAM_ID is a | ||
| // 64-bit value where the lower 32 bits are an index. The first substream has | ||
| // index = 0, so the lower 32 bits of its SUBSTREAM_ID match the initial value | ||
| // (0) used as compare_value, even though the upper 32 bits are non-zero. |
Member
Author
There was a problem hiding this comment.
make this comment smaller.
parth21999
commented
May 7, 2026
|
|
||
| //act | ||
| double start_time = timer_global_get_elapsed_ms(); | ||
| WAIT_ON_ADDRESS_RESULT return_val = wait_on_address_64(&var, compare_value, timeout); |
Member
Author
There was a problem hiding this comment.
remove this timer. The result is enough
* sync_int.c: shrink the regression-test comment block; drop the
redundant timer + tolerance assertion - the WAIT_ON_ADDRESS_OK check
alone catches the bug (broken impl returns TIMEOUT).
* interlocked_hl_int.c: same simplification (drop timer block) per the
"everywhere" intent of the timer comment, and drop the now-unused
c_pal/timer.h include.
* sync_linux.c: drop the SYS_futex_wake / SYS_futex_wait /
FUTEX2_SIZE_U64 / FUTEX2_PRIVATE fallback #defines and rely on the
headers (sys/syscall.h from glibc 2.39+, linux/futex.h from
linux-libc-dev kernel 6.7+); the build now requires those headers and
fails compile cleanly otherwise. Add a comment documenting the
requirement.
* Retag the four SRS items whose text changed with devid 43:
SRS_SYNC_LINUX_05_001 -> SRS_SYNC_LINUX_43_007
SRS_SYNC_LINUX_05_002 -> SRS_SYNC_LINUX_43_008
SRS_SYNC_LINUX_05_007 -> SRS_SYNC_LINUX_43_009
SRS_SYNC_LINUX_05_008 -> SRS_SYNC_LINUX_43_010
Updates devdoc (sync_linux_requirements.md), source (sync_linux.c
Codes_), and unit tests (sync_linux_ut.c Tests_); the four unchanged
items (_05_003 through _05_006) keep their tags.
* sync_linux_ut.c / mock_sync.{c,h}: introduce mock_futex_wait and
mock_futex_wake mocks for the futex2 syscalls (the legacy SYS_futex
mock_syscall is kept for the 32-bit functions). A variadic
mock_syscall_dispatcher routes calls based on syscall number. The 4
wait_on_address_64 tests and 2 wake_by_address_*_64 tests are updated
to expect the new mocks with FUTEX2_SIZE_U64 | FUTEX2_PRIVATE flags
and CLOCK_MONOTONIC.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The Linux implementation of
wait_on_address_64used the legacySYS_futexsyscall, which fundamentally compares only 32 bits of the value at the address. When*addressdiffered fromcompare_valueonly in the upper 32 bits (e.g. both lower 32 bits zero), the kernel's atomic check-before-sleep saw "equal" and slept until timeout, violating thewait_on_address_64contract (SRS_SYNC_43_002).In multi-threaded scenarios this also caused lost wakeups in
InterlockedHL_WaitForNotValue64(and the other 64-bitInterlockedHLwaiters): if a writer changed the upper 32 bits of the value between the function'sinterlocked_add_64read and itswait_on_address_64call, the kernel's 32-bit check missed the change and the waiter blocked until timeout.The zrpc multiplexer integration tests hit this race because
SUBSTREAM_IDis a 64-bit value whose lower 32 bits are an index starting at 0 — see zrpc PR 15645356, which worked around the bug at the call site.Fix
Switch the 64-bit wait/wake helpers in
c_pal_ll/linux/src/sync_linux.cto the Linux 6.7+ futex2 syscalls (SYS_futex_wait,SYS_futex_wake) withFUTEX2_SIZE_U64 | FUTEX2_PRIVATE, which performs a true 64-bit atomic check-before-sleep. The 32-bit helpers are unchanged.Notable details:
CLOCK_MONOTONICtime (per the futex2 ABI), not relative.NULLdeadline is passed whentimeout_ms == UINT32_MAXso the waiter sleeps indefinitely.FUTEX2_SIZE_U64flag so they hash to the same kernel futex queue.#definefallbacks forSYS_futex_wake,SYS_futex_wait,FUTEX2_SIZE_U64,FUTEX2_PRIVATEare added so the code compiles on glibc versions that don't yet expose these constants.Regression tests
Two deterministic tests were added that reproduce the bug on the legacy implementation (time out after 5 s) and pass on the new implementation:
wait_on_address_64_returns_immediately_when_only_upper_32_bits_differc_pal_ll/interfaces/tests/sync_int/sync_int.c*var = 0x100000000then assertswait_on_address_64(&var, 0, 5000)returnsWAIT_ON_ADDRESS_OKpromptly.interlocked_hl_wait_for_not_value_64_with_only_upper_32_bits_changecommon/tests/interlocked_hl_int/interlocked_hl_int.cInterlockedHL_WaitForNotValue64deterministically: reads withinterlocked_add_64, changes the upper 32 bits, then callswait_on_address_64with the original value ascompare_value.Verification
straceconfirms correct syscalls and arguments:syscall_0x1c7(addr, val=0, mask=~0, flags=0x83, &deadline, clockid=1)→SYS_futex_waitwithFUTEX2_SIZE_U64 | FUTEX2_PRIVATE,CLOCK_MONOTONICsyscall_0x1c6(addr, mask=~0, nr=1 or INT_MAX, flags=0x83)→SYS_futex_wakewithFUTEX2_SIZE_U64 | FUTEX2_PRIVATEENOSYSandwait_on_address_64now returnsWAIT_ON_ADDRESS_ERROR, surfacing the kernel-version requirement explicitly instead of silently sleeping incorrectly. CI/Linux runners on kernel ≥ 6.7 will run the tests for real.Behavior change
wait_on_address_64with changed upper 32 bits, lower 32 bits unchangedOKimmediately (kernel ≥ 6.7) / returnsERRORwithENOSYS(kernel < 6.7)wait_on_address_64with full-64-bit differenceCo-authored-by: Copilot 223556219+Copilot@users.noreply.github.com