@jasonbu jasonbu commented Nov 30, 2025

Summary

If atomic_try_cmpxchg_xxxx runs on an LL/SC architecture (e.g. ARMv7, ARMv8, RISC-V), the weak CAS expands to a single load-exclusive/store-exclusive pair (LDREX/STREX on ARM).

If the CPU takes an IRQ/FIQ/SVC between the two instructions, the hardware performs an implicit CLREX and the following STREX returns 1, so atomic_try_cmpxchg_xxxx reports failure even though addr still holds the expected value.

So let's retry atomic_try_cmpxchg_xxxx in this case.
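
The retry described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this patch: demo_try_cmpxchg is a hypothetical name, the value type is a plain atomic_int, and the memory orders are one plausible choice. The idea is to retry the weak CAS only while the value read back still equals what the caller expected, since that is the case where the failure was spurious.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the function changed by this patch):
 * retry a weak CAS only while its failure is spurious.
 */
static bool demo_try_cmpxchg(atomic_int *addr, int *expected, int desired)
{
  int want;

  assert(addr != NULL && expected != NULL);
  want = *expected;            /* value the caller expects to find */

  do
    {
      if (atomic_compare_exchange_weak_explicit(addr, expected, desired,
                                                memory_order_acquire,
                                                memory_order_relaxed))
        {
          return true;         /* store succeeded */
        }

      /* On failure the CAS writes the value it read into *expected.
       * If that value still equals `want`, nothing else changed *addr
       * and the failure was spurious (e.g. the exclusive monitor was
       * cleared by an interrupt), so try again.
       */
    }
  while (*expected == want);

  return false;                /* genuine mismatch: report failure at once */
}
```

With this shape the loop does not wait for a held lock: as soon as the value read back differs from the expected one, the function returns false and leaves the current value in *expected.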

Impact

Before this patch, ostest could abort, especially with tickless disabled and a loop around the try-wait call.
After this patch, the try-wait case no longer aborts.

Testing

CI test and a board test on ARMv8-M.


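When stress-testing ostest in a loop with tickless disabled, the first wait after mutex init occasionally fails. Analysis showed the root cause: on some architectures the CAS instruction sequence can be interrupted.

After this change the problem no longer reproduces.

The problem is also mentioned in #267.

This patch fixes that problem. Follow-up patches will continue to optimize semaphores and mutexes.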

Signed-off-by: buxiasen <[email protected]>
@qingquanzhang151

Hi jasonbu,
Thank you for submitting this PR. The code logic is clear and very practical. 🙏
While reading the code, I noticed that the current implementation handles the compare-and-swap logic manually. The C11 function atomic_compare_exchange_strong_explicit is designed for exactly this case: it guarantees atomicity and correct memory ordering, removes the complexity of a manual implementation, and improves portability, since each compiler and architecture can apply its own optimizations for atomic operations.
Have you considered replacing the current manual implementation with atomic_compare_exchange_strong_explicit? It would achieve the same result while making the code shorter and closer to standard atomic usage.
If there are scenarios I haven't considered (such as platform compatibility or performance requirements), I would appreciate your insights.
Thank you again for your contribution! 😊

jasonbu commented Dec 1, 2025

Hi,
The try-wait API is on a hot, performance-sensitive path in the OS. In most cases exchange_weak is sufficient, and it performs better.
For the purpose of fixing the CAS problem, the current patch is the smallest change.
Atomic compatibility and further performance optimization will be handled in follow-up patches.

qingquanzhang151 commented Dec 6, 2025

Hi jasonbu,

Thank you for your fix for the mutex_trylock issue in the openvela repository. Your contribution is crucial to the project's stability!

After carefully reviewing the implementation in your commit, which uses atomic_compare_exchange_weak_explicit with a manual loop for mutex_trylock, I'd like to share some suggestions to better match the core semantics of mutex_trylock:

Core Background: The Essential Semantics of mutex_trylock

mutex_trylock attempts to acquire the lock once; if there is contention (the lock is already held), it returns failure immediately without blocking or spinning. This is the key distinction between trylock and a spinlock.

Potential Issues with the Current Implementation

Using atomic_compare_exchange_weak_explicit with a manual loop may deviate from the intended design of mutex_trylock:

  1. Unintended blocking due to looping: when the CAS hits a "spurious failure" (it returns false even though the value matches, which can occur on LL/SC architectures such as ARM), the old value is not modified. The loop keeps retrying, so the thread spins instead of returning immediately, which violates the "fail on contention" semantics of trylock.
  2. The weak variant is unsuitable for single-shot attempts: atomic_compare_exchange_weak allows spurious failures, so a single attempt without a loop can misreport contention. Paired with a loop, however, its behavior is closer to a spinlock than to a trylock.

Optimization Recommendation

I suggest replacing the current implementation with atomic_compare_exchange_strong_explicit and removing the manual loop, for the following reasons:

  1. The strong variant matches trylock semantics: atomic_compare_exchange_strong_explicit returns false if and only if the atomic value does not match the expected value (genuine contention), with no spurious failures. A single call determines whether there is contention, which matches the "one attempt, immediate result" requirement of trylock.
  2. It avoids unnecessary loop overhead: without the manual loop, contention leads to an immediate failure, which fits the non-blocking design goal of trylock.
  3. The compiler handles the hard part: the strong variant already deals with spurious failures in the compiler's implementation (e.g. an implicit retry on LL/SC architectures), so no manual loop is needed and atomicity and semantic correctness are preserved.

Please feel free to correct me if I've overlooked any design considerations. Thank you again for your valuable work on the project!
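
For reference, the single-shot variant suggested in this comment might look like the sketch below. It is written under assumptions and is not the project's mutex implementation: demo_mutex_t, DEMO_UNLOCKED, DEMO_LOCKED and both function names are hypothetical, and the memory orders are one plausible choice. Note that on LL/SC targets a compiler typically implements the strong form with an internal retry loop, so the retry moves from the source code into the generated code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock encoding, for illustration only */
#define DEMO_UNLOCKED 0
#define DEMO_LOCKED   1

typedef struct
{
  atomic_int state;
} demo_mutex_t;

/* One attempt, no source-level loop: the strong CAS fails only when
 * the lock word really differs from DEMO_UNLOCKED.
 */
static bool demo_mutex_trylock_strong(demo_mutex_t *m)
{
  int expected = DEMO_UNLOCKED;

  assert(m != NULL);
  return atomic_compare_exchange_strong_explicit(&m->state, &expected,
                                                 DEMO_LOCKED,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

/* Release the lock so that a later trylock can succeed */
static void demo_mutex_unlock(demo_mutex_t *m)
{
  assert(m != NULL);
  atomic_store_explicit(&m->state, DEMO_UNLOCKED, memory_order_release);
}
```

A caller would treat a false return as "lock is held by someone else" and carry on without waiting.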
