I noticed the implementation relies heavily on ThreadLocal, e.g. io.opentelemetry.context.ThreadLocalContextStorage.
In our load-and-performance (LnP) tests on JDK virtual threads, we found ThreadLocal could be the bottleneck.
Is there any plan to switch to ScopedValue?
Here are also some comments from an AI assistant:
The Performance Overhead Sources
- Thread.currentThread() lookup: each get()/set() needs to resolve the current thread. With virtual threads, this involves:
  - finding the carrier thread
  - looking up which virtual thread is currently mounted
  - more indirection than with platform threads
- Mounting/unmounting cost: when a virtual thread blocks (I/O), it is unmounted, and there is overhead on resumption. ThreadLocal access during this transition adds to the cost.
- Potential JVM-internal contention: while ThreadLocalMap itself is largely lock-free, some JVM implementations have shown contention-like bottlenecks at scale due to:
  - internal hash-map operations
  - memory-barrier costs with millions of concurrent threads
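The access pattern described above can be seen in a minimal sketch (class and variable names here are illustrative, not OpenTelemetry's): each virtual thread gets its own ThreadLocal slot, so isolation holds, but every get()/set() pays for the Thread.currentThread() resolution plus a per-thread map lookup.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

public class ThreadLocalVirtualThreads {
    // One ThreadLocal slot; each virtual thread sees its own value.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) {
        LongAdder isolatedReads = new LongAdder();
        // try-with-resources waits for all submitted tasks before closing.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                final int id = i;
                executor.submit(() -> {
                    CONTEXT.set("ctx-" + id);       // write into this thread's map
                    if (CONTEXT.get().equals("ctx-" + id)) {
                        isolatedReads.increment();  // value is per-thread, never shared
                    }
                    CONTEXT.remove();               // avoid retaining map entries
                });
            }
        }
        System.out.println("isolated reads: " + isolatedReads.sum());
    }
}
```

Correctness is not the issue, so this prints `isolated reads: 10000`; the concern raised above is the cost of those map operations repeated across millions of short-lived threads.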
The Reality Check
┌────────────────────────────────┬────────────────────────────────────────┐
│ Aspect │ Reality │
├────────────────────────────────┼────────────────────────────────────────┤
│ Traditional lock contention? │ No │
├────────────────────────────────┼────────────────────────────────────────┤
│ Per-operation overhead higher? │ Yes, marginally │
├────────────────────────────────┼────────────────────────────────────────┤
│ Scales to millions of threads? │ Debatable - depends on access patterns │
├────────────────────────────────┼────────────────────────────────────────┤
│ ScopedValue better? │ Yes, designed for virtual threads │
└────────────────────────────────┴────────────────────────────────────────┘
Recommendation
For new code with virtual threads, prefer ScopedValue (Java 21+) over ThreadLocal. Its semantics match the virtual-thread model, and it avoids some historical design choices in ThreadLocal that were not made with millions of threads in mind.
If you are stuck with ThreadLocal, keep access patterns minimal and benchmark your specific use case.
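For reference, a minimal sketch of the ScopedValue style the question asks about (names are hypothetical; note that ScopedValue was a preview API in Java 21–24 and finalized in Java 25, so older JDKs need --enable-preview):

```java
public class ScopedValueSketch {
    // Immutable binding: set once for a bounded scope, readable within it
    // by the binding thread (and structured-concurrency children).
    static final ScopedValue<String> CONTEXT = ScopedValue.newInstance();

    public static void main(String[] args) {
        ScopedValue.where(CONTEXT, "request-42").run(() -> {
            // No mutable per-thread map: the binding is tied to the scope.
            System.out.println("context = " + CONTEXT.get());
        });
        // Outside the scope the value is simply unbound.
        System.out.println("bound outside scope? " + CONTEXT.isBound());
    }
}
```

The design difference matters here: a ScopedValue binding cannot leak past its scope, so there is no equivalent of a stale ThreadLocal entry to remove, and the JVM can optimize reads knowing the binding is immutable.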