
Use profiling to improve performance #919

@ben-manes

Description


I ran a profile to try to understand why my Lincheck tests now take an hour to run, which got much worse in recent releases. There was no smoking gun (from my naive understanding of the project), but a few easy changes stood out. As a workaround, the test now defaults to small parameters for a quicker sanity check, and a periodic regression test uses the prior parameters (closer to Lincheck's defaults).

./gradlew :caffeine:lincheckTest -PjavaVersion=25 --tests '*CacheLincheckTest*' \
  -Dlincheck.modelChecking.iterations=100 \
  -Dlincheck.modelChecking.invocationsPerIteration=1000 \
  -Dlincheck.stress.iterations=100 \
  -Dlincheck.stress.invocationsPerIteration=10000 \
  --console colored -Pjfr

lincheckTest.jfr.zip

Perform a prescreen before a locking computeIfAbsent

[profiler screenshot]

The profile shows a hotspot that might be slow due to lock contention, with a small optimization available. A computeIfAbsent will pessimistically lock in most cases (benchmark), so it is better to perform an optimistic get(key) first for hot keys.

val fieldOffsetCache = ConcurrentHashMap<Field, Long>()
val fieldBaseObjectCache = ConcurrentHashMap<Field, Any>()

@Suppress("DEPRECATION")
inline fun <T> readFieldViaUnsafe(obj: Any?, field: Field, getter: Unsafe.(Any?, Long) -> T): T {
    if (Modifier.isStatic(field.modifiers)) {
        val base = fieldBaseObjectCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.staticFieldBase(it)
        }
        val offset = fieldOffsetCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.staticFieldOffset(it)
        }
        return UnsafeHolder.UNSAFE.getter(base, offset)
    } else {
        val offset = fieldOffsetCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.objectFieldOffset(it)
        }
        return UnsafeHolder.UNSAFE.getter(obj, offset)
    }
}
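The suggested prescreen can be sketched as a small helper (hypothetical; `getOrCompute` is a made-up name, not Lincheck's API): try the lock-free get() first and only fall back to the locking computeIfAbsent on a miss.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: optimistic lock-free read first, locking compute only on a miss.
fun <K : Any, V : Any> ConcurrentHashMap<K, V>.getOrCompute(key: K, compute: (K) -> V): V =
    get(key) ?: computeIfAbsent(key, compute)

fun main() {
    val fieldOffsetCache = ConcurrentHashMap<String, Long>()
    // First call computes; later calls for the same key never touch the bin lock.
    val first = fieldOffsetCache.getOrCompute("offset") { 42L }
    val second = fieldOffsetCache.getOrCompute("offset") { error("should not recompute") }
    println(first == second) // prints "true"
}
```

For hot keys this trades one extra volatile read on the (rare) miss path for a lock-free hit path.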

Negative caching

The profile shows a large number of exceptions thrown while looking up fields. The lookup result could be cached, including misses, rather than using exceptions as control flow.

[profiler screenshot]
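A negative cache could store the miss itself behind a sentinel so the exception is raised at most once per lookup (a hypothetical sketch; `findFieldOrNull` and `FIELD_NOT_FOUND` are made-up names, not Lincheck's API):

```kotlin
import java.lang.reflect.Field
import java.util.concurrent.ConcurrentHashMap

// Sentinel marking "lookup already failed"; ConcurrentHashMap cannot store null values.
val FIELD_NOT_FOUND = Any()
val fieldLookupCache = ConcurrentHashMap<String, Any>()

fun findFieldOrNull(clazz: Class<*>, name: String): Field? {
    val cached = fieldLookupCache.computeIfAbsent("${clazz.name}#$name") {
        try {
            clazz.getDeclaredField(name)
        } catch (e: NoSuchFieldException) {
            FIELD_NOT_FOUND // cache the miss; no exception on the next lookup
        }
    }
    return cached as? Field
}
```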

Allocations

I believe the above might reduce these allocation hotspots, which add GC pressure. I found that switching from G1 to ParallelGC reduced OldGen pause times and sped up the run by ~10 seconds (out of ~10 minutes total for the profiled task). Some additional caching (e.g. of boxed Integers, or using a primitive map), an initial capacity to avoid resize copying, etc. might help a lot.

[allocation profile screenshots]
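To illustrate the initial-capacity point (the counts here are made up): HashMap and ConcurrentHashMap resize when size exceeds capacity × load factor, so presizing from the expected element count avoids the repeated rehash copies.

```kotlin
fun main() {
    val expected = 10_000

    // Default capacity (16) forces a chain of resize copies as entries arrive.
    val resized = HashMap<Int, String>()

    // Presized: capacity chosen so size never exceeds capacity * 0.75 (the default load factor).
    val presized = HashMap<Int, String>((expected / 0.75f).toInt() + 1)

    for (i in 0 until expected) {
        resized[i] = "v$i"   // each Int key is boxed; a primitive-keyed map would avoid this too
        presized[i] = "v$i"
    }
    println(resized.size == presized.size) // prints "true"
}
```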

Avoid busy waiting

Much of the CPU time is spent busy waiting, which on a low-core system (like CI) might interfere with the test. If possible, it should block on a signal instead. The spin wait uses Thread.yield() but might do better with Thread.onSpinWait() to avoid forced context switches on many-core machines. The main problem could be that CI typically has 2 cores, so one core is taken away for monitoring, and the analysis likely wants 2+ threads interleaving but only gets 1 thread actively running. That, combined with constant context switches, might be making it work much harder on CI builds.

[profiler screenshot]

private fun awaitTask(iThread: Int, deadline: Long): Throwable? {
    val result = getResult(iThread, deadline)
    // Check whether there was an exception during the execution.
    return result as? Throwable
}

private fun getResult(iThread: Int, deadline: Long): Any {
    // Active wait for a result during the limited number of loop cycles.
    val result = resultSpinner.spinWaitBoundedFor { results[iThread] }
    if (result != null) return result
    // Park with timeout until the result is set or the timeout is passed.
    val currentThread = Thread.currentThread()
    if (results.compareAndSet(iThread, null, currentThread)) {
        while (results[iThread] === currentThread) {
            val timeLeft = deadline - System.nanoTime()
            if (timeLeft <= 0) {
                isStuck = true
                throw TimeoutException()
            }
            LockSupport.parkNanos(timeLeft)
        }
    }
    return results[iThread]
}
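A bounded spin built on Thread.onSpinWait() followed by parking might look like this (a generic sketch assuming an AtomicReference result slot; `awaitValue` is a made-up name, not the resultSpinner implementation):

```kotlin
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.locks.LockSupport

// Spin briefly with onSpinWait() (a CPU hint, no forced context switch),
// then fall back to parking so a 2-core CI machine is not starved by yield().
fun <T : Any> awaitValue(slot: AtomicReference<T?>, spinLimit: Int = 1_000): T {
    repeat(spinLimit) {
        slot.get()?.let { return it }
        Thread.onSpinWait()
    }
    while (true) {
        slot.get()?.let { return it }
        LockSupport.parkNanos(100_000) // 0.1 ms; a real version would honor the deadline
    }
}

fun main() {
    val slot = AtomicReference<String?>(null)
    Thread { Thread.sleep(10); slot.set("done") }.start()
    println(awaitValue(slot)) // prints "done"
}
```

A real implementation would also have the producer call LockSupport.unpark on the registered waiter when publishing the result, rather than relying on the park timeout.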
