
Use profiling to improve performance #919

@ben-manes

Description


I ran a profile to try to understand why my Lincheck tests now take an hour to run, which got much worse in recent releases. There was no smoking gun (from my naive understanding of the project), but a few easy changes stood out. As a workaround, the test now defaults to small parameters for a quicker sanity check, and a periodic regression test uses the prior parameters (closer to Lincheck's defaults).

./gradlew :caffeine:lincheckTest -PjavaVersion=25 --tests '*CacheLincheckTest*' \
  -Dlincheck.modelChecking.iterations=100 \
  -Dlincheck.modelChecking.invocationsPerIteration=1000 \
  -Dlincheck.stress.iterations=100 \
  -Dlincheck.stress.invocationsPerIteration=10000 \
  --console colored -Pjfr

lincheckTest.jfr.zip

Perform a prescreen before a locking computeIfAbsent

[profiler screenshot]

The profile shows a hotspot that might be slow due to lock contention, with a small optimization available. A computeIfAbsent will pessimistically lock in most cases (benchmark), so it is better to perform an optimistic get(key) first for hot keys.

val fieldOffsetCache = ConcurrentHashMap<Field, Long>()
val fieldBaseObjectCache = ConcurrentHashMap<Field, Any>()

@Suppress("DEPRECATION")
inline fun <T> readFieldViaUnsafe(obj: Any?, field: Field, getter: Unsafe.(Any?, Long) -> T): T {
    if (Modifier.isStatic(field.modifiers)) {
        val base = fieldBaseObjectCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.staticFieldBase(it)
        }
        val offset = fieldOffsetCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.staticFieldOffset(it)
        }
        return UnsafeHolder.UNSAFE.getter(base, offset)
    } else {
        val offset = fieldOffsetCache.computeIfAbsent(field) {
            UnsafeHolder.UNSAFE.objectFieldOffset(it)
        }
        return UnsafeHolder.UNSAFE.getter(obj, offset)
    }
}
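The suggested prescreen can be sketched as a small helper (hypothetical; `getOrCompute` is a made-up name, not Lincheck's API): try the lock-free get() first and only fall back to the locking computeIfAbsent on a miss.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: optimistic lock-free read first, locking compute only on a miss.
fun <K : Any, V : Any> ConcurrentHashMap<K, V>.getOrCompute(key: K, compute: (K) -> V): V =
    get(key) ?: computeIfAbsent(key, compute)

fun main() {
    val fieldOffsetCache = ConcurrentHashMap<String, Long>()
    // First call computes; later calls for the same key never touch the bin lock.
    val first = fieldOffsetCache.getOrCompute("offset") { 42L }
    val second = fieldOffsetCache.getOrCompute("offset") { error("should not recompute") }
    println(first == second) // prints "true"
}
```

For hot keys this trades one extra volatile read on the (rare) miss path for a lock-free hit path.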

Negative caching

The profile shows a large number of exceptions thrown while looking up fields. The lookup result could be cached, including misses, rather than using exceptions as control flow.

[profiler screenshot]
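A negative cache could store the miss itself behind a sentinel so the exception is raised at most once per lookup (a hypothetical sketch; `findFieldOrNull` and `FIELD_NOT_FOUND` are made-up names, not Lincheck's API):

```kotlin
import java.lang.reflect.Field
import java.util.concurrent.ConcurrentHashMap

// Sentinel marking "lookup already failed"; ConcurrentHashMap cannot store null values.
val FIELD_NOT_FOUND = Any()
val fieldLookupCache = ConcurrentHashMap<String, Any>()

fun findFieldOrNull(clazz: Class<*>, name: String): Field? {
    val cached = fieldLookupCache.computeIfAbsent("${clazz.name}#$name") {
        try {
            clazz.getDeclaredField(name)
        } catch (e: NoSuchFieldException) {
            FIELD_NOT_FOUND // cache the miss; no exception on the next lookup
        }
    }
    return cached as? Field
}
```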

Allocations

I believe the above might reduce these allocation hotspots, which add GC pressure. I found that switching from G1 to ParallelGC reduced OldGen pause times and sped up the run by ~10 seconds (out of ~10 minutes total for the profiled task). Some additional caching (e.g. of boxed Integers, or using a primitive map), an initial capacity to avoid resize copying, etc. might help a lot.

[allocation profile screenshots]
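To illustrate the initial-capacity point (the counts here are made up): HashMap and ConcurrentHashMap resize when size exceeds capacity × load factor, so presizing from the expected element count avoids the repeated rehash copies.

```kotlin
fun main() {
    val expected = 10_000

    // Default capacity (16) forces a chain of resize copies as entries arrive.
    val resized = HashMap<Int, String>()

    // Presized: capacity chosen so size never exceeds capacity * 0.75 (the default load factor).
    val presized = HashMap<Int, String>((expected / 0.75f).toInt() + 1)

    for (i in 0 until expected) {
        resized[i] = "v$i"   // each Int key is boxed; a primitive-keyed map would avoid this too
        presized[i] = "v$i"
    }
    println(resized.size == presized.size) // prints "true"
}
```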

Avoid busy waiting

Much of the CPU time is spent busy waiting, which on a low-core system (like CI) might interfere with the test. If possible, it should block on a signal instead. The spin wait uses Thread.yield() but might do better with Thread.onSpinWait() to avoid forced context switches on many-core machines. The main problem could be that CI typically has 2 cores, so one core is taken away for monitoring, and the analysis likely wants 2+ threads interleaving but only gets 1 thread actively running. That, combined with constant context switches, might be making it work much harder on CI builds.

[profiler screenshot]

private fun awaitTask(iThread: Int, deadline: Long): Throwable? {
    val result = getResult(iThread, deadline)
    // Check whether there was an exception during the execution.
    return result as? Throwable
}

private fun getResult(iThread: Int, deadline: Long): Any {
    // Active wait for a result during the limited number of loop cycles.
    val result = resultSpinner.spinWaitBoundedFor { results[iThread] }
    if (result != null) return result
    // Park with timeout until the result is set or the timeout is passed.
    val currentThread = Thread.currentThread()
    if (results.compareAndSet(iThread, null, currentThread)) {
        while (results[iThread] === currentThread) {
            val timeLeft = deadline - System.nanoTime()
            if (timeLeft <= 0) {
                isStuck = true
                throw TimeoutException()
            }
            LockSupport.parkNanos(timeLeft)
        }
    }
    return results[iThread]
}
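A bounded spin built on Thread.onSpinWait() followed by parking might look like this (a generic sketch assuming an AtomicReference result slot; `awaitValue` is a made-up name, not the resultSpinner implementation):

```kotlin
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.locks.LockSupport

// Spin briefly with onSpinWait() (a CPU hint, no forced context switch),
// then fall back to parking so a 2-core CI machine is not starved by yield().
fun <T : Any> awaitValue(slot: AtomicReference<T?>, spinLimit: Int = 1_000): T {
    repeat(spinLimit) {
        slot.get()?.let { return it }
        Thread.onSpinWait()
    }
    while (true) {
        slot.get()?.let { return it }
        LockSupport.parkNanos(100_000) // 0.1 ms; a real version would honor the deadline
    }
}

fun main() {
    val slot = AtomicReference<String?>(null)
    Thread { Thread.sleep(10); slot.set("done") }.start()
    println(awaitValue(slot)) // prints "done"
}
```

A real implementation would also have the producer call LockSupport.unpark on the registered waiter when publishing the result, rather than relying on the park timeout.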
