@grcevski grcevski commented Dec 17, 2019

This change removes the MetricName object allocation that happens on every metering or timing call to AbstractMetrics. The allocation rate can reach multiple GB/sec once Pinot serves 2000 queries/sec or more. Creating MetricName objects also carries a locking cost: the underlying yammer Metrics object that's created uses the JDK getClassName method call, which takes a synchronized lock on the class loader object, effectively creating a false contention problem.

The allocation and contention issue is resolved by caching up to 1000 MetricName objects in the high-performance Caffeine cache. It's important to use a capped collection here to avoid memory leaks in case there's a bug in the metric naming.
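To illustrate the capped-cache idea, here is a minimal, dependency-free sketch. It is not the code in this PR: it swaps Caffeine for a plain ConcurrentHashMap with a size guard, so it has no LRU eviction, and `SketchMetricName` and `CappedMetricNameCache` are hypothetical names standing in for the yammer MetricName and the real cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the yammer MetricName. It counts constructions
// so the effect of caching can be observed.
final class SketchMetricName {
  static final AtomicInteger CREATED = new AtomicInteger();
  final String name;

  SketchMetricName(String name) {
    this.name = name;
    CREATED.incrementAndGet();
  }
}

// Capped cache (assumption: ConcurrentHashMap instead of Caffeine, no eviction).
// Once the cap is reached, unknown names are still served but not stored,
// so a metric naming bug cannot grow the map without bound.
final class CappedMetricNameCache {
  static final int MAX_ENTRIES = 1000; // cap taken from the PR description

  private final ConcurrentHashMap<String, SketchMetricName> cache =
      new ConcurrentHashMap<>();

  SketchMetricName get(String fullName) {
    SketchMetricName cached = cache.get(fullName); // fast path, no allocation
    if (cached != null) {
      return cached;
    }
    // The size check is racy, so under concurrency the cap is approximate.
    if (cache.size() >= MAX_ENTRIES) {
      return new SketchMetricName(fullName);
    }
    return cache.computeIfAbsent(fullName, SketchMetricName::new);
  }

  int size() {
    return cache.size();
  }
}
```

Repeated calls such as `cache.get("server.queries")` return the same instance after the first call, which is what removes the per-call allocation. Caffeine adds bounded eviction on top of this, so recently used names stay cached even after the cap is hit.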

We tested various LRU cache implementations with a quick benchmark we made here:

Results are as follows (4 cores / 8 threads, Linux, JDK 8):

Running with 8 worker threads (lower is better)...
Timing Guava cache ...
Elapsed time 42205ms
Timing Map cache ...
Elapsed time 22649ms
Timing Caffeine cache ...
Elapsed time 11888ms

Running with 16 worker threads (lower is better)...
Timing Guava cache ...
Elapsed time 90137ms
Timing Map cache ...
Elapsed time 34901ms
Timing Caffeine cache ...
Elapsed time 27998ms

Caffeine performs as well as or better than a plain old ConcurrentHashMap, so it seems the most suitable choice for the LRU cache.
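The benchmark itself is not reproduced in this description. As a rough illustration of how such a comparison can be timed, here is a hypothetical harness: `CacheTimingSketch` and `timeLookups` are made-up names, and the thread, operation, and key counts are assumptions rather than the settings used for the numbers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class CacheTimingSketch {
  // Calls `lookup` from `threads` workers, `opsPerThread` times each, cycling
  // over `keySpace` distinct keys. Returns elapsed wall-clock milliseconds.
  static long timeLookups(Function<String, Object> lookup,
                          int threads, int opsPerThread, int keySpace) {
    final String[] keys = new String[keySpace];
    for (int i = 0; i < keySpace; i++) {
      keys[i] = "metric." + i;
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      long start = System.nanoTime();
      for (int t = 0; t < threads; t++) {
        final int offset = t; // stagger the starting key per worker
        futures.add(pool.submit(() -> {
          for (int i = 0; i < opsPerThread; i++) {
            lookup.apply(keys[(i + offset) % keySpace]);
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every worker to finish
      }
      return (System.nanoTime() - start) / 1_000_000;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Each cache under test is passed in as a lookup function, for example `k -> map.computeIfAbsent(k, x -> new Object())` for a ConcurrentHashMap. The sketch has no warm-up phase, so it only gives coarse numbers.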

Todo:

  • Find good initial concurrency parameters
  • Benchmark the individual improvement and produce numbers
  • Add tests for MetricName caching

Performance measurement results:
Machine configuration:
4 core (8 threads) Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
32GB of RAM
Linux x86-64, kernel: 5.0.0-37-generic

Benchmark configuration:
TPC-H (optimal index)
20 clients
180s runtime

Results (QPS: higher is better; response time: lower is better):

Base with (fix-benchmark-client):
Time Passed: 180.014s, Query Executed: 513267, QPS: 2851.2615685446685, Avg Response Time: 7.001299518574154ms
Improved (this branch):
Time Passed: 180.014s, Query Executed: 528540, QPS: 2936.104969613474, Avg Response Time: 6.800626253452908ms
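For reference, the relative change between the two runs can be worked out from the figures above. The snippet below is only an arithmetic aid; `BenchmarkDelta` and `percentChange` are hypothetical names.

```java
final class BenchmarkDelta {
  // Relative change from base to improved, in percent.
  static double percentChange(double base, double improved) {
    return (improved - base) / base * 100.0;
  }

  public static void main(String[] args) {
    // Figures copied from the two result lines above.
    double qps = percentChange(2851.2615685446685, 2936.104969613474);
    double latency = percentChange(7.001299518574154, 6.800626253452908);
    System.out.printf("QPS change: %+.2f%%%n", qps);
    System.out.printf("Avg response time change: %+.2f%%%n", latency);
  }
}
```

On these numbers that is about 2.98% higher QPS and about 2.87% lower average response time for this branch.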

Co-authored-by: @grcevski @charliegracie @macarte @adityamandaleeka

@grcevski grcevski force-pushed the fix-benchmark-client branch from bc68f08 to 2eb1e6b Compare January 28, 2020 19:21