Description
This umbrella issue is about taking a closer look at how many allocations our driver performs. An investigation with https://github.com/psarna/trace_alloc led to one obvious conclusion: #408 is going to help, because the current load balancing implementation is responsible for ~40% of all allocations in certain workloads.
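For context, the measurement technique boils down to wrapping the global allocator and counting calls. A minimal sketch of the general pattern (this is not trace_alloc's actual code, just the idea):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A counting wrapper around the system allocator: every call to
// alloc() bumps a global counter, so a workload's total allocation
// count can be printed at any point.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let v: Vec<u64> = (0..1024).collect();
    println!(
        "allocations so far: {} (len = {})",
        ALLOCATIONS.load(Ordering::Relaxed),
        v.len()
    );
}
```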
However, we should investigate further and perhaps recommend a different allocator for our driver, or even go one step further and depend on https://crates.io/crates/jemallocator or a similar crate in order to pull in an allocator that better suits our needs.
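For reference, opting into jemalloc from within the binary is a one-liner with the jemallocator crate; a minimal sketch (the Cargo dependency version is an assumption):

```rust
// Cargo.toml (assumed): jemallocator = "0.3"
use jemallocator::Jemalloc;

// Every allocation made by this binary (including the driver it links
// in) now goes through jemalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    let v: Vec<u64> = (0..1_000).collect();
    println!("{} u64s allocated via jemalloc", v.len());
}
```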
As a very simple test, I ran the `basic` example from our benchmarks (https://github.com/psarna/rust-driver-benchmarks, a fork of https://github.com/cvybhu/rust-driver-benchmarks) with and without jemalloc, the latter provided externally via the LD_PRELOAD trick: `LD_PRELOAD=/usr/lib64/libjemalloc.so.2 ./target/release/basic -d -n 10.0.1.12 -t 10000000`. The results are as follows:
- with the default allocator, it takes > 30 seconds to finish 10M inserts
- with jemalloc, it takes ~23 seconds
A fair share of these allocations comes from our load balancing, which may change significantly with #408. Once that issue is resolved, we should get back to this topic and figure out whether our allocation patterns are actually a good fit for jemalloc, which is tuned for serving lots of small allocations and items of fixed size.
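One way to answer that question is to extend the counting-allocator idea above into a size histogram and check whether the driver really issues mostly small, bounded-size allocations. A hypothetical sketch, not tied to any existing tool:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Bucket i counts allocations whose size needs i significant bits,
// i.e. sizes below 2^i; the last bucket also absorbs anything larger.
struct HistogramAlloc;

const ZERO: AtomicUsize = AtomicUsize::new(0);
static BUCKETS: [AtomicUsize; 16] = [ZERO; 16];

unsafe impl GlobalAlloc for HistogramAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let bits = usize::BITS - layout.size().leading_zeros();
        let bucket = bits.min(BUCKETS.len() as u32 - 1) as usize;
        BUCKETS[bucket].fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: HistogramAlloc = HistogramAlloc;

fn main() {
    let _small = vec![0u8; 48]; // lands in a small bucket
    let _large = vec![0u8; 1 << 20]; // clamped into the last bucket
    for (i, b) in BUCKETS.iter().enumerate() {
        let n = b.load(Ordering::Relaxed);
        if n > 0 {
            println!("bucket {:2} (< {:6} B): {}", i, 1usize << i, n);
        }
    }
}
```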
Bonus: running a binary which uses jemalloc with the `MALLOC_CONF=stats_print:true` environment variable prints lots of valuable stats. It can also be combined with LD_PRELOAD, as in `MALLOC_CONF=stats_print:true LD_PRELOAD=/usr/lib64/libjemalloc.so.2 ./target/release/basic -d -n 10.0.1.12 -t 10000000`.
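If we end up linking jemalloc in via jemallocator, the companion jemalloc-ctl crate can expose similar numbers programmatically. A sketch based on that crate's documented usage (crate versions are assumptions):

```rust
// Cargo.toml (assumed): jemallocator = "0.3", jemalloc-ctl = "0.3"
use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    let v = vec![0u8; 1 << 20];

    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap();
    let resident = stats::resident::read().unwrap();
    println!("{} bytes allocated / {} bytes resident", allocated, resident);
    drop(v);
}
```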