The typical tatami pattern is to create a new `Extractor` and buffer for each pass through the matrix.
For algorithms that perform multiple passes through the matrix (e.g., approximate SVD algorithms),
this is theoretically suboptimal, as it repeats allocations that could have been re-used across iterations.
In practice, I doubt this has much impact, as the allocation work is small compared to the cost of iterating through the matrix.
Nonetheless, we should probably test it out.
We consider repeated products of a row-major matrix with a double-precision vector. To recapitulate tatami's behavior, we extract each row into a buffer before computing its dot product with the vector. Doing so for every row yields the matrix-vector product, which is then summed to obtain a simple summary statistic. This entire process is repeated for the desired number of product iterations.

In the "reused" approach, the buffer is allocated once outside the product iterations and re-used in each iteration. In the "re-allocated" approach, the buffer is allocated anew within each iteration of the loop. The question is whether repeated allocation of the buffer contributes noticeably to the total runtime.
No, it doesn't have a significant effect, thank god. The differences here are minor, sporadic, and sometimes depend on the order in which the methods are run.
```sh
$ ./build/multtest
Testing a 10000 x 2000 matrix
```
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 287,389,270.00 | 3.48 | 0.5% | 3.17 | `reused`
| 287,582,146.00 | 3.48 | 0.1% | 3.17 | `reallocated`
```sh
$ ./build/multtest -r 2000 -c 10000
Testing a 2000 x 10000 matrix
```
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 295,690,949.00 | 3.38 | 1.1% | 3.24 | `reused`
| 293,861,844.00 | 3.40 | 1.0% | 3.24 | `reallocated`
```sh
$ ./build/multtest -r 100000 -c 200
Testing a 100000 x 200 matrix
```
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 264,561,939.00 | 3.78 | 0.2% | 2.92 | `reused`
| 270,300,380.00 | 3.70 | 0.4% | 2.99 | `reallocated`
```sh
$ ./build/multtest -r 200 -c 100000
Testing a 200 x 100000 matrix
```
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 401,299,832.00 | 2.49 | 0.3% | 4.42 | `reused`
| 401,185,430.00 | 2.49 | 0.1% | 4.42 | `reallocated`
Frankly, this is a relief, as we don't have to drag existing allocations around the place to squeeze out more performance. Buffer re-use would have introduced several problems:
- Obviously this is not nice design, as the internals of a tatami-based function now spill out into the caller. Everything would need to accept an extra `Workspace` class to enable memory re-use, which is a little awkward.
- It would increase total memory usage, because the allocations can't be easily repurposed in between the tatami operations. Any intervening allocation requests would use extra memory on top of the persisted extractors/buffers.
- A `tatami::OracularExtractor` instance can't be re-used anyway once its predictions have been consumed. Any re-use strategy would need to switch to `MyopicExtractor`s, which may incur an even greater performance penalty.
This is particularly tedious for multi-threaded scenarios where each thread needs its own copy of everything to avoid false sharing. Indeed, if the number of threads changes across calls, we end up having to reallocate again.
Just use the usual CMake process:
```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```