I found an issue with the iterator in 4.1.2 that was fixed in 4.4.2, so I was going to upgrade. But I noticed in our microbenchmarks that 4.4.2 was slower than 4.1.2. I checked with the CRoaring microbenchmarks and can see a similar slowdown, almost across the board.
Is this a known regression that we are accepting for other reasons, or is it not expected? FYI, I'm compiling with GCC 14.2.0 on Intel(R) Xeon Gold 6248R CPU @ 3.00GHz, default flags from the CRoaring CMake environment.
Here is a comparison using the default microbenchmark. I used some Emacs rectangle kill/yank foo to put the two runs side by side (I know there's some fancy tooling to compare Google Bench output, but I forget how to do it :) ). You can see that many of the values are worse for 4.4.2: fewer iterations and longer CPU time.
2025-10-08T11:49:06-04:00
Running ./microbenchmarks/bench
Run on (12 X 3000 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x12)
L1 Instruction 32 KiB (x12)
L2 Unified 1024 KiB (x12)
L3 Unified 36608 KiB (x12)
Load Average: 0.24, 0.37, 0.26
AVX-2 hardware: yes
AVX-512: supported by compiler
AVX-512 hardware: no
In RAM volume in MiB (estimated): 1.802830
benchmarking other files: You may pass is a data directory as a parameter.
data source: /data/psh13/src/CRoaring/benchmarks/realdata/census1881
number of bitmaps: 200
performance counters: No privileged access (sudo may help).
x64: detected
                                              4.1.2                     4.4.2
--------------------------------------------------------------------------------------
Benchmark                                   Time  Iterations          Time  Iterations
--------------------------------------------------------------------------------------
SuccessiveIntersection                  28848 ns       24252      29228 ns       23898
SuccessiveIntersection64                46578 ns       15024      49466 ns       14136
SuccessiveIntersectionCardinality       26155 ns       26779      27928 ns       25011
SuccessiveIntersectionCardinality64     42673 ns       16406      43998 ns       15883
SuccessiveUnionCardinality              36853 ns       18958      36438 ns       19159
SuccessiveUnionCardinality64            97501 ns        7187     100082 ns        6984
SuccessiveDifferenceCardinality         32514 ns       21540      33503 ns       20863
SuccessiveDifferenceCardinality64       69695 ns       10041      72552 ns        9645
SuccessiveUnion                        573901 ns        1227     558773 ns        1255
SuccessiveUnion64                      935391 ns         745     931132 ns         751
TotalUnion                             637562 ns        1097     633914 ns        1103
TotalUnionHeap                        2014668 ns         347    1952174 ns         358
RandomAccess                             3531 ns      196919       3514 ns      199751
RandomAccess64                           5811 ns      121117       6394 ns      109562
RandomAccess64Cpp                        4500 ns      132872       4554 ns      153998
ToArray                                148464 ns        4702     148486 ns        4697
ToArray64                              604936 ns        1156     615433 ns        1134
IterateAll                            3425703 ns         204    3431836 ns         204
IterateAll64                          5007200 ns         142    5078060 ns         139
ComputeCardinality                       3928 ns      179276       3638 ns      191939
ComputeCardinality64                    22224 ns       31109      24014 ns       29137
RankManySlow                            16732 ns       41747      15564 ns       44669
RankMany                                 6382 ns      109172       6171 ns      111491
A few things (the Union tests, for example) are faster, but a number are slower, particularly the 64-bit ones (which is, unfortunately, mostly what I use).
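For a rough sense of scale, here is a minimal Python sketch that turns a few rows of the table above into a relative change in time. The names `TIMES_NS`, `percent_change`, and `report` are illustrative only and are not part of CRoaring or Google Benchmark; the numbers are copied from the single runs shown above, so run-to-run noise is not accounted for.

```python
# Times in ns copied from the table above, as (4.1.2, 4.4.2).
# Only a handful of rows are included; this is a sketch, not a full comparison.
TIMES_NS = {
    "SuccessiveIntersection64": (46578, 49466),
    "RandomAccess64": (5811, 6394),
    "ComputeCardinality64": (22224, 24014),
    "SuccessiveUnion": (573901, 558773),
    "RankManySlow": (16732, 15564),
}


def percent_change(old_ns, new_ns):
    """Relative change in time; positive means the newer version is slower."""
    return (new_ns - old_ns) / old_ns * 100.0


def report(times):
    """Return (name, percent change) pairs, largest slowdown first."""
    rows = [(name, percent_change(old, new)) for name, (old, new) in times.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    for name, pct in report(TIMES_NS):
        print(f"{name:<28} {pct:+6.1f}%")
```

Sorting by the change puts the largest slowdowns at the top, which makes it easier to pick candidates for a closer look.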
If this is interesting / new info I can do some bisecting across versions to get more details. If this is known / understood then I won't bother.