Perf regression from 4.1.2 to 4.4.2? #754

@madscientist

I found an issue with the iterator in 4.1.2 that was fixed in 4.4.2, so I was going to upgrade. But, I noticed in our microbenchmarks that 4.4.2 was slower than 4.1.2. I checked with the CRoaring microbenchmarks and can see a similar slowdown, almost across the board.

Is this a known regression that we are accepting for other reasons, or is it not expected? FYI, I'm compiling with GCC 14.2.0 on Intel(R) Xeon Gold 6248R CPU @ 3.00GHz, default flags from the CRoaring CMake environment.

Here is a comparison using the default microbenchmark. I used some Emacs rectangle kill/yank foo to put the two runs side by side (I know there's some fancy tooling to compare Google Bench output but I forget how to do it :) ). You can see that many of the values are worse for 4.4.2, with fewer iterations and longer time per iteration:

2025-10-08T11:49:06-04:00
Running ./microbenchmarks/bench
Run on (12 X 3000 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 1024 KiB (x12)
  L3 Unified 36608 KiB (x12)
Load Average: 0.24, 0.37, 0.26
AVX-2 hardware: yes
AVX-512: supported by compiler
AVX-512 hardware: no
In RAM volume in MiB (estimated): 1.802830
benchmarking other files: You may pass is a data directory as a parameter.
data source: /data/psh13/src/CRoaring/benchmarks/realdata/census1881
number of bitmaps: 200
performance counters: No privileged access (sudo may help).
x64: detected
                                                 4.1.2                    4.4.2
----------------------------------------------------------------------------------------
Benchmark                                    Time   Iterations         Time   Iterations
----------------------------------------------------------------------------------------
SuccessiveIntersection                   28848 ns        24252     29228 ns        23898
SuccessiveIntersection64                 46578 ns        15024     49466 ns        14136
SuccessiveIntersectionCardinality        26155 ns        26779     27928 ns        25011
SuccessiveIntersectionCardinality64      42673 ns        16406     43998 ns        15883
SuccessiveUnionCardinality               36853 ns        18958     36438 ns        19159
SuccessiveUnionCardinality64             97501 ns         7187    100082 ns         6984
SuccessiveDifferenceCardinality          32514 ns        21540     33503 ns        20863
SuccessiveDifferenceCardinality64        69695 ns        10041     72552 ns         9645
SuccessiveUnion                         573901 ns         1227    558773 ns         1255
SuccessiveUnion64                       935391 ns          745    931132 ns          751
TotalUnion                              637562 ns         1097    633914 ns         1103
TotalUnionHeap                         2014668 ns          347   1952174 ns          358
RandomAccess                              3531 ns       196919      3514 ns       199751
RandomAccess64                            5811 ns       121117      6394 ns       109562
RandomAccess64Cpp                         4500 ns       132872      4554 ns       153998
ToArray                                 148464 ns         4702    148486 ns         4697
ToArray64                               604936 ns         1156    615433 ns         1134
IterateAll                             3425703 ns          204   3431836 ns          204
IterateAll64                           5007200 ns          142   5078060 ns          139
ComputeCardinality                        3928 ns       179276      3638 ns       191939
ComputeCardinality64                     22224 ns        31109     24014 ns        29137
RankManySlow                             16732 ns        41747     15564 ns        44669
RankMany                                  6382 ns       109172      6171 ns       111491

A few things (the Union tests, for example) are faster, but a number are slower, particularly the 64-bit variants (which are, unfortunately, mostly what I use).
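To put rough numbers on the table above, here is a minimal Python sketch that computes the relative change in time for a handful of rows. The times are copied from the table; the names `TIMES_NS`, `percent_change` and `report` are made up for this illustration and are not part of CRoaring or Google Benchmark. These are single runs, so small differences may well be noise. (The Google Benchmark repository also ships a `tools/compare.py` script that works on its JSON output, which is probably the better tool for a full comparison.)

```python
# Times in ns copied from the table above: benchmark -> (4.1.2, 4.4.2).
# Only a few rows are included; add more as needed.
TIMES_NS = {
    "SuccessiveIntersection64": (46578, 49466),
    "RandomAccess64": (5811, 6394),
    "ComputeCardinality64": (22224, 24014),
    "SuccessiveUnion": (573901, 558773),
    "RankManySlow": (16732, 15564),
}


def percent_change(old, new):
    """Relative change in time; positive means the newer version is slower."""
    return (new - old) / old * 100.0


def report(times):
    """Return (name, percent change) pairs, largest slowdown first."""
    rows = [(name, percent_change(old, new)) for name, (old, new) in times.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, pct in report(TIMES_NS):
        print(f"{name:<28} {pct:+6.1f}%")
```

Sorting by percent change puts the largest slowdowns first, which makes it easier to pick a benchmark to bisect on.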

If this is interesting / new info I can do some bisecting across versions to get more details. If this is known / understood then I won't bother.
