-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Thank you for creating this wonderful library!
I compared the performance to other similar libraries/programs and it is significantly faster for larger 128-bit integers. For example, compared to GNU factor, it is over 330× faster:
$ ./time.sh -i -r 5 "seq $(bc <<<'(2^127) - (10^2) - 1') $(bc <<<'(2^127) - 1') | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor
Time (mean ± σ std dev): 143.1562s ± 0.5082s [User: 143.1556s, System: 0.0000s]
Range (min … median … max): 142.371s … 143.422s … 143.732s CPU: 100.0%, 5 runs
Benchmark #2: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor
Time (mean ± σ std dev): 142.3252s ± 0.7053s [User: 141.9832s, System: 0.0060s]
Range (min … median … max): 141.651s … 141.996s … 143.588s CPU: 99.8%, 5 runs
Benchmark #3: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example
Time (mean ± σ std dev): 0.9602s ± 0.7591s [User: 0.5456s, System: 0.0080s]
Range (min … median … max): 0.507s … 0.564s … 2.473s CPU: 57.7%, 5 runs
Benchmark #4: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example
Time (mean ± σ std dev): 0.4254s ± 0.1519s [User: 0.3314s, System: 0.0080s]
Range (min … median … max): 0.312s … 0.356s … 0.719s CPU: 79.8%, 5 runs
Summary
#4 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example’ ran
336.521 ± 120.198 times (33,552.1%) faster than #1 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor’
334.568 ± 119.506 times (33,356.8%) faster than #2 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor’
2.257 ± 1.958 times (125.7%) faster than #3 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example’
For a couple of 128-bit integers with large 64-bit factors (from the GNU test suite), it is almost 2,000× faster:
$ ./time.sh -i -m 2 "echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor
Time (mean ± σ std dev): 319.4225s ± 0.3625s [User: 319.4225s, System: 0.0000s]
Range (min … median … max): 319.060s … 319.422s … 319.785s CPU: 100.0%, 2 runs
Benchmark #2: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor
Time (mean ± σ std dev): 318.5655s ± 0.4765s [User: 317.7500s, System: 0.0005s]
Range (min … median … max): 318.089s … 318.566s … 319.042s CPU: 99.7%, 2 runs
Benchmark #3: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example
Time (mean ± σ std dev): 1.1580s ± 0.8340s [User: 0.2590s, System: 0.0340s]
Range (min … median … max): 0.324s … 1.158s … 1.992s CPU: 25.3%, 2 runs
Benchmark #4: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example
Time (mean ± σ std dev): 0.1634s ± 0.0169s [User: 0.1365s, System: 0.0059s]
Range (min … median … max): 0.152s … 0.158s … 0.219s CPU: 87.2%, 13 runs
Summary
#4 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example’ ran
1,955.034 ± 202.348 times (195,403.4%) faster than #1 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor’
1,949.789 ± 201.814 times (194,878.9%) faster than #2 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor’
7.088 ± 5.157 times (608.8%) faster than #3 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example’
However, for smaller integers under 32-bits, it seems to be up to 11× slower than GNU factor:
$ ./time.sh -i -r 5 'seq 2 10000000 | '{factor,./gnu/factor,./uu/factor,./factoring/gcc_example,./factoring/clang_example,'./numbers -p'}
Benchmark #1: seq 2 10000000 | factor
Time (mean ± σ std dev): 2.2944s ± 0.0112s [User: 2.0142s, System: 0.4204s]
Range (min … median … max): 2.281s … 2.291s … 2.313s CPU: 106.1%, 5 runs
Benchmark #2: seq 2 10000000 | ./gnu/factor
Time (mean ± σ std dev): 2.2020s ± 0.0087s [User: 1.8820s, System: 0.4534s]
Range (min … median … max): 2.188s … 2.201s … 2.213s CPU: 106.1%, 5 runs
Benchmark #3: seq 2 10000000 | ./uu/factor
Time (mean ± σ std dev): 12.6390s ± 0.0296s [User: 10.2822s, System: 2.4632s]
Range (min … median … max): 12.591s … 12.637s … 12.684s CPU: 100.8%, 5 runs
Benchmark #4: seq 2 10000000 | ./factoring/gcc_example
Time (mean ± σ std dev): 15.2668s ± 0.1457s [User: 12.6782s, System: 2.7610s]
Range (min … median … max): 15.093s … 15.291s … 15.512s CPU: 101.1%, 5 runs
Benchmark #5: seq 2 10000000 | ./factoring/clang_example
Time (mean ± σ std dev): 15.2658s ± 0.0899s [User: 12.7338s, System: 2.7064s]
Range (min … median … max): 15.172s … 15.212s … 15.419s CPU: 101.1%, 5 runs
Benchmark #6: seq 2 10000000 | ./numbers -p
Time (mean ± σ std dev): 19.5844s ± 0.2708s [User: 17.0100s, System: 2.7530s]
Range (min … median … max): 19.241s … 19.561s … 20.072s CPU: 100.9%, 5 runs
Summary
#2 ‘seq 2 10000000 | ./gnu/factor’ ran
1.042 ± 0.007 times (4.2%) faster than #1 ‘seq 2 10000000 | factor’
5.740 ± 0.026 times (474.0%) faster than #3 ‘seq 2 10000000 | ./uu/factor’
6.933 ± 0.072 times (593.3%) faster than #4 ‘seq 2 10000000 | ./factoring/gcc_example’
6.933 ± 0.049 times (593.3%) faster than #5 ‘seq 2 10000000 | ./factoring/clang_example’
8.894 ± 0.128 times (789.4%) faster than #6 ‘seq 2 10000000 | ./numbers -p’
It seems to get progressively faster as the integers get larger. For integers less than 104, it is about 10.9× slower, for 105 about 8.1× slower, for 106 about 7.2× slower, for 107 about 6.9× slower and for 108 about 4.6× slower.
The above benchmarks were created with my Benchmarking Tool and include the system GNU factor (factor), a locally compiled version of GNU factor with -O3 and LTO enabled (./gnu/factor), uutils' Rust port of GNU factor (./uu/factor), a simple wrapper program for this library adapted from your example without CMake compiled with both GCC (./factoring/gcc_example) and Clang (./factoring/clang_example) and with inline ASM enabled, and lastly my simple Numbers Tool program which includes a partial C++ port of GNU factor (./numbers -p).
Hopefully I am using your library correctly to get maximum performance. For reference, the wrapper program I wrote for your library is available here: https://gist.github.com/tdulcet/6393865769644aaf9244561f3eb4e9c2. The commands I used to compile and run it are on top of the file. I used GCC 11.4 and Clang 14.0. Note that Clang did generate 42 compiler warnings, but in my benchmarking it was always faster than GCC with your library.