@pleroy (Member) commented Dec 17, 2025

Command:
.\Release\x64\nanobenchmarks.exe --benchmark_filter=.*Hermite.* --loop_iterations 100 --keep_perf_boost --keep_throttling

SSE results:

AuthenticAMD AMD Ryzen Threadripper PRO 5965WX 24-Cores      
Features: FPU SSE SSE2 SSE3 FMA SSE4_1 AVX AVX2
RAW TSC:                                                 min      1‰      1%      5%     10%     25%     50%
                                    identity            5.70   +0.00   +0.00   +0.38   +0.38   +0.38   +0.38
                             mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
                          mulsd_xmm0_xmm0_4x           14.82   +0.38   +0.38   +0.38   +0.38   +0.38   +0.38
                            sqrtps_xmm0_xmm0           16.72   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
Slope: 1.263245 cycle/TSC    Overhead: 5.470815 TSC
Correlation coefficient: 0.999061
Cycles:                                     expected     min      1‰      1%      5%     10%     25%     50%
R                                   identity       0    0.29   +0.00   +0.00   +0.48   +0.48   +0.48   +0.48
R                            mulsd_xmm0_xmm0       3    2.69   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R                         mulsd_xmm0_xmm0_4x      12   12.29   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R                           sqrtps_xmm0_xmm0      14   14.21   +0.00   +0.00   +0.00   +0.48   +0.48   +0.48
  Hermite3Nanobenchmark/ConstructionAndValue           92.46   +0.48   +0.48   +1.92   +2.40   +2.40   +2.40
                 Hermite3Nanobenchmark/Value           33.41   +0.48   +0.96   +0.96   +0.96   +0.96   +0.96
    Hermite3Nanobenchmark/ValueAndDerivative           34.85   +0.48   +0.48   +0.48   +0.48   +0.96   +0.96
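
For readers checking the calibration: the "Cycles" column appears to be the raw TSC reading with the fixed overhead subtracted, scaled by the fitted slope. That formula is an inference from the numbers above, not something taken from the nanobenchmarks source. A minimal Python sketch under that assumption, using the SSE values (the function name `tsc_to_cycles` is made up for illustration):

```python
# Calibration printed with the SSE results above.
SLOPE = 1.263245     # cycle/TSC
OVERHEAD = 5.470815  # TSC


def tsc_to_cycles(tsc: float) -> float:
    """Assumed conversion: remove the fixed overhead, then scale by the slope."""
    return SLOPE * (tsc - OVERHEAD)


# (raw TSC min, reported cycles min) pairs copied from the SSE tables.
rows = {
    "identity": (5.70, 0.29),
    "mulsd_xmm0_xmm0": (7.60, 2.69),
    "sqrtps_xmm0_xmm0": (16.72, 14.21),
}

for name, (tsc, reported) in rows.items():
    print(f"{name:>20}: {tsc_to_cycles(tsc):6.2f} (reported {reported:.2f})")
```

The `mulsd_xmm0_xmm0_4x` row is left out on purpose: its reported 12.29 cycles matches a raw reading of 15.20 TSC (the 14.82 minimum plus one 0.38 step) rather than 14.82, so the two tables do not seem to use the same sample for that row.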

AVX results:

AuthenticAMD AMD Ryzen Threadripper PRO 5965WX 24-Cores      
Features: FPU SSE SSE2 SSE3 FMA SSE4_1 AVX AVX2
RAW TSC:                                                 min      1‰      1%      5%     10%     25%     50%
                                    identity            4.94   +0.00   +0.00   +0.00   +0.00   +0.38   +0.38
                             mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
                          mulsd_xmm0_xmm0_4x           14.82   +0.38   +0.38   +0.38   +0.38   +0.38   +0.38
                            sqrtps_xmm0_xmm0           16.72   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
Slope: 1.204484 cycle/TSC    Overhead: 5.000825 TSC
Correlation coefficient: 0.999763
Cycles:                                     expected     min      1‰      1%      5%     10%     25%     50%
R                                   identity       0   -0.07   +0.00   +0.00   +0.00   +0.00   +0.46   +0.46
R                            mulsd_xmm0_xmm0       3    3.13   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R                         mulsd_xmm0_xmm0_4x      12   12.28   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R                           sqrtps_xmm0_xmm0      14   14.12   +0.00   +0.00   +0.00   +0.46   +0.46   +0.46
  Hermite3Nanobenchmark/ConstructionAndValue           92.38   +0.46   +0.46   +0.92   +0.92   +0.92   +0.92
                 Hermite3Nanobenchmark/Value           34.25   +0.00   +0.46   +0.46   +0.46   +0.46   +0.46
    Hermite3Nanobenchmark/ValueAndDerivative           37.46   +0.00   +0.46   +0.92   +0.92   +0.92   +0.92
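
To compare the two runs, here is a small Python sketch that computes the relative change in the minimum cycle counts of the three Hermite3 benchmarks. The numbers are copied from the "Cycles" tables above; the helper `relative_change` is only an illustration and is not part of the nanobenchmarks tool.

```python
# Minimum cycle counts copied from the SSE and AVX "Cycles" tables above.
sse = {"ConstructionAndValue": 92.46, "Value": 33.41, "ValueAndDerivative": 34.85}
avx = {"ConstructionAndValue": 92.38, "Value": 34.25, "ValueAndDerivative": 37.46}


def relative_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return 100.0 * (after - before) / before


changes = {name: relative_change(sse[name], avx[name]) for name in sse}

for name, pct in changes.items():
    print(f"{name:>22}: {sse[name]:6.2f} -> {avx[name]:6.2f} cycles ({pct:+.1f}%)")
```

On these figures, ConstructionAndValue is essentially unchanged (the 0.08 cycle difference is below the roughly 0.46 cycle step visible in the quantile columns), Value is about 2.5% slower with AVX, and ValueAndDerivative is about 7.5% slower.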

@eggrobin eggrobin added the LGTM label Dec 17, 2025