add fastmath flag #2732

Draft
wants to merge 1 commit into
base: master
vchuravy
Member

@vchuravy vchuravy commented Apr 9, 2025

No description provided.

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 6c0345c | Previous: 57e06f9 | Ratio |
|---|---|---|---|
| latency/precompile | 45661499477.5 ns | 45680313076.5 ns | 1.00 |
| latency/ttfp | 6644709519 ns | 6618514052 ns | 1.00 |
| latency/import | 3165546495.5 ns | 3159190068.5 ns | 1.00 |
| integration/volumerhs | 9623672.5 ns | 9615523 ns | 1.00 |
| integration/byval/slices=1 | 147339 ns | 147042 ns | 1.00 |
| integration/byval/slices=3 | 425644 ns | 425857 ns | 1.00 |
| integration/byval/reference | 145349.5 ns | 145233.5 ns | 1.00 |
| integration/byval/slices=2 | 286663 ns | 286656.5 ns | 1.00 |
| integration/cudadevrt | 103791 ns | 103621 ns | 1.00 |
| kernel/indexing | 14663 ns | 14335 ns | 1.02 |
| kernel/indexing_checked | 15282 ns | 15041 ns | 1.02 |
| kernel/occupancy | 674.2594936708861 ns | 672.879746835443 ns | 1.00 |
| kernel/launch | 2313.6666666666665 ns | 2180 ns | 1.06 |
| kernel/rand | 17361 ns | 15098.5 ns | 1.15 |
| array/reverse/1d | 20002 ns | 20194 ns | 0.99 |
| array/reverse/2d | 25387 ns | 24780 ns | 1.02 |
| array/reverse/1d_inplace | 11562 ns | 10893 ns | 1.06 |
| array/reverse/2d_inplace | 13306 ns | 13342 ns | 1.00 |
| array/copy | 20937 ns | 21420 ns | 0.98 |
| array/iteration/findall/int | 159683 ns | 161487 ns | 0.99 |
| array/iteration/findall/bool | 139460.5 ns | 141184 ns | 0.99 |
| array/iteration/findfirst/int | 154725 ns | 155481.5 ns | 1.00 |
| array/iteration/findfirst/bool | 155802 ns | 156277 ns | 1.00 |
| array/iteration/scalar | 73427 ns | 74379 ns | 0.99 |
| array/iteration/logical | 218733.5 ns | 221595 ns | 0.99 |
| array/iteration/findmin/1d | 42194 ns | 42443 ns | 0.99 |
| array/iteration/findmin/2d | 95047 ns | 95456.5 ns | 1.00 |
| array/reductions/reduce/1d | 43278.5 ns | 37041 ns | 1.17 |
| array/reductions/reduce/2d | 48985.5 ns | 51842 ns | 0.94 |
| array/reductions/mapreduce/1d | 41489 ns | 35370 ns | 1.17 |
| array/reductions/mapreduce/2d | 42242.5 ns | 41509 ns | 1.02 |
| array/broadcast | 21528 ns | 21339.5 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 13762 ns | 13807 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 209543 ns | 211351 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 243591 ns | 245693 ns | 0.99 |
| array/accumulate/1d | 109500 ns | 110163 ns | 0.99 |
| array/accumulate/2d | 82231.5 ns | 81026 ns | 1.01 |
| array/construct | 1259.6 ns | 1272.2 ns | 0.99 |
| array/random/randn/Float32 | 45832.5 ns | 45868 ns | 1.00 |
| array/random/randn!/Float32 | 26472 ns | 27161 ns | 0.97 |
| array/random/rand!/Int64 | 27426 ns | 27291 ns | 1.00 |
| array/random/rand!/Float32 | 8798.666666666666 ns | 8809.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 30177 ns | 30432 ns | 0.99 |
| array/random/rand/Float32 | 13263 ns | 13379 ns | 0.99 |
| array/permutedims/4d | 61834 ns | 62223 ns | 0.99 |
| array/permutedims/2d | 56175 ns | 56419 ns | 1.00 |
| array/permutedims/3d | 57143.5 ns | 57054.5 ns | 1.00 |
| array/sorting/1d | 2777303 ns | 2767302 ns | 1.00 |
| array/sorting/by | 3368494.5 ns | 3355332 ns | 1.00 |
| array/sorting/2d | 1085912 ns | 1083060 ns | 1.00 |
| cuda/synchronization/stream/auto | 1026.6 ns | 1096.1 ns | 0.94 |
| cuda/synchronization/stream/nonblocking | 6544.6 ns | 6533.2 ns | 1.00 |
| cuda/synchronization/stream/blocking | 800.8686868686868 ns | 865.6755319148936 ns | 0.93 |
| cuda/synchronization/context/auto | 1166.1 ns | 1242.7 ns | 0.94 |
| cuda/synchronization/context/nonblocking | 6772.4 ns | 6746.8 ns | 1.00 |
| cuda/synchronization/context/blocking | 948.25 ns | 983.9 ns | 0.96 |

This comment was automatically generated by workflow using github-action-benchmark.
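
For readers skimming the numbers: the Ratio column appears to be the Current timing divided by the Previous timing, rounded to two decimals, so values above 1.00 mean the current commit measured slower. The following is a minimal sketch of that reading, not the benchmark workflow's actual code. The helper name `ratio` and the choice of sample rows are assumptions made for illustration; the timings are copied from the table above.

```python
# Hypothetical helper, not part of CUDA.jl or the benchmark workflow.
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals (>1.00 means slower)."""
    return round(current_ns / previous_ns, 2)


# (current, previous) timings in nanoseconds, copied from the table above.
samples = {
    "kernel/rand": (17361, 15098.5),
    "array/reductions/reduce/1d": (43278.5, 37041),
    "cuda/synchronization/stream/auto": (1026.6, 1096.1),
}

ratios = {name: ratio(cur, prev) for name, (cur, prev) in samples.items()}

for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
```

On these three rows the sketch reproduces the 1.15, 1.17, and 0.94 shown in the table. Whether such differences reflect the change itself or run-to-run noise cannot be determined from a single comparison.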
