Support Julia 1.13 #3020

Open

eschnett wants to merge 7 commits into JuliaGPU:master from eschnett:eschnett/julia-1.13

Conversation

@eschnett
Contributor

Closes #3019.

@github-actions (bot) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 4c8e4e2 | Previous: e2eab84 | Ratio |
|---|---|---|---|
| latency/precompile | 44846190635.5 ns | 55351775865 ns | 0.81 |
| latency/ttfp | 14453150957 ns | 7675959782 ns | 1.88 |
| latency/import | 4520087896 ns | 4025008043 ns | 1.12 |
| integration/volumerhs | 9443988.5 ns | 9625265.5 ns | 0.98 |
| integration/byval/slices=1 | 145635.5 ns | 147037 ns | 0.99 |
| integration/byval/slices=3 | 422956.5 ns | 426089 ns | 0.99 |
| integration/byval/reference | 143817 ns | 144983 ns | 0.99 |
| integration/byval/slices=2 | 284679 ns | 286492 ns | 0.99 |
| integration/cudadevrt | 102555 ns | 103531 ns | 0.99 |
| kernel/indexing | 13575 ns | 14161 ns | 0.96 |
| kernel/indexing_checked | 14160 ns | 15039 ns | 0.94 |
| kernel/occupancy | 683.5454545454545 ns | 792.6237623762377 ns | 0.86 |
| kernel/launch | 2105.1 ns | 2304.777777777778 ns | 0.91 |
| kernel/rand | 14507 ns | 18605.5 ns | 0.78 |
| array/reverse/1d | 19160 ns | 19692 ns | 0.97 |
| array/reverse/2dL_inplace | 66497 ns | 66886 ns | 0.99 |
| array/reverse/1dL | 69339 ns | 69816 ns | 0.99 |
| array/reverse/2d | 21029.5 ns | 21785 ns | 0.97 |
| array/reverse/1d_inplace | 10596.666666666666 ns | 9783 ns | 1.08 |
| array/reverse/2d_inplace | 11915 ns | 13411 ns | 0.89 |
| array/reverse/2dL | 73024.5 ns | 73680 ns | 0.99 |
| array/reverse/1dL_inplace | 66420 ns | 66877 ns | 0.99 |
| array/copy | 18483 ns | 20494 ns | 0.90 |
| array/iteration/findall/int | 146058 ns | 159432 ns | 0.92 |
| array/iteration/findall/bool | 130524 ns | 141132 ns | 0.92 |
| array/iteration/findfirst/int | 84706 ns | 161404 ns | 0.52 |
| array/iteration/findfirst/bool | 82020 ns | 162399 ns | 0.51 |
| array/iteration/scalar | 65977.5 ns | 73008 ns | 0.90 |
| array/iteration/logical | 198658 ns | 220542 ns | 0.90 |
| array/iteration/findmin/1d | 82559 ns | 94388 ns | 0.87 |
| array/iteration/findmin/2d | 116988 ns | 121456 ns | 0.96 |
| array/reductions/reduce/Int64/1d | 39212 ns | 43817 ns | 0.89 |
| array/reductions/reduce/Int64/dims=1 | 42352 ns | 44722 ns | 0.95 |
| array/reductions/reduce/Int64/dims=2 | 59082 ns | 61524.5 ns | 0.96 |
| array/reductions/reduce/Int64/dims=1L | 87201 ns | 88932 ns | 0.98 |
| array/reductions/reduce/Int64/dims=2L | 84775.5 ns | 88232 ns | 0.96 |
| array/reductions/reduce/Float32/1d | 34234.5 ns | 37401.5 ns | 0.92 |
| array/reductions/reduce/Float32/dims=1 | 40281.5 ns | 51945.5 ns | 0.78 |
| array/reductions/reduce/Float32/dims=2 | 56532.5 ns | 59868 ns | 0.94 |
| array/reductions/reduce/Float32/dims=1L | 51686 ns | 52569 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2L | 70130.5 ns | 72250 ns | 0.97 |
| array/reductions/mapreduce/Int64/1d | 39390 ns | 43757 ns | 0.90 |
| array/reductions/mapreduce/Int64/dims=1 | 49646.5 ns | 51077 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=2 | 59329 ns | 61716 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=1L | 87259 ns | 89026 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 84900.5 ns | 88266 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 34051 ns | 37130 ns | 0.92 |
| array/reductions/mapreduce/Float32/dims=1 | 45578 ns | 42011 ns | 1.08 |
| array/reductions/mapreduce/Float32/dims=2 | 56455 ns | 60008 ns | 0.94 |
| array/reductions/mapreduce/Float32/dims=1L | 51769 ns | 52656 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=2L | 69321 ns | 72333.5 ns | 0.96 |
| array/broadcast | 20561 ns | 20154 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 10673.166666666668 ns | 12828 ns | 0.83 |
| array/copyto!/cpu_to_gpu | 218524 ns | 216999 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 284623 ns | 282452 ns | 1.01 |
| array/accumulate/Int64/1d | 118384 ns | 125219 ns | 0.95 |
| array/accumulate/Int64/dims=1 | 79371 ns | 88138 ns | 0.90 |
| array/accumulate/Int64/dims=2 | 155465 ns | 162165 ns | 0.96 |
| array/accumulate/Int64/dims=1L | 1694474 ns | 1714037 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 960497 ns | 971078.5 ns | 0.99 |
| array/accumulate/Float32/1d | 100226 ns | 109882.5 ns | 0.91 |
| array/accumulate/Float32/dims=1 | 76031 ns | 84424 ns | 0.90 |
| array/accumulate/Float32/dims=2 | 144406.5 ns | 151744.5 ns | 0.95 |
| array/accumulate/Float32/dims=1L | 1585870 ns | 1622762 ns | 0.98 |
| array/accumulate/Float32/dims=2L | 656780.5 ns | 702590.5 ns | 0.93 |
| array/construct | 1265.3 ns | 1267.85 ns | 1.00 |
| array/random/randn/Float32 | 36574 ns | 48207 ns | 0.76 |
| array/random/randn!/Float32 | 30409 ns | 25000 ns | 1.22 |
| array/random/rand!/Int64 | 34484 ns | 27295 ns | 1.26 |
| array/random/rand!/Float32 | 8136.75 ns | 8737.333333333334 ns | 0.93 |
| array/random/rand/Int64 | 36946 ns | 30000 ns | 1.23 |
| array/random/rand/Float32 | 12492 ns | 13155 ns | 0.95 |
| array/permutedims/4d | 51599 ns | 55023 ns | 0.94 |
| array/permutedims/2d | 52539.5 ns | 53878 ns | 0.98 |
| array/permutedims/3d | 52918 ns | 54959 ns | 0.96 |
| array/sorting/1d | 2736673 ns | 2758315 ns | 0.99 |
| array/sorting/by | 3306442 ns | 3344753.5 ns | 0.99 |
| array/sorting/2d | 1068592 ns | 1081270 ns | 0.99 |
| cuda/synchronization/stream/auto | 974.5 ns | 1017.0833333333334 ns | 0.96 |
| cuda/synchronization/stream/nonblocking | 6809.700000000001 ns | 7295.6 ns | 0.93 |
| cuda/synchronization/stream/blocking | 816.5108695652174 ns | 798.6601941747573 ns | 1.02 |
| cuda/synchronization/context/auto | 1159.2 ns | 1158.8 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 7289.8 ns | 7658.9 ns | 0.95 |
| cuda/synchronization/context/blocking | 894.8541666666666 ns | 902.5217391304348 ns | 0.99 |

This comment was automatically generated by a workflow using github-action-benchmark.

@eschnett
Contributor Author

The self-tests fail because linear algebra functions (e.g. the matrix exponential) as implemented in LinearAlgebra use scalar iteration; see e.g. exp! in https://github.com/JuliaLang/LinearAlgebra.jl/blob/f55e4736fb6dce08fee8a7ac7f0aba1f2b54838e/src/dense.jl#L784.

How should this be handled? Rewrite exp!? Find a corresponding CUDA library function to call and add a new method to exp? Fall back to the Julia 1.12 implementation? How does this work on Julia 1.12?
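
To illustrate, a minimal sketch of the failure mode and a naive host-side stopgap (the helper name is made up; this is not a proposed final fix):

```julia
using CUDA, LinearAlgebra

A = CUDA.rand(Float32, 32, 32)

# exp(A) currently fails: LinearAlgebra's generic exp! walks the matrix
# element by element, and CUDA.jl disallows such scalar indexing on
# device arrays.

# Hypothetical stopgap: round-trip through the CPU. Correct but slow.
gpu_exp_via_host(A::CuMatrix) = CuMatrix(exp(Matrix(A)))

B = gpu_exp_via_host(A)
```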

@eschnett
Contributor Author

I think it's JuliaGPU/GPUArrays.jl#679.

@eschnett
Contributor Author

eschnett commented Feb 3, 2026

The Buildkite error is:

  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Modifier '.NaN' requires .target sm_80 or higher
  ptxas /tmp/jl_PALmvKnqta.ptx, line 226; error   : Feature 'min.f16 or min.f16x2' requires .target sm_80 or higher

This seems unrelated to my changes, except that I am now running CI tests on Julia 1.12 and Julia 1.13...
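
For reference, a guess at a minimal reproducer, assuming the trigger is a Float16 min (suggested by the min.f16 message, but not confirmed from the log alone):

```julia
using CUDA

# On a pre-sm_80 GPU, if the NaN-propagating min.f16 instruction is
# emitted unconditionally, this broadcast should hit the same
# "requires .target sm_80 or higher" ptxas errors as above.
a = CUDA.rand(Float16, 1024)
b = CUDA.rand(Float16, 1024)
c = min.(a, b)
synchronize()
```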

@maleadt
Member

maleadt commented Feb 4, 2026

I guess #3025 needs to be active for all LLVM versions.

@eschnett
Contributor Author

eschnett commented Feb 4, 2026

Good news: CUDA.jl now works for Julia 1.12.
Bad news: There's an LLVM segfault for Julia 1.13.

  From worker 5:	[271397] signal 11 (1): Segmentation fault
  From worker 5:	in expression starting at /var/lib/buildkite-agent/builds/gpuci-9/julialang/cuda-dot-jl/test/base/texture.jl:41
  From worker 5:	_ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN12_GLOBAL__N_124NVPTXReplaceImageHandles18findIndexForHandleERN4llvm14MachineOperandERNS1_15MachineFunctionERj.isra.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN12_GLOBAL__N_124NVPTXReplaceImageHandles20runOnMachineFunctionERN4llvm15MachineFunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.0 at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)
  From worker 5:	_ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at /root/.cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.13/julia-1.13-latest-linux-x86_64/bin/../lib/julia/libLLVM.so.20.1jl (unknown line)

@eschnett
Contributor Author

eschnett commented Feb 4, 2026

I think it's texture interpolation that is broken on 1.13. This line segfaults LLVM:

dst[i] = texture[u]

in test/base/texture.jl (function kernel_texture_warp_native).
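
For reference, a boiled-down sketch of the failing pattern (the actual setup in texture.jl differs in its details):

```julia
using CUDA

function fetch_kernel(dst, texture)
    i = threadIdx().x
    # Fractional indexing performs an interpolated texture fetch.
    @inbounds dst[i] = texture[Float32(i)]
    return
end

n = 32
texarr = CuTextureArray(CUDA.rand(Float32, n))
tex = CuTexture(texarr; interpolation=CUDA.LinearInterpolation())
dst = CUDA.zeros(Float32, n)

# Compiling this launch is what appears to crash LLVM 20's
# NVPTXReplaceImageHandles pass on Julia 1.13.
@cuda threads=n fetch_kernel(dst, tex)
```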

@eschnett
Contributor Author

eschnett commented Feb 4, 2026

We will need to update KernelAbstractions.jl as well; see JuliaGPU/KernelAbstractions.jl#679.



Development

Successfully merging this pull request may close this issue:

Cannot load CUDA.jl with Julia 1.13
