Open
Labels
cuda libraries (Stuff about CUDA library wrappers.), help wanted (Extra attention is needed)
Description
Describe the bug
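Matrix multiplication with an array view returns the wrong result for most combinations of element types, with no clear pattern that I can see. In the failing cases the product contains z[1,1,1] and z[2,1,1] (the first column of z) instead of the view's elements z[1,1,1] and z[1,2,1]. Examples follow: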
To reproduce
The Minimal Working Example (MWE) for this bug:
julia> z = CUDA.rand(2,2,1)
2×2×1 CuArray{Float32, 3, CUDA.DeviceMemory}:
[:, :, 1] =
0.615746 0.619146
0.585899 0.0667595
julia> zf = Float64.(z)
2×2×1 CuArray{Float64, 3, CUDA.DeviceMemory}:
[:, :, 1] =
0.615746 0.619146
0.585899 0.0667595
julia> rot = CuArray(([1.0f0 0.0f0; 0.0f0 1.0f0]))
2×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
1.0 0.0
0.0 1.0
julia> rotf = Float64.(rot)
2×2 CuArray{Float64, 2, CUDA.DeviceMemory}:
1.0 0.0
0.0 1.0
julia> view(z,1,:,:)
2×1 view(::CuArray{Float32, 3, CUDA.DeviceMemory}, 1, :, :) with eltype Float32:
0.61574584
0.61914593
julia> view(zf,1,:,:)
2×1 view(::CuArray{Float64, 3, CUDA.DeviceMemory}, 1, :, :) with eltype Float64:
0.6157458424568176
0.619145929813385
# wrong result
julia> rot * view(z,1,:,:)
2×1 CuArray{Float32, 2, CUDA.DeviceMemory}:
0.61574584
0.5858987
# right result
julia> rotf * view(z,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
0.6157458424568176
0.619145929813385
# wrong result
julia> rot * view(zf,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
0.6157458424568176
0.5858986973762512
# wrong result
julia> rotf * view(zf,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
0.6157458424568176
0.5858986973762512
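One possible reading of the outputs above, not confirmed anywhere in this report: the wrong results look like what you would get if the view's memory were read as contiguous, ignoring its stride. The sketch below is CPU-only and uses a plain `Array` with made-up values, so it only illustrates the memory layout of `view(z,1,:,:)`; it does not exercise CUDA.jl or CUBLAS. The final `copy` is a hypothetical workaround that has not been tested on the GPU.

```julia
# CPU-only illustration with a plain Array; no GPU or CUDA.jl needed.
# The values are made up so that each element is easy to identify.
z = reshape(Float32[1, 2, 3, 4], 2, 2, 1)   # z[:, :, 1] == [1 3; 2 4]
v = view(z, 1, :, :)                        # 2×1 view of the first row

# The view's elements are not adjacent in memory: one step along its
# first dimension skips 2 elements of the parent array.
@show strides(v)        # (2, 4)
@show vec(collect(v))   # Float32[1.0, 3.0]

# Reading the same memory as if it were contiguous gives the first
# column of z instead, which matches the pattern of the wrong results.
@show z[1:2, 1, 1]      # Float32[1.0, 2.0]

# Hypothetical workaround (assumption): materialize the view into a
# contiguous array before multiplying.
rot = Float32[1 0; 0 1]
@show vec(rot * copy(v))   # Float32[1.0, 3.0]
```

If this reading is right, calling `copy(view(z,1,:,:))` before the multiplication on the GPU might sidestep the problem at the cost of an extra allocation, but that is an assumption and not something verified here.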
I am on [052768ef] CUDA v5.8.2
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 96 × Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 96 virtual cores)
Environment:
JULIA_PKG_USE_CLI_GIT = true
JULIA_DEPOT_PATH = /scratch/bt62/sn8885/.julia
Details on CUDA:
julia> CUDA.versioninfo()
CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 570.124.6
CUDA libraries:
- CUBLAS: 12.9.1
- CURAND: 10.3.10
- CUFFT: 11.4.1
- CUSOLVER: 11.7.5
- CUSPARSE: 12.5.10
- CUPTI: 2025.2.1 (API 28.0.0)
- NVML: 12.0.0+570.124.6
Julia packages:
- CUDA: 5.8.2
- CUDA_Driver_jll: 0.13.1+0
- CUDA_Runtime_jll: 0.17.1+0
Toolchain:
- Julia: 1.11.6
- LLVM: 16.0.6
1 device:
0: Tesla V100-SXM2-32GB (sm_70, 31.352 GiB / 32.000 GiB available)