
Weird bug with matrix multiplication on Array view #2821

@kishore-nori


Describe the bug

Matrix multiplication with an array view returns the wrong result for most combinations of element types, with no clear pattern that I can see. Examples follow.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> z = CUDA.rand(2,2,1)
2×2×1 CuArray{Float32, 3, CUDA.DeviceMemory}:
[:, :, 1] =
 0.615746  0.619146
 0.585899  0.0667595

julia> zf = Float64.(z)
2×2×1 CuArray{Float64, 3, CUDA.DeviceMemory}:
[:, :, 1] =
 0.615746  0.619146
 0.585899  0.0667595

julia> rot = CuArray(([1.0f0 0.0f0; 0.0f0 1.0f0]))
2×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
 1.0  0.0
 0.0  1.0

julia> rotf = Float64.(rot)
2×2 CuArray{Float64, 2, CUDA.DeviceMemory}:
 1.0  0.0
 0.0  1.0

julia> view(z,1,:,:)
2×1 view(::CuArray{Float32, 3, CUDA.DeviceMemory}, 1, :, :) with eltype Float32:
 0.61574584
 0.61914593

julia> view(zf,1,:,:)
2×1 view(::CuArray{Float64, 3, CUDA.DeviceMemory}, 1, :, :) with eltype Float64:
 0.6157458424568176
 0.619145929813385

# wrong result: returns z[1,1,1] and z[2,1,1] instead of z[1,1,1] and z[1,2,1]
julia> rot * view(z,1,:,:)
2×1 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.61574584
 0.5858987

# right result
julia> rotf * view(z,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
 0.6157458424568176
 0.619145929813385

# wrong result
julia> rot * view(zf,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
 0.6157458424568176
 0.5858986973762512

# wrong result
julia> rotf * view(zf,1,:,:)
2×1 CuArray{Float64, 2, CUDA.DeviceMemory}:
 0.6157458424568176
 0.5858986973762512

I am on [052768ef] CUDA v5.8.2
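
For reference, here is a small host-side (CPU) sketch of what the products above should return, using the rounded values printed in the transcript. It also shows that the wrong outputs match the first two elements of the parent array in memory order, which would be consistent with the view's stride being ignored somewhere. That is only a guess from the numbers, not something confirmed here. The names `expected` and `contiguous` are introduced just for this illustration.

```julia
# CPU reconstruction of the MWE with the rounded values shown above.
z   = reshape(Float32[0.615746, 0.585899, 0.619146, 0.0667595], 2, 2, 1)
rot = Float32[1 0; 0 1]

# Same view as in the report: picks z[1,1,1] and z[1,2,1],
# which are NOT adjacent in memory (column-major layout).
v = view(z, 1, :, :)

# Reference result on the CPU: identity times the view is the view itself.
expected = rot * v

# What you get if the first two elements in memory are read instead,
# i.e. z[1,1,1] and z[2,1,1].
contiguous = rot * reshape(vec(z)[1:2], 2, 1)

println("strides of the view: ", strides(v))
println("expected:            ", vec(expected))
println("contiguous read:     ", vec(contiguous))
```

If the stride guess is right, materializing the view into a dense array before multiplying (for example `rot * z[1, :, :]`) might work around the problem, but I have not verified that.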

Version info

Details on Julia:

julia> versioninfo()

Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 96 × Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 1 default, 0 interactive, 1 GC (on 96 virtual cores)
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_DEPOT_PATH = /scratch/bt62/sn8885/.julia

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 570.124.6

CUDA libraries: 
- CUBLAS: 12.9.1
- CURAND: 10.3.10
- CUFFT: 11.4.1
- CUSOLVER: 11.7.5
- CUSPARSE: 12.5.10
- CUPTI: 2025.2.1 (API 28.0.0)
- NVML: 12.0.0+570.124.6

Julia packages: 
- CUDA: 5.8.2
- CUDA_Driver_jll: 0.13.1+0
- CUDA_Runtime_jll: 0.17.1+0

Toolchain:
- Julia: 1.11.6
- LLVM: 16.0.6

1 device:
  0: Tesla V100-SXM2-32GB (sm_70, 31.352 GiB / 32.000 GiB available)


Labels: cuda libraries, help wanted
