Julia 1.11 introduces BFloat16 codegen support, so let's use this issue to track supporting that here.
Right now it looks like we support the type, but the arithmetic still goes through Float32 conversions:
```julia
julia> using CUDA, BFloat16s

julia> BFloat16s.llvm_storage
true

julia> BFloat16s.llvm_arithmetic
true

julia> function kernel(x)
           @inbounds x[threadIdx().x] += BFloat16(1)
           return
       end

julia> x = CuArray{BFloat16}(undef, 1024);

julia> @device_code_llvm debuginfo=:none @cuda kernel(x)
; PTX CompilerJob of MethodInstance for kernel(::CuDeviceVector{BFloat16, 1}) for sm_89
define ptx_kernel void @_Z6kernel13CuDeviceArrayI8BFloat16Li1ELi1EE({ i64, i32 } %state, { i8 addrspace(1)*, i64, [1 x i64], i64 } %0) local_unnamed_addr {
conversion:
  %.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [1 x i64], i64 } %0, 0
  %1 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %2 = bitcast i8 addrspace(1)* %.fca.0.extract to bfloat addrspace(1)*
  %3 = zext i32 %1 to i64
  %4 = getelementptr inbounds bfloat, bfloat addrspace(1)* %2, i64 %3
  %5 = load bfloat, bfloat addrspace(1)* %4, align 2
  %6 = fpext bfloat %5 to float
  %7 = fadd float %6, 1.000000e+00
  %8 = fptrunc float %7 to bfloat
  store bfloat %8, bfloat addrspace(1)* %4, align 2
  ret void
}
```
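For comparison, with truly native bfloat arithmetic the `fpext`/`fadd float`/`fptrunc` sequence should collapse into a single `fadd` on `bfloat`. A rough, hand-written sketch of the IR one would hope to see (not actual compiler output; `0xR3F80` is `1.0` as a bfloat immediate):

```llvm
  %5 = load bfloat, bfloat addrspace(1)* %4, align 2
  %6 = fadd bfloat %5, 0xR3F80
  store bfloat %6, bfloat addrspace(1)* %4, align 2
```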
In addition, the logic in BFloat16s.jl isn't great: we determine support based on the host processor. It's not clear whether we can do better, though; this looks a lot like the literal `Int` issue (where we can't make GPU code use `Int32` when the host uses `Int64`).
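A minimal sketch of the mismatch (hypothetical, not BFloat16s.jl's actual internals): the support flags are constants computed from the host at load time, so any code branching on them makes the same choice for host and device compilation alike, even though the GPU target may differ.

```julia
using CUDA, BFloat16s

# Decided once, by the *host* CPU/LLVM, not by the GPU we later compile for.
const NATIVE_BF16 = BFloat16s.llvm_arithmetic

function kernel_add_one(x)
    i = threadIdx().x
    if NATIVE_BF16
        # hoped-for path: native bfloat arithmetic in device code
        @inbounds x[i] += BFloat16(1)
    else
        # fallback: explicitly widen to Float32, then narrow back
        @inbounds x[i] = BFloat16(Float32(x[i]) + 1f0)
    end
    return
end
```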