Performance regression using ^ #193

Open
@luraess

As reported in luraess/JuliaGPUPerf#2 and luraess/JuliaGPUPerf#3, using the ^ operator inside GPU kernels significantly degrades performance.

The Int32 on Int32 case (luraess/JuliaGPUPerf#2) may have been fixed by the following workaround, suggested by @vchuravy:

my_pow(x, p) = ccall("llvm.powi.f32.i32", llvmcall, Float32, (Float32, Int32), x, p)
#[...]
A[ix,iy] = B[ix,iy] + s*my_pow(C[ix,iy], pow_int)
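
For anyone who wants to sanity-check the workaround outside a GPU kernel, a minimal CPU-only sketch could look like the following. It assumes a Julia version whose LLVM names the intrinsic with the .i32 suffix (LLVM 13 or newer, to my knowledge; older versions call it llvm.powi.f32). The type annotations and the sample values x and p are illustrative and not taken from the benchmark.

```julia
# Assumption: the bundled LLVM exposes the intrinsic as "llvm.powi.f32.i32".
# On older LLVM versions the name is "llvm.powi.f32" instead.
my_pow(x::Float32, p::Int32) =
    ccall("llvm.powi.f32.i32", llvmcall, Float32, (Float32, Int32), x, p)

# Hypothetical sample inputs, chosen only for illustration.
x = 1.5f0
p = Int32(3)

# powi is free to reorder the multiplications, so compare approximately
# rather than bit-for-bit against the built-in ^.
@assert my_pow(x, p) ≈ x^p
```

This only checks that the intrinsic agrees numerically with ^ on the host; it says nothing about the performance seen inside the kernel.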

But the Float32 and Float64 cases are still lagging behind.
