Review the unary sign kernel's gpu_kernel call, as it has unnecessary logical branches for type conversion.

Review the design of the `gpu_kernel` call in `sign_kernel_cuda`, as it may contain an unused logical branch that handles conversion from any type to any type. This surfaced during work on BF16, when the kernel was seen converting uint8_t to BFloat16 through DynamicCast.h and TypeCast.

> Judging from the profile of its lambda function, sign_kernel_cuda does not need dynamic conversion from any tensor type to any other; it only needs to return a value of the same type.
> The same issue does not show up in other kernels, where a dynamic cast could be needed. It looks like this kernel is special for no reason and could use the more common kernel design behind `gpu_kernel`.
> There is a potential performance gain from a lower kernel size, or from using the more common, updated and supported kernel version.
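
To make the concern concrete, here is a minimal host-side sketch of the two paths an elementwise launcher can take. It is not the actual `gpu_kernel` implementation: `DType`, `fetch_and_cast_sketch`, `apply_dynamic` and `apply_same_type` are hypothetical names used only for illustration. The dynamic path branches on a runtime dtype for every load, while the same-type path needs no such branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical runtime dtype tag (illustration only, not a real library enum).
enum class DType { UInt8, Float32 };

// Dynamic path: each load switches on the runtime dtype, then converts to T.
template <typename T>
T fetch_and_cast_sketch(DType src, const void* p) {
  switch (src) {
    case DType::UInt8: {
      std::uint8_t v;
      std::memcpy(&v, p, sizeof v);
      return static_cast<T>(v);
    }
    case DType::Float32: {
      float v;
      std::memcpy(&v, p, sizeof v);
      return static_cast<T>(v);
    }
  }
  return T{};  // unreachable for the tags above
}

// Elementwise loop that converts every input element at run time.
template <typename T, typename F>
void apply_dynamic(DType src, const void* in, std::size_t elem_size,
                   T* out, std::size_t n, F f) {
  const char* bytes = static_cast<const char*>(in);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = f(fetch_and_cast_sketch<T>(src, bytes + i * elem_size));
  }
}

// Same-type path: input and output share T, so no dtype branch is needed.
template <typename T, typename F>
void apply_same_type(const T* in, T* out, std::size_t n, F f) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = f(in[i]);
  }
}
```

If the sign kernel always reads and writes the same type, only something like `apply_same_type` would be exercised, and the switch in the dynamic path would be dead weight for this kernel.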


```
void sign_kernel_cuda(TensorIteratorBase& iter){
  if (iter.dtype() == ScalarType::Bool) {
    gpu_kernel(iter, []GPU_LAMBDA(bool a){
      return a;
    });
  } else {
    AT_DISPATCH_ALL_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.dtype(), "sign_cuda", [&]() {
        gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t {
            return c10::signum(a);
        });
    });
  }
}

```
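
As a rough illustration of why both branches are same-type operations, the per-element behaviour can be sketched in plain C++17. `signum_sketch` is a hypothetical stand-in written for this note, not the `c10::signum` implementation; the point is only that the input and return types match in every case, including `bool`.

```cpp
#include <type_traits>

// Hypothetical stand-in for the per-element sign computation.
// The return type always equals the input type T.
template <typename T>
constexpr T signum_sketch(T x) {
  if constexpr (std::is_same_v<T, bool>) {
    return x;  // Bool branch: the value is returned unchanged
  } else if constexpr (std::is_unsigned_v<T>) {
    return static_cast<T>(T(0) < x);  // unsigned: result is 0 or 1
  } else {
    return static_cast<T>((T(0) < x) - (x < T(0)));  // signed: -1, 0 or 1
  }
}
```

For example, `signum_sketch(-3)` yields an `int` and `signum_sketch(2.5f)` yields a `float`, so no cross-type conversion is involved at the element level.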

Review the unary sign kernel's gpu_kernel call, as it has unnecessary logical branches for type conversion. #3063
