Review the gpu_kernel call in the unary sign kernel, as it has unnecessary logical branches for type conversion. #3063

@anatoliylitv

Description

Review the design of the gpu_kernel call in sign_kernel_cuda, as it may contain an unused logical branch that calls for conversion from any type to any type. This surfaced during work on BF16, when it was converting uint8_t to BFloat16 through DynamicCast.h and TypeCast.

Judging by the signature of its lambda function, sign_kernel_cuda does not need dynamic conversion between arbitrary tensor types; it only needs to return a value of the same type as its input.
The same issue does not show up in other kernels, where a dynamic cast could actually be needed. This kernel appears to be special for no reason and could use the more common kernel design behind gpu_kernel.
Potential benefits: better performance from a smaller kernel to load, or from using a more common, up-to-date, and supported kernel version.

void sign_kernel_cuda(TensorIteratorBase& iter){
  if (iter.dtype() == ScalarType::Bool) {
    gpu_kernel(iter, []GPU_LAMBDA(bool a){
      return a;
    });
  } else {
    AT_DISPATCH_ALL_TYPES_AND2(ScalarType::Half, ScalarType::BFloat16, iter.dtype(), "sign_cuda", [&]() {
        gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t {
            return c10::signum(a);
        });
    });
  }
}
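
To illustrate the concern, here is a minimal CPU-only sketch of a unary kernel helper that has both a same-type fast path and a dynamic-cast fallback. It is not PyTorch code: DType, Buffer, fetch_and_cast, cast_and_store, unary_kernel and Path are hypothetical names made up for this example, and the real gpu_kernel may decide differently. The point is only that a functor of the form scalar_t -> scalar_t never needs the fallback when the tensor dtypes already match, yet the fallback (with its switch over every type) is still compiled into the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical runtime dtype tag (a stand-in, not PyTorch's ScalarType).
enum class DType { UInt8, Int32, Float32 };

// Compile-time mapping from a C++ type to its tag.
template <typename T> struct dtype_of;
template <> struct dtype_of<std::uint8_t> { static constexpr DType value = DType::UInt8; };
template <> struct dtype_of<std::int32_t> { static constexpr DType value = DType::Int32; };
template <> struct dtype_of<float> { static constexpr DType value = DType::Float32; };

// Same-type sign function: returns -1, 0 or 1 as T.
template <typename T>
T signum(T a) { return static_cast<T>((T(0) < a) - (a < T(0))); }

// Type-erased buffer: the element type is only known at runtime.
struct Buffer { DType dtype; void* data; std::size_t size; };

// Fallback helper: load element i from storage of runtime type src, as T.
template <typename T>
T fetch_and_cast(DType src, const void* p, std::size_t i) {
  switch (src) {
    case DType::UInt8:   return static_cast<T>(static_cast<const std::uint8_t*>(p)[i]);
    case DType::Int32:   return static_cast<T>(static_cast<const std::int32_t*>(p)[i]);
    case DType::Float32: return static_cast<T>(static_cast<const float*>(p)[i]);
  }
  return T(0);
}

// Fallback helper: store v into storage of runtime type dst.
template <typename T>
void cast_and_store(DType dst, void* p, std::size_t i, T v) {
  switch (dst) {
    case DType::UInt8:   static_cast<std::uint8_t*>(p)[i] = static_cast<std::uint8_t>(v); break;
    case DType::Int32:   static_cast<std::int32_t*>(p)[i] = static_cast<std::int32_t>(v); break;
    case DType::Float32: static_cast<float*>(p)[i] = static_cast<float>(v); break;
  }
}

enum class Path { SameType, DynamicCast };

// Apply f (T -> T) elementwise and report which path was taken.
template <typename T, typename F>
Path unary_kernel(const Buffer& in, Buffer& out, F f) {
  assert(in.size == out.size);
  const bool needs_cast =
      in.dtype != dtype_of<T>::value || out.dtype != dtype_of<T>::value;
  if (!needs_cast) {
    // Fast path: plain typed loads and stores, no per-element switch.
    const T* src = static_cast<const T*>(in.data);
    T* dst = static_cast<T*>(out.data);
    for (std::size_t i = 0; i < in.size; ++i) dst[i] = f(src[i]);
    return Path::SameType;
  }
  // Fallback: every element goes through the any-to-any conversion switch.
  for (std::size_t i = 0; i < in.size; ++i) {
    cast_and_store<T>(out.dtype, out.data, i, f(fetch_and_cast<T>(in.dtype, in.data, i)));
  }
  return Path::DynamicCast;
}
```

In this sketch, calling unary_kernel<float> with two Float32 buffers takes the fast path, while passing a UInt8 input buffer to the same instantiation takes the fallback. If sign is only ever invoked with matching input and output dtypes, the fallback is dead weight, which is the suspicion raised above.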
