Hi, I've spent some time reading the code.
As far as I can tell, this project has no direct implementation of the division operator.
I did find a related CPU implementation, the reciprocal, in src/cpu/kernels.cc:
template <CpuIsa ISA, typename T>
void rcp(const T* x, T* y, dim_t size) {
  vectorized_unary_transform<ISA>(x, y, size, Vec<T, ISA>::rcp);
}
where Vec<T, ISA>::rcp is defined in src/cpu/vec.h:
static inline value_type rcp(value_type a) {
  return static_cast<T>(1) / a;
}
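To make concrete what I am after: division could presumably be composed from this reciprocal as x / y = x * rcp(y). Below is a minimal standalone sketch of that idea in plain C++. The names rcp_scalar and div_via_rcp are hypothetical and do not exist in the project, and the sketch uses a simple loop instead of the project's vectorized helpers.

```cpp
#include <cstddef>

// Hypothetical scalar stand-in for Vec<T, ISA>::rcp (not project code).
template <typename T>
inline T rcp_scalar(T a) {
  return static_cast<T>(1) / a;
}

// Hypothetical helper: element-wise x / y computed as x * rcp(y).
// Assumes x, y, and out each point to at least `size` elements.
template <typename T>
void div_via_rcp(const T* x, const T* y, T* out, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    out[i] = x[i] * rcp_scalar(y[i]);
  }
}
```

For example, calling div_via_rcp on x = {1, 6, -3} and y = {2, 4, 0.5} fills out with {0.5, 1.5, -6}. Note that x * (1 / y) rounds twice, so for some inputs it can differ from a true x / y in the last bit.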
However, I cannot find an equivalent implementation for the GPU. Did I miss anything? If there is no such GPU implementation, what is the recommended way to add one?
Sincerely,