Natively support bfloat16 as a compute type in H2. On systems with GPUs, the representation should be transparently copyable between CPU and GPU.
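For illustration only, here is a minimal sketch of what "transparently copyable" could mean for such a type. It assumes that H2 would store bfloat16 as the standard 16-bit format: 1 sign bit, 8 exponent bits and 7 mantissa bits, which is exactly the upper half of an IEEE-754 float32. The names `bfloat16`, `from_float`, `to_float` and `copy_bf16` are hypothetical and are not H2 APIs.

Because the type is a trivially copyable 2-byte value, a buffer of it can be moved between host and device as raw bytes, with no per-element conversion. On CUDA systems, for example, the buffer could go through a plain `cudaMemcpy`, and its layout matches CUDA's own `__nv_bfloat16`. Rounding float32 to nearest-even is one possible conversion policy; truncation is a cheaper alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical bfloat16 storage type (not an H2 API): the upper 16 bits of an
// IEEE-754 binary32 float (1 sign, 8 exponent, 7 mantissa bits).
struct bfloat16 {
    std::uint16_t bits;

    // Convert float32 -> bfloat16 with round-to-nearest-even (assumed policy).
    static bfloat16 from_float(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        bfloat16 b;
        if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
            // NaN: truncate and force the quiet bit so it cannot become Inf.
            b.bits = static_cast<std::uint16_t>((u >> 16) | 0x0040u);
            return b;
        }
        // Add 0x7FFF plus the lowest kept bit so ties round to even.
        std::uint32_t rounding_bias = 0x7FFFu + ((u >> 16) & 1u);
        b.bits = static_cast<std::uint16_t>((u + rounding_bias) >> 16);
        return b;
    }

    // Convert bfloat16 -> float32 exactly by restoring the low 16 zero bits.
    float to_float() const {
        std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
};

// These properties are what make a byte-for-byte CPU<->GPU copy valid.
static_assert(sizeof(bfloat16) == 2, "bfloat16 must occupy exactly 2 bytes");
static_assert(std::is_trivially_copyable<bfloat16>::value,
              "bfloat16 must be copyable as raw bytes");
static_assert(std::is_standard_layout<bfloat16>::value,
              "bfloat16 must have a predictable layout");

// Copying a bfloat16 buffer is a plain byte copy with no per-element
// conversion; a host-to-device copy would move these same bytes.
inline void copy_bf16(bfloat16* dst, const bfloat16* src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(bfloat16));
}
```

Keeping the type a plain 16-bit wrapper means both CPU and GPU code agree on the bit pattern. All arithmetic can then widen to float32 and narrow back on store.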