The CUB functions implemented in PR #2068 update several Core State counters that currently live in host (CPU) memory, so every update requires a device-to-host copy and a synchronization to bring the new values back to the CPU. To reduce CPU synchronization, these counters should be kept in GPU memory instead.
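A minimal sketch of the idea follows. It assumes the counter is produced by a CUB stream compaction (`cub::DeviceSelect::Flagged`), which writes the number of selected items to a device pointer. The kernels `init`, `mark_alive`, and `process_alive`, the energy cutoff, and `run_demo` are hypothetical and do not come from PR #2068; the real Core State counters and CUB calls may differ. The key point is that the next kernel reads the count from device memory instead of receiving it as a host-side launch argument, so no `cudaMemcpy`/`cudaStreamSynchronize` is needed between the two steps.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a CUDA/CUB call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s failed: %s\n", #call,              \
                   cudaGetErrorString(err_));                     \
      std::abort();                                               \
    }                                                             \
  } while (0)

// Hypothetical setup: element i gets energy i % 4 and id i.
__global__ void init(float* energy, int* ids, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    energy[i] = static_cast<float>(i % 4);
    ids[i] = i;
  }
}

// Hypothetical predicate: an element stays "alive" if its energy exceeds cutoff.
__global__ void mark_alive(const float* energy, char* flags, int n, float cutoff) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) flags[i] = energy[i] > cutoff ? 1 : 0;
}

// Consumer kernel: reads the counter from device memory, so the host never needs
// its value before the launch. The grid is sized for the full capacity and the
// grid-stride loop does no work past *d_num_alive.
__global__ void process_alive(const int* alive_ids, const int* d_num_alive, int* out) {
  const int count = *d_num_alive;  // device-resident counter written by CUB
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < count;
       i += gridDim.x * blockDim.x) {
    out[i] = alive_ids[i] * 2;  // placeholder per-element work
  }
}

// Runs the pipeline on one stream with no host synchronization between steps.
// Returns the counter, copied back only once at the very end for checking.
int run_demo(int n) {
  assert(n > 0);
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  float* d_energy = nullptr;
  char* d_flags = nullptr;
  int *d_ids = nullptr, *d_alive = nullptr, *d_out = nullptr, *d_num_alive = nullptr;
  CUDA_CHECK(cudaMalloc(&d_energy, n * sizeof(float)));
  CUDA_CHECK(cudaMalloc(&d_flags, n * sizeof(char)));
  CUDA_CHECK(cudaMalloc(&d_ids, n * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_alive, n * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_num_alive, sizeof(int)));

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  init<<<blocks, threads, 0, stream>>>(d_energy, d_ids, n);
  mark_alive<<<blocks, threads, 0, stream>>>(d_energy, d_flags, n, 1.5f);

  // Stream compaction: CUB writes the selected count to d_num_alive (device memory).
  void* d_temp = nullptr;
  size_t temp_bytes = 0;
  CUDA_CHECK(cub::DeviceSelect::Flagged(d_temp, temp_bytes, d_ids, d_flags,
                                        d_alive, d_num_alive, n, stream));
  CUDA_CHECK(cudaMalloc(&d_temp, temp_bytes));
  CUDA_CHECK(cub::DeviceSelect::Flagged(d_temp, temp_bytes, d_ids, d_flags,
                                        d_alive, d_num_alive, n, stream));

  // No copy of d_num_alive here: the next kernel reads it directly on the GPU.
  process_alive<<<blocks, threads, 0, stream>>>(d_alive, d_num_alive, d_out);
  CUDA_CHECK(cudaGetLastError());

  // Demo only: copy the counter back once, at the end, to verify it.
  int h_num_alive = 0;
  CUDA_CHECK(cudaMemcpyAsync(&h_num_alive, d_num_alive, sizeof(int),
                             cudaMemcpyDeviceToHost, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  cudaFree(d_temp); cudaFree(d_energy); cudaFree(d_flags); cudaFree(d_ids);
  cudaFree(d_alive); cudaFree(d_out); cudaFree(d_num_alive);
  cudaStreamDestroy(stream);
  return h_num_alive;
}
```

Calling `run_demo(1 << 20)` should return 524288, since only elements with energy 2 or 3 pass the 1.5 cutoff. The trade-off of this design is that the host never learns the count before launching `process_alive`, so that kernel is launched with a grid sized for the full capacity and threads beyond the count exit immediately.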