10us vs 4us kernels

strategy 1:
- update the faster kernel and check the difference in registers using CodeXL

strategy 2:
- profile and strip the main kernel