Our benchmarking suite is incredibly unreliable right now: there's too much variation between runs for the results to be useful. I'm not sure what the right option is here, but we should try to get something more stable.