Bug: incorrect metric evaluation in step 2

https://github.com/microsoft/DeepSpeedExamples/blob/ab4e2e54620d0e80ead128b30dd39d9d55751eab/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py#L263

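The score and accuracy divisions should be outside the `for` loop. Dividing inside the loop rescales the running totals on every iteration instead of once at the end, so the reported metrics are not true averages over the evaluated batches.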
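To illustrate the effect, here is a minimal, dependency-free sketch. It is not the code from `main.py`: the function names, the list-based batches, and the sample numbers are all hypothetical stand-ins, and it assumes the evaluation loop accumulates a per-batch mean score and then divides by `step + 1`. It also assumes at least one non-empty batch.

```python
from typing import List, Tuple

# Hypothetical stand-in for a model output batch: (chosen scores, rejected scores).
Batch = Tuple[List[float], List[float]]


def evaluate_division_in_loop(batches: List[Batch]) -> Tuple[float, float]:
    """Divides inside the loop, as the issue describes (assumed shape of the bug)."""
    correct, total, scores, acc = 0, 0, 0.0, 0.0
    for step, (chosen, rejected) in enumerate(batches):
        correct += sum(c > r for c, r in zip(chosen, rejected))
        total += len(chosen)
        scores += sum(chosen) / len(chosen)
        acc = correct / total
        # The running sum is rescaled on every step, so earlier
        # batches end up divided more than once.
        scores = scores / (step + 1)
    return scores, acc


def evaluate_division_after_loop(batches: List[Batch]) -> Tuple[float, float]:
    """Accumulates in the loop and divides once afterwards."""
    correct, total, scores, num_steps = 0, 0, 0.0, 0
    for chosen, rejected in batches:
        correct += sum(c > r for c, r in zip(chosen, rejected))
        total += len(chosen)
        scores += sum(chosen) / len(chosen)
        num_steps += 1
    return scores / num_steps, correct / total


# Made-up data: per-batch mean chosen scores are 2.0, 4.0 and 6.0.
sample_batches: List[Batch] = [
    ([2.0, 2.0], [1.0, 3.0]),
    ([4.0, 4.0], [1.0, 1.0]),
    ([6.0, 6.0], [1.0, 1.0]),
]

print(evaluate_division_in_loop(sample_batches))     # score is 3.0
print(evaluate_division_after_loop(sample_batches))  # score is 4.0
```

With this data the mean chosen score should be 4.0, but dividing inside the loop yields 3.0. The accuracy happens to match here because it is recomputed from the raw counts each step. In a loop that breaks out early before the division, it could also be left one step stale.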