Description
I was using the script step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh and ran into some errors.
I used 7B Llama models as both the actor and the critic and set the enable_hybrid_engine argument. I got the following error:
│ /root/miniconda3/envs/coati/lib/python3.10/site-packages/deepspeed/runtime/hybrid_engine.py:398 │
│ in step │
│ │
│ 395 │ │
│ 396 │ def step(self, lr_kwargs=None): │
│ 397 │ │ super().step(lr_kwargs=lr_kwargs) │
│ ❱ 398 │ │ if(self._inference_containers[0].module.attention.attn_qkvw is not None and \ │
│ 399 │ │ │ self._inference_containers[0].q_k_v is not None): │
│ 400 │ │ │ for inference_container in self._inference_containers: │
│ 401 │ │ │ │ inference_container.reset_qkv() │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
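The failing line indexes self._inference_containers[0], so the IndexError means that list is empty when step() runs. One plausible reading, which is an assumption and not confirmed by the traceback, is that the hybrid engine did not create any inference containers for this model's layers (for example, if the Llama layer type is not one it recognizes). The sketch below uses a hypothetical stand-in class, not DeepSpeed's real DeepSpeedHybridEngine, to reproduce that failure mode and show what a non-empty check would look like:

```python
# Hypothetical stand-in for the check in DeepSpeed's hybrid_engine.py step();
# this is NOT the real DeepSpeedHybridEngine class.
class FakeHybridEngine:
    def __init__(self, inference_containers):
        # Assumption: the real engine fills this list with one container per
        # transformer layer it knows how to convert; it may stay empty otherwise.
        self._inference_containers = inference_containers

    def step_original(self):
        # Mirrors line 398: indexes [0] without checking that the list is non-empty.
        return self._inference_containers[0] is not None

    def step_guarded(self):
        # Hypothetical guard: skip the QKV reset when no containers were created.
        if not self._inference_containers:
            return False
        return self._inference_containers[0] is not None


engine = FakeHybridEngine(inference_containers=[])
try:
    engine.step_original()
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: list index out of range"

print(engine.step_guarded())  # prints False
```

The guard only illustrates why the error appears. It does not make the hybrid engine actually support the model, so an empty container list would still mean no layers are being accelerated.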
How can I solve this issue?
Thanks :)