Errors when running Stage-3 scripts with enable_hybrid_engine #373

I was running the script step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh and ran into an error.
I used 7B Llama models as both the actor and the critic and set the enable_hybrid_engine argument. The run failed with the traceback below:

│ /root/miniconda3/envs/coati/lib/python3.10/site-packages/deepspeed/runtime/hybrid_engine.py:398  │
│ in step                                                                                          │
│                                                                                                  │
│   395 │                                                                                          │
│   396 │   def step(self, lr_kwargs=None):                                                        │
│   397 │   │   super().step(lr_kwargs=lr_kwargs)                                                  │
│ ❱ 398 │   │   if(self._inference_containers[0].module.attention.attn_qkvw is not None and \      │
│   399 │   │   │   self._inference_containers[0].q_k_v is not None):                              │
│   400 │   │   │   for inference_container in self._inference_containers:                         │
│   401 │   │   │   │   inference_container.reset_qkv()                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
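
From the traceback, the only list indexing on line 398 is `self._inference_containers[0]`, so the list appears to be empty when `step` is called. Below is a minimal standalone sketch of that pattern, not DeepSpeed code: `FakeContainer`, `unguarded_step`, and `guarded_step` are hypothetical names I made up for illustration. Why the list is empty (for example, the hybrid engine not creating containers for this model architecture) is an assumption I have not verified.

```python
class FakeContainer:
    """Hypothetical stand-in for an inference container (not the DeepSpeed class)."""

    def __init__(self):
        self.q_k_v = object()   # pretend a fused QKV tensor exists
        self.reset_calls = 0

    def reset_qkv(self):
        self.reset_calls += 1


def unguarded_step(containers):
    # Same shape as the failing check: indexing [0] on an empty list
    # raises IndexError before any condition is evaluated.
    if containers[0].q_k_v is not None:
        for container in containers:
            container.reset_qkv()


def guarded_step(containers):
    # Defensive variant: skip the QKV reset when no containers exist.
    if containers and containers[0].q_k_v is not None:
        for container in containers:
            container.reset_qkv()
    return len(containers)


if __name__ == "__main__":
    try:
        unguarded_step([])
    except IndexError as exc:
        print("unguarded:", exc)
    print("guarded containers:", guarded_step([]))
```

The guard only avoids the crash. If no containers were created, the hybrid engine is probably not doing anything useful for this model, so this would not be a real fix.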

How can I solve this issue?
Thanks :)
