
HCCL error during two-node, multi-card fine-tuning on NPU #6646

Open
@AlbertWang001

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

The image and container were built with docker-npu. When fine-tuning Qwen2.5-7B across two nodes (16 cards), HCCL errors are raised repeatedly. The command executed inside the container is:
```shell
torchrun --master_port 6001 --nproc_per_node=8 --nnodes=2 --node_rank=0 \
    --master_addr=10.0.1.30 src/train.py \
    --stage sft \
    --model_name_or_path /home/model_bin/Qwen/Qwen2___5-7B-Instruct/ \
    --do_train \
    --dataset alpaca_zh_demo \
    --template qwen \
    --finetuning_type lora \
    --output_dir saves/qwen-7b/lora/sft \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 500 \
    --learning_rate 1e-4 \
    --num_train_epochs 100.0 \
    --plot_loss
```
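For reference, HCCL connection failures in two-node setups are often related to the network interface binding or connection timeouts rather than the training arguments themselves. The following is a minimal sketch of CANN/HCCL environment variables that are commonly exported on each node before launching `torchrun`; the concrete values here (the timeout, and the IP taken from the `--master_addr` above) are assumptions for illustration, and `HCCL_IF_IP` must be set to each node's own address, not the master's, on the second machine:

```shell
# Disable the HCCL communication whitelist check, which otherwise
# rejects cross-node connections in many container setups (assumption:
# acceptable in a trusted network).
export HCCL_WHITELIST_DISABLE=1

# Lengthen the HCCL connect timeout (seconds); slow container networking
# can exceed the default during rendezvous. Value is illustrative.
export HCCL_CONNECT_TIMEOUT=1200

# Bind HCCL to the IP of THIS node's inter-node interface.
# On the rank-0 node this matches --master_addr; on the rank-1 node
# it should be that node's own address instead.
export HCCL_IF_IP=10.0.1.30
```

On the second machine the same `torchrun` command would be launched with `--node_rank=1` and an unchanged `--master_addr=10.0.1.30`, so that both nodes rendezvous with the same master.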

Reproduction

The error log is shown in the screenshots below:
[screenshots: HCCL error log]

Others

No response

Labels: bug (Something isn't working), npu (This problem is related to NPU devices), pending (This problem is yet to be addressed)
