A Problem encountered when using data parallel #1540
Unanswered
Aziily asked this question in Community | Q&A
Hi, I think I need someone to help me solve this problem.

I am trying to use two nodes, each with one GPU, and I have written the config for this in config.py. But when I use colossalai run to start the distributed training, I get a WARNING, and from the training logs I believe it is not training a single model across the two nodes; instead, each process starts training its own model.
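For context, here is a minimal sketch of how a two-node, one-GPU-per-node ColossalAI run is typically wired up. The script name train.py, the hostfile, and the launch flags are illustrative assumptions, not details from the original post.

```python
# train.py -- minimal ColossalAI entry point (a sketch; the names and
# values below are assumptions, not taken from the original post)
import colossalai

# Read config.py and initialize torch.distributed across all launched
# processes. If this initialization never connects the two nodes, every
# process falls back to training its own independent model, which matches
# the symptom described above.
colossalai.launch_from_torch(config='./config.py')

# Typical launch across two nodes with one GPU each:
#   colossalai run --nproc_per_node 1 --hostfile ./hostfile train.py
# where ./hostfile lists the two node hostnames, one per line.
```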
Replies: 1 comment

- Did you set the config about 'zero'?
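If the reply is pointing at ColossalAI's ZeRO integration, a zero section in config.py looked roughly like the sketch below in releases from around the time of this discussion. The exact field names vary between versions, so treat this as an assumption to verify against the docs for your release.

```python
# config.py -- ZeRO section (a version-dependent sketch; field names have
# changed across ColossalAI releases, so check your version's docs)
from colossalai.zero.shard_utils import TensorShardStrategy

zero = dict(
    # Shard model parameters tensor-by-tensor across the data-parallel ranks.
    model_config=dict(shard_strategy=TensorShardStrategy()),
    # Optimizer-state sharding options; an empty dict keeps the defaults.
    optimizer_config=dict(),
)
```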