The loss doesn't decrease when using multiple nodes. #30

When I use one node, the code runs well. However, when I use 2 nodes and set batch_size to 64, the loss stays around 5.545 and doesn't decrease. Since 5.545 is approximately ln(256), which is the chance-level value, it seems that the network never learns anything during training. I have checked that the parameters are not fixed. I think something may be wrong with the GatherLayer, but I cannot find the problem. Have you run into this?
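For reference, here is a quick check of the arithmetic behind the "stuck at chance" reading. It is only a sketch: it assumes the loss is a cross-entropy over K candidates, so a uniform prediction gives ln(K), and the helper name `chance_level_loss` is made up for illustration. The observed 5.545 matches K = 256, while ln(512) would be about 6.238.

```python
import math


def chance_level_loss(num_candidates: int) -> float:
    """Cross-entropy of a uniform prediction over `num_candidates` classes.

    Hypothetical helper for illustration only; not part of the repository.
    """
    return math.log(num_candidates)


# Compare the observed plateau (about 5.545) with two candidate counts.
for k in (256, 512):
    print(k, round(chance_level_loss(k), 3))
```

If the plateau really corresponds to 256 candidates rather than 512, that may hint at how many samples actually end up in the loss after gathering, which would be worth checking against batch_size times the number of processes.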
