For anyone who has gotten MuLan training working with their own moderately sized dataset (1M+ music/text pairs):
Does your MuLan trainer's training loss also fail to decrease and just hover over time, or do you actually observe a decrease? For reference, I'm using the training script from here. Below is a subset of the loss over time; this is with a large batch size (512), and the loss just hovers at ~6.236 for all steps.
```
1: loss: 6.446783065795898
2: loss: 6.2690324783325195
3: loss: 6.256021499633789
4: loss: 6.239367485046387
5: loss: 6.237820625305176
6: loss: 6.238517761230469
7: loss: 6.243525505065918
8: loss: 6.236880779266357
9: loss: 6.240113735198975
10: loss: 6.237149238586426
11: loss: 6.237539768218994
12: loss: 6.23846960067749
13: loss: 6.238637447357178
14: loss: 6.2371826171875
15: loss: 6.236910820007324
16: loss: 6.236763954162598
17: loss: 6.236742973327637
18: loss: 6.236521244049072
19: loss: 6.2365617752075195
20: loss: 6.236545562744141
21: loss: 6.236554145812988
22: loss: 6.236541271209717
23: loss: 6.23670768737793
24: loss: 6.23667049407959
25: loss: 6.236469268798828
26: loss: 6.236410617828369
27: loss: 6.236483573913574
28: loss: 6.236449241638184
29: loss: 6.236398696899414
30: loss: 6.236471176147461
31: loss: 6.236447334289551
32: loss: 6.236616611480713
33: loss: 6.236435413360596
34: loss: 6.2363996505737305
35: loss: 6.236429214477539
36: loss: 6.23640251159668
37: loss: 6.236411094665527
38: loss: 6.236398696899414
39: loss: 6.2364654541015625
40: loss: 6.236388206481934
41: loss: 6.236456871032715
42: loss: 6.236383438110352
43: loss: 6.236384391784668
44: loss: 6.236380577087402
45: loss: 6.236428260803223
```
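For context, here is a quick sanity check I put together (a minimal sketch, assuming the standard symmetric InfoNCE-style contrastive objective used by MuLan-like models; the embedding dimension and setup below are just placeholders, not taken from the training script). With a batch size of N, a model whose audio and text embeddings carry no useful signal, or have collapsed to near-constant vectors, produces a uniform softmax over the batch, so the loss pins at about ln(N). For N = 512 that is ≈ 6.238, which is essentially the plateau value in the log above.

```python
import math
import torch
import torch.nn.functional as F

batch_size = 512
print(math.log(batch_size))  # ~6.2383, essentially the plateau value above

# Collapsed embeddings: every row identical -> all pairwise similarities equal
# -> uniform softmax over the batch -> contrastive loss = ln(batch_size)
audio = F.normalize(torch.ones(batch_size, 128), dim=-1)
text = F.normalize(torch.ones(batch_size, 128), dim=-1)
logits = audio @ text.t()                      # every entry equals 1.0
labels = torch.arange(batch_size)
loss = 0.5 * (F.cross_entropy(logits, labels) +
              F.cross_entropy(logits.t(), labels))
print(loss.item())                             # ~6.2383
```

So a loss stuck near ~6.236 with batch size 512 looks consistent with chance-level (or collapsed) embeddings rather than slow convergence, which is why I'm asking whether anyone actually sees it decrease.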