Hi, thanks for your great open source work!
In your work, I noticed that you used audiocaps dataset to fine tune on tango-full ckpt,
can I know the command for your fine tuning process?
Do I need to modify the learning rate (default=3e-5) and do I use --hf_model or --resume_from_checkpoint in the command?
Looking forward to your reply, thanks again!😊