Why use the cooking subset of Ego4D to train the caption module?

This is very meaningful work, but I have a question: why did you choose to train the caption module on the cooking subset rather than on the full Ego4D dataset? Thank you for your response.