About the way to generate 99,121 synthetic prompts from TGRT Self-Instruct #13
Hello, and thanks for your awesome work and code!
However, I ran into some confusion while trying to understand how you generated TGRT Self-Instruct. You mention in the article that you first handwrote 20 instruction types and then generated topics from these types. Finally, instructions were generated from each "instruction type - topic" pair.
Therefore, my first question is:
How many topics did you generate for each instruction type? I see in Appendix G that your prompt generates 10 topics per instruction type.
My second question is:
How many instructions are generated for each "instruction type - topic" pair? You end up with 99,121 synthetic prompts from TGRT Self-Instruct, so if every "instruction type - topic" pair generates only one instruction, does that mean you generated at least 99,121 topics?
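To make the arithmetic behind my question concrete, here is a rough sketch. It assumes the Appendix G prompt is run exactly once per instruction type (so 10 topics per type), which is only my guess and may not match what you actually did; the variable names are mine.

```python
import math

NUM_INSTRUCTION_TYPES = 20   # handwritten instruction types, as stated in the article
TOPICS_PER_TYPE = 10         # assumption: the Appendix G prompt is run once per type
TOTAL_PROMPTS = 99_121       # reported size of TGRT Self-Instruct

# Number of "instruction type - topic" pairs under the assumption above.
pairs = NUM_INSTRUCTION_TYPES * TOPICS_PER_TYPE

# Average number of instructions each pair would need to yield.
avg_instructions_per_pair = TOTAL_PROMPTS / pairs

# Alternative reading: one instruction per pair, so topics per type must grow instead.
topics_per_type_if_one_each = math.ceil(TOTAL_PROMPTS / NUM_INSTRUCTION_TYPES)

print(pairs)
print(avg_instructions_per_pair)
print(topics_per_type_if_one_each)
```

So either each pair produces roughly 500 instructions, or each instruction type has several thousand topics, or something in between. I would like to know which it is.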
Thanks a lot for your help!