Could you explain the process for preparing this for a custom dataset? I'd like to try it on Shakespeare's plays.