amend drop duplicate behaviour in starter notebook #5
Ari-Ramkilowan wants to merge 2 commits into masakhane-io:master
…target text are duplicates
Let's hold off on merging this one until we've discussed @dwhitena's ideas. I think that taking 100 of the duplicates and getting an isiZulu/isiXhosa speaker to review them would be ideal. We work with an amazing isiZulu linguist if you'd like an expert to check whether the duplicates are valid translations. Let me know and I'll do an intro email!
@jaderabbit and @Ari-Ramkilowan, thanks for the PR and discussion here. My ideas are the following: If we are able to get human review of the conflicting translations, that would be ideal. @jaderabbit might know how feasible this is, but it seems like it may be possible based on the above comments. If we can't get human supervision, we try something like … Any other thoughts?
Thanks for adding the additional files, @Ari-Ramkilowan! Do you have a link to the checkpoint of a trained model?
Changed the drop-duplicates behaviour to remove rows only when both the source AND target text are duplicates. This allows for cases where a source text has multiple valid translations.
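
For reviewers, here is a minimal sketch of the difference between the two behaviours. It is plain Python rather than the notebook's actual code. The function names and the toy sentence pairs are hypothetical and chosen only for illustration. It assumes the data is a list of (source, target) pairs and that the first occurrence of a duplicate is the one to keep.

```python
def drop_pair_duplicates(pairs):
    """Keep the first occurrence of each exact (source, target) pair."""
    seen = set()
    kept = []
    for source, target in pairs:
        key = (source, target)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def drop_source_duplicates(pairs):
    """Keep only the first row for each source text, whatever its target."""
    seen = set()
    kept = []
    for source, target in pairs:
        if source not in seen:
            seen.add(source)
            kept.append((source, target))
    return kept


# Hypothetical toy data: one exact repeat, plus one source with two translations.
pairs = [
    ("hello", "sawubona"),
    ("hello", "sawubona"),
    ("hello", "sanibonani"),
    ("thank you", "ngiyabonga"),
]

by_pair = drop_pair_duplicates(pairs)      # drops only the exact repeat
by_source = drop_source_duplicates(pairs)  # also drops the second translation
```

With the pair-based rule, both translations of "hello" survive and only the exact repeat is removed. With the source-only rule, the second translation is discarded as well, which is the case this PR is meant to avoid.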