Classification for small datasets #5
Hello,
First of all, thank you for this awesome contribution to the scientific world.
I ran some binary classification tests on a corpus of 5,000 samples and the result was not good (0.65). Even a BoW Naive Bayes classifier gave me higher scores.
I tried tuning some parameters, such as epoch and minCount, but that improved the results only slightly.
In the fastText Hacker News thread, it seems a developer is aware of this issue:
Thanks for pointing this out. We design this library on large datasets and some static variables may not be well tuned for smaller ones. For example the learning rate is only updated every 10k words. We are fixing that now, could you please send us on which dataset you were testing? We would like to see if we have solved this.
Is there something inherent to the algorithm that makes it perform well only on big datasets? Are there any variables we can tune for this use case?
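To make the quoted point about the learning rate concrete, here is a minimal sketch of what happens when a decaying rate is only refreshed every 10k words on a small corpus. This is not fastText's code: it assumes a linear decay schedule, and the function name, the 25,000-token training budget and the base rate of 0.1 are hypothetical values chosen for illustration.

```python
def stepwise_linear_lr(base_lr, total_tokens, update_every):
    """Return the learning rate in effect at each token.

    Assumption: the rate decays linearly with training progress, but is
    only recomputed once every `update_every` tokens (hypothetical model
    of the behaviour described in the quote, not the library's code).
    """
    rates = []
    current = base_lr
    for seen in range(total_tokens):
        if seen % update_every == 0:
            progress = seen / total_tokens
            current = base_lr * (1.0 - progress)
        rates.append(current)
    return rates


# Hypothetical small run: 25,000 tokens in total across all epochs.
coarse = stepwise_linear_lr(base_lr=0.1, total_tokens=25_000, update_every=10_000)
fine = stepwise_linear_lr(base_lr=0.1, total_tokens=25_000, update_every=100)

# With a 10k-token interval the rate only ever takes three values,
# so the "decay" is a few large jumps rather than a smooth ramp.
print("distinct rates, update every 10k tokens:", len(set(coarse)))
print("distinct rates, update every 100 tokens:", len(set(fine)))
print("final rate (10k interval):", coarse[-1])
print("final rate (100 interval):", fine[-1])
```

Under these assumptions the coarse schedule finishes training at a much higher rate than the fine one, which is one plausible reason a fixed 10k-word interval could hurt on a few thousand short samples.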