This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Classification for small datasets #5

Open
@bratao

Hello,
First of all, thank you for this awesome contribution to the scientific world.

I ran some binary classification tests on a corpus of 5,000 samples and the result was not good (0.65). Even a BoW Naive Bayes classifier gets a higher score.

I tried tuning some parameters, such as epoch and minCount, but the results improved only slightly.

In the fastText Hacker News thread, a developer seems to be aware of this issue:

Thanks for pointing this out. We design this library on large datasets and some static variables may not be well tuned for smaller ones. For example the learning rate is only updated every 10k words. We are fixing that now, could you please send us on which dataset you were testing? We would like to see if we have solved this.
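To make the point in that reply concrete, here is a minimal sketch of why a coarse update interval could matter on a small corpus. It is not fastText's actual code: the linear decay to zero, the function names `lr_schedule` and `distinct_rates`, and the figure of 50,000 training tokens (roughly 5,000 samples of about 10 words each, one epoch) are all assumptions made for illustration.

```python
def lr_schedule(total_tokens, base_lr=0.1, update_every=10_000):
    """Learning rate in effect at each token when the rate is refreshed
    only every `update_every` tokens.

    Assumption: the rate decays linearly from `base_lr` towards zero
    over the whole run. This is an illustrative model, not library code.
    """
    rates = []
    current = base_lr
    for seen in range(total_tokens):
        if seen % update_every == 0:
            # Refresh the rate from the fraction of training completed.
            progress = seen / total_tokens
            current = base_lr * (1.0 - progress)
        rates.append(current)
    return rates


def distinct_rates(total_tokens, update_every):
    """Number of different learning-rate values used during training."""
    return len(set(lr_schedule(total_tokens, update_every=update_every)))


if __name__ == "__main__":
    # Hypothetical small corpus: about 50,000 tokens in total.
    for interval in (10_000, 100):
        steps = distinct_rates(50_000, interval)
        print(f"update every {interval:>6} tokens: {steps} distinct rates")
```

Under these assumptions, a 10k-token interval gives the small corpus only a handful of learning-rate steps, while a finer interval approximates a smooth decay. Whether this explains the score above is something I have not verified.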

Is there something inherent to the algorithm that makes it perform well only on big datasets? Are there any variables we can tune for this use case?
