Fixes #1

Open. Wants to merge 3 commits into base: master
54 changes: 35 additions & 19 deletions README.md
@@ -1,32 +1,48 @@
# Misclassification_detector
# misclass

This is a tool for getting misclassified data for fasttext.
**misclass** is a tool for understanding fastText classification model prediction errors.

Often when we train and test a model, we only look at the accuracy. Seeing actual examples of misclassified rows, and the specific segments

## Overview

This script can be used for diagnosing the reasons for misclassifying data with fasttext. You can train your model with fasttext and pass it to this tool. After that you can see the misclassified test data with word n-grams. This will help you understand the reasons for misclassification. You are free to choose the n-gram size and step.

There are 2 parameters for customizing the diagnosis:
### Example

```
--step
```
Consider a model trained to do a simple classification task like sentiment analysis. The model makes an error: it incorrectly predicts that the following Amazon review is `negative` when the correct label is `positive`.

and
> I bought this for my friend who plays the piano. Honestly I was not expecting much because of some bad experiences with these types of products in the past. In the end though this one was definitely worth the money and I even ended up buying another one!

misclass can highlight the segments whose predicted label did not match the correct label and thus caused the prediction for the whole row to fail.

> I bought this for my friend who plays the piano. Honestly I was **not expecting much** because of some **bad experiences** with these types of products in the past. In the end though this one was definitely worth the money and I even ended up buying another one!

misclass does this by iterating through each segment and checking it against the label. fastText predictions are calculated by averaging the predictions across all segments.
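
The segment check described above can be sketched roughly as follows. This is a minimal illustration, not the code in misclass.py: `find_mismatched_segments` and `toy_predict` are hypothetical names, and the keyword-based `toy_predict` is only a stand-in for a trained model so that the example runs without fastText installed.

```python
from typing import Callable, List


def find_mismatched_segments(
    segments: List[str],
    correct_label: str,
    predict_label: Callable[[str], str],
) -> List[str]:
    """Return the segments whose predicted label differs from the correct label."""
    return [seg for seg in segments if predict_label(seg) != correct_label]


def toy_predict(segment: str) -> str:
    """Hypothetical stand-in for a trained classifier (NOT fastText)."""
    negative_cues = ("not", "bad")
    words = segment.lower().split()
    return "negative" if any(w in negative_cues for w in words) else "positive"


# Segments taken from the review above; the correct label is "positive".
segments = [
    "I bought this for my friend",
    "I was not expecting much",
    "because of some bad experiences",
    "definitely worth the money",
]

flagged = find_mismatched_segments(segments, "positive", toy_predict)
print(flagged)
```

With a real model, `predict_label` would presumably wrap the fastText Python bindings, where `model.predict(text)` returns a tuple of labels and an array of probabilities, so the wrapper would take the first label.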

```
--countOfWords
```


## Parameters
There are parameters that control the way that the row is divided into segments for highlighting.

`step`:

--testPath: Path of test data.
`countOfWords`:


## Running

The full set of parameters required by misclass.py is:

`testPath`: Path to the test data.

`modelPath`: Path to the trained model to evaluate.

--modelPath: Path of trained model.
`output`: Path of the file to which the misclass output is written.

--output: Path of the output file where the result of the script will be stored.
`countOfWords`: Count of words in test sentence.

--countOfWords: Count of words in test sentence.
`step`: Step for taking words in test sentence.

--step: Step for taking words in test sentence.
For example:

```
python misclass.py --testPath test.txt --modelPath model --output misclass.txt --countOfWords 5 --step 3
```
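
One plausible reading of `--countOfWords 5 --step 3` is a sliding window: each segment holds 5 words and each new segment starts 3 words after the previous one. The sketch below assumes that interpretation; `split_into_segments` is a hypothetical helper, not a function confirmed to exist in misclass.py, and the real script may treat trailing words differently.

```python
from typing import List


def split_into_segments(row: str, count_of_words: int, step: int) -> List[str]:
    """Split a row into word windows of size count_of_words, advancing by step.

    Assumption: trailing words that do not fill a whole window are dropped.
    """
    if count_of_words < 1 or step < 1:
        raise ValueError("count_of_words and step must be positive")

    words = row.split()
    if not words:
        return []
    if len(words) <= count_of_words:
        # A short row becomes a single segment.
        return [" ".join(words)]

    last_start = len(words) - count_of_words
    return [
        " ".join(words[start:start + count_of_words])
        for start in range(0, last_start + 1, step)
    ]


row = "I bought this for my friend who plays the piano."
windows = split_into_segments(row, count_of_words=5, step=3)
for window in windows:
    print(window)
```

Under these assumptions a smaller `step` produces more overlapping segments, which gives finer-grained highlighting at the cost of more predictions per row.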

File renamed without changes.