Description
We need to find which classes we're worst at on the validation set (specifically not the test set). To do this we need good visualisation (probably in an IPython notebook) of a given set of predictions on the validation set; the predictions could be saved to pickle or CSV and loaded in, so the notebook code stays agnostic to the model. In the same notebook it's probably worth having Hinton diagrams of the confusion matrices.
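Something along these lines could work as a starting point. It's only a rough sketch: the CSV name and the `true_label`/`pred_label` columns are placeholder assumptions, the idea being that any model can dump its validation predictions in that shape and the notebook doesn't care where they came from.

```python
# Rough sketch: load model-agnostic validation predictions, rank classes by
# accuracy, and draw a Hinton diagram of the confusion matrix.
# "validation_predictions.csv" and its column names are assumed placeholders.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

preds = pd.read_csv("validation_predictions.csv")
y_true = preds["true_label"].values
y_pred = preds["pred_label"].values

# Per-class accuracy: which classes are we worst at?
classes = np.unique(y_true)
cm = confusion_matrix(y_true, y_pred, labels=classes)
per_class_acc = np.diag(cm) / cm.sum(axis=1)
worst = sorted(zip(classes, per_class_acc), key=lambda t: t[1])[:10]
print("Worst classes (class, accuracy):", worst)

def hinton(matrix, ax=None):
    """Hinton diagram: square area is proportional to the cell value."""
    ax = ax or plt.gca()
    ax.patch.set_facecolor("gray")
    ax.set_aspect("equal", "box")
    max_weight = 2 ** np.ceil(np.log2(np.abs(matrix).max()))
    for (y, x), w in np.ndenumerate(matrix):
        size = np.sqrt(abs(w) / max_weight)
        rect = plt.Rectangle([x - size / 2, y - size / 2], size, size,
                             facecolor="white", edgecolor="white")
        ax.add_patch(rect)
    ax.autoscale_view()
    ax.invert_yaxis()

hinton(cm)
plt.show()
```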
The idea is that by looking at these difficult classes we can do some feature engineering (on the training set) to patch up the model and slightly improve our score.