Alejandro Valverde Mahou edited this page Nov 26, 2021 · 10 revisions

Welcome to the Animal Sound wiki!

Here we collect complementary material about the project, such as ideas, the order of operations, and how to use the model.

Mind map

This is our first attempt at a mind map.

[Figure: mind map]

We plan to do only the classification part for this project, but a good addition afterwards would be to try the clustering and generation models, as they could add more value to the solution.

Introduction

Classifying an animal by its sound is a complex task for a computer, especially if the recording is noisy or the differences between sounds are very small. Deep learning has proven useful for solving difficult classification problems in multiple fields. That is why we propose a deep learning model that classifies different animal sounds using both Convolutional Neural Networks and Long Short-Term Memory networks.

REVIEW THIS -> This model aims to reduce the invasiveness that doing taxonomies requires (to be completed).

We propose a hierarchical approach to this problem. Real-life classification problems can often be structured as a hierarchical tree. Usually, neural networks take a naive approach in which all classes sit at the same level and carry no structural information. For this problem, we try to use the structural information in the taxonomy of animals, using the phylum, class, family and genus of each animal as levels. This way we REVIEW obtain better results than with the naive approaches.

To be more precise, we use hierarchical local classifiers, with one classifier per parent node in the hierarchical tree. We chose this approach because global classifiers generally return worse results and/or are more complicated to design and train, as they need to be specially tailored to each problem.
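
As a rough illustration of "one classifier per parent node", the sketch below builds the tree from a handful of taxonomy paths and lists each parent node with the children its classifier would have to tell apart. The species list, the `ROOT` marker and the function name `children_per_parent` are assumptions made for this example, not part of the project code.

```python
from collections import defaultdict

# Hypothetical taxonomy paths (phylum, class, family, genus).
# Illustrative only; not taken from the project's dataset.
TAXONOMY = [
    ("Chordata", "Aves", "Corvidae", "Corvus"),
    ("Chordata", "Aves", "Strigidae", "Bubo"),
    ("Chordata", "Mammalia", "Canidae", "Canis"),
    ("Chordata", "Mammalia", "Felidae", "Felis"),
    ("Arthropoda", "Insecta", "Gryllidae", "Gryllus"),
]

# Assumed artificial root that sits above all phyla.
ROOT = ("root",)


def children_per_parent(paths):
    """Map each parent node (its path from the root) to its sorted children."""
    children = defaultdict(set)
    for path in paths:
        parent = ROOT
        for name in path:
            children[parent].add(name)
            parent = parent + (name,)
    return {parent: sorted(kids) for parent, kids in children.items()}


parents = children_per_parent(TAXONOMY)

# One local classifier would be trained for each entry printed here.
for parent, kids in parents.items():
    print(" > ".join(parent), "->", kids)
```

With this toy taxonomy there are 11 parent nodes (the root, 2 phyla, 3 classes and 5 families), so a local-classifier-per-parent-node setup would train 11 small classifiers. Genera are leaves and need no classifier of their own.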

To train the classifiers, and because the data we obtained is highly unbalanced, we feed every node the same samples and change only the labels each one has to return: a one-hot encoded vector if the sample belongs to that node's branch of the tree, or an all-zero vector otherwise.
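
The labelling scheme can be sketched as follows. This is a minimal example under assumptions: the `CHILDREN` table, the sample paths and the helper `node_label` are hypothetical names chosen for illustration, and the real project may encode labels differently.

```python
# Hypothetical parent nodes and their children; illustrative only.
CHILDREN = {
    ("root",): ["Arthropoda", "Chordata"],
    ("root", "Chordata"): ["Aves", "Mammalia"],
    ("root", "Chordata", "Aves"): ["Corvidae", "Strigidae"],
}


def node_label(parent, sample_path):
    """Return the target vector the classifier at `parent` gets for one sample.

    `sample_path` is the sample's full path from the root. The result is
    one-hot over the parent's children if the sample lies in that branch,
    and all zeros otherwise.
    """
    kids = CHILDREN[parent]
    label = [0] * len(kids)
    depth = len(parent)
    # The sample is in this branch only if its path starts with the parent's path.
    if sample_path[:depth] == parent and len(sample_path) > depth:
        child = sample_path[depth]
        if child in kids:
            label[kids.index(child)] = 1
    return label


crow = ("root", "Chordata", "Aves", "Corvidae", "Corvus")
dog = ("root", "Chordata", "Mammalia", "Canidae", "Canis")

# Every node sees both samples; only the labels differ per node.
for parent in CHILDREN:
    print(parent, node_label(parent, crow), node_label(parent, dog))
```

Here the classifier at the Aves node receives `[1, 0]` for the crow and `[0, 0]` for the dog, so out-of-branch samples still take part in training every node.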

Key points

  • Create a dataset from the Animal Dataset sounds using both MFCC and Mel Spectrograms to feed the model.
  • Classify among as many classes as possible.
  • Try to fix the imbalance of the data using different data augmentation techniques.
  • Compare the results of a CNN approach against an LSTM approach.
  • Compare the results of a flat one-hot approach against the hierarchical approach.
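
The page does not say which augmentation techniques will be used. As one possible sketch, the example below oversamples minority classes with two common waveform augmentations, a circular time shift and additive Gaussian noise. The function names, the `noise_std` default and the toy waveforms are all assumptions for illustration; waveforms are plain lists of floats to keep the example dependency-free.

```python
import random


def add_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise to every sample of the waveform."""
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the waveform to the right by `shift` samples."""
    if not signal:
        return []
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def augment_to_balance(samples_by_class, noise_std=0.005, seed=0):
    """Grow every class to the size of the largest one with augmented copies.

    Assumes every class has at least one non-empty waveform.
    """
    rng = random.Random(seed)
    target = max(len(samples) for samples in samples_by_class.values())
    balanced = {}
    for label, samples in samples_by_class.items():
        out = list(samples)
        while len(out) < target:
            base = rng.choice(samples)
            shifted = time_shift(base, rng.randrange(len(base)))
            out.append(add_noise(shifted, noise_std, rng))
        balanced[label] = out
    return balanced


# Toy, made-up waveforms: three "crow" clips and a single "owl" clip.
data = {
    "crow": [[0.0, 0.5, -0.5, 0.1]] * 3,
    "owl": [[0.2, -0.2, 0.3, 0.0]],
}
balanced = augment_to_balance(data)
print({label: len(samples) for label, samples in balanced.items()})
```

Augmented copies should be generated only from training clips, so that near-duplicates of a recording do not leak into the validation or test sets.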