x4nth055
diff --git a/README.md b/README.md
Lines changed: 46 additions & 2 deletions
diff --git a/cog.yaml b/cog.yaml
Lines changed: 18 additions & 0 deletions
diff --git a/convert_wavs.py b/convert_wavs.py
Lines changed: 2 additions & 1 deletion
diff --git a/create_csv.py b/create_csv.py
Lines changed: 8 additions & 6 deletions
diff --git a/deep_emotion_recognition.py b/deep_emotion_recognition.py
Lines changed: 15 additions & 27 deletions
diff --git a/emotion_recognition.py b/emotion_recognition.py
Lines changed: 18 additions & 17 deletions
diff --git a/features/test_mfcc-chroma-mel_AHNPS_741.npy b/features/test_mfcc-chroma-mel_AHNPS_741.npy
1.02 MB
diff --git a/features/test_mfcc-chroma-mel_AHNPS_800.npy b/features/test_mfcc-chroma-mel_AHNPS_800.npy
-1.1 MB
diff --git a/features/test_mfcc-chroma-mel_HNS_490.npy b/features/test_mfcc-chroma-mel_HNS_490.npy
689 KB
diff --git a/features/test_mfcc-chroma-mel_HNS_548.npy b/features/test_mfcc-chroma-mel_HNS_548.npy
-771 KB
@@ -1,12 +1,16 @@
 # Speech Emotion Recognition
 ## Introduction
+<a href="https://replicate.ai/x4nth055/emotion-recognition-using-speech"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=20></a>
+
+
 - This repository handles building and training Speech Emotion Recognition System.
 - The basic idea behind this tool is to build and train/test a suited machine learning ( as well as deep learning ) algorithm that could recognize and detects human emotions from speech.
 - This is useful for many industry fields such as making product recommendations, affective computing, etc.
 - Check this [tutorial](https://www.thepythoncode.com/article/building-a-speech-emotion-recognizer-using-sklearn) for more information.
 ## Requirements
 - **Python 3.6+**
 ### Python Packages
+- **tensorflow**
 - **librosa==0.6.3**
 - **numpy**
 - **pandas**
@@ -38,7 +42,7 @@ Feature extraction is the main part of the speech emotion recognition system. It
 
 In this repository, we have used the most used features that are available in [librosa](https://github.com/librosa/librosa) library including:
 - [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
-- Chromagram 
+- Chromagram
 - MEL Spectrogram Frequency (mel)
 - Contrast
 - Tonnetz (tonal centroid features)
@@ -102,6 +106,7 @@ print("Prediction:", rec.predict("data/tess_ravdess/validation/Actor_25/25_01_01
 Prediction: neutral
 Prediction: sad
 ```
+You can pass any audio file; if it is not in the appropriate format (16000 Hz, mono channel), it will be converted automatically. Make sure `ffmpeg` is installed on your system and added to *PATH*.
 ## Example 2: Using RNNs for 5 Emotions
 ```python
 from deep_emotion_recognition import DeepEmotionRecognizer
@@ -143,6 +148,45 @@ true_neutral         3.846154       8.974360          82.051285      2.564103
 true_ps              2.564103       0.000000           1.282051     83.333328        12.820514
 true_happy          20.512821       2.564103           2.564103      2.564103        71.794876
 ```
+## Example 3: Not Passing any Model and Removing the Custom Dataset
+The code below initializes `EmotionRecognizer` with 3 chosen emotions, removes the custom dataset, and sets `balance` to `False`:
+```python
+from emotion_recognition import EmotionRecognizer
+# initialize the instance; this takes a while the first time it runs,
+# as it extracts the features and automatically calls determine_best_model()
+# to load the best-performing model on the chosen dataset
+rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=False, verbose=1, custom_db=False)
+# the model is already trained at this point, so there is no need to train it again
+# print the confusion matrix on the test set
+print(rec.confusion_matrix())
+# predict angry audio sample
+prediction = rec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
+print(f"Prediction: {prediction}")
+```
+**Output:**
+```
+[+] Best model determined: RandomForestClassifier with 93.454% test accuracy
+
+              predicted_angry  predicted_neutral  predicted_sad
+true_angry          98.275864           1.149425       0.574713
+true_neutral         0.917431          88.073395      11.009174
+true_sad             6.250000           1.875000      91.875000
+
+Prediction: angry
+```
+You can print the number of samples on each class:
+```python
+rec.get_samples_by_class()
+```
+**Output:**
+```
+         train  test  total
+angry      910   174   1084
+neutral    650   109    759
+sad        862   160   1022
+total     2422   443   2865
+```
+In this case, the dataset comes only from TESS and RAVDESS and is not balanced; pass `balance=True` to `EmotionRecognizer` to balance the data.
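A common way to balance classes is random undersampling: every class is cut down to the size of the smallest one. The sketch below assumes that is roughly what `balance=True` does, which this diff does not show; `undersample` is a hypothetical helper, not the repository's code. It reuses the training counts from the table above (910 angry, 650 neutral, 862 sad), so each class should end up with 650 samples.

```python
import random
from collections import defaultdict


def undersample(paths, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class.

    Hypothetical helper: a sketch of undersampling, not the repository's own code.
    """
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)
    minimum = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for label, items in by_class.items():
        for path in rng.sample(items, minimum):
            balanced.append((path, label))
    return balanced


# training counts taken from the README table
counts = {"angry": 910, "neutral": 650, "sad": 862}
labels = [emotion for emotion, n in counts.items() for _ in range(n)]
paths = [f"sample_{i}.wav" for i in range(len(labels))]
balanced = undersample(paths, labels)
print(len(balanced))  # 1950, i.e. 3 classes x 650
```

Undersampling discards data from the larger classes; oversampling the smaller ones is the usual alternative when data is scarce.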
 ## Algorithms Used
 This repository can be used to build machine learning classifiers as well as regressors for the case of 3 emotions {'sad': 0, 'neutral': 1, 'happy': 2} and the case of 5 emotions {'angry': 1, 'sad': 2, 'neutral': 3, 'ps': 4, 'happy': 5}
 ### Classifiers
@@ -207,4 +251,4 @@ plot_histograms(classifiers=True)
 **Output:**
 
 <img src="images/Figure.png">
-<p align="center">A Histogram shows different algorithms metric results on different data sizes as well as time consumed to train/predict.</p>
+<p align="center">A Histogram shows different algorithms metric results on different data sizes as well as time consumed to train/predict.</p>
@@ -0,0 +1,18 @@
+build:
+  python_version: "3.6"
+  gpu: false
+  python_packages:
+    - pandas==1.1.5
+    - numpy==1.17.3
+    - wave==0.0.2
+    - sklearn==0.0
+    - librosa==0.6.3
+    - soundfile==0.9.0
+    - tqdm==4.28.1
+    - matplotlib==2.2.3
+    - pyaudio==0.2.11
+    - numba==0.48
+  system_packages:
+    - "ffmpeg"
+    - "portaudio19-dev"
+predict: "predict.py:EmoPredictor"
@@ -17,10 +17,11 @@ def convert_audio(audio_path, target_path, remove=False):
                 remove (bool): whether to remove the old file after converting
         Note that this function requires ffmpeg installed in your system."""
 
-    os.system(f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}")
+    v = os.system(f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}")
     # os.system(f"ffmpeg -i {audio_path} -ac 1 {target_path}")
     if remove:
         os.remove(audio_path)
+    return v
 
 
 def convert_audios(path, target_path, remove=False):
 
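The README note about automatic conversion and the `convert_audio` change above both shell out to `ffmpeg` through `os.system` with an f-string, so a path containing spaces would be split by the shell. As a hedged sketch, the same check-then-convert step can use the standard-library `wave` and `subprocess` modules: `needs_conversion` reads the sample rate and channel count, and `convert_audio_safe` passes arguments as a list, so no shell quoting is involved. All three function names are hypothetical. Note that on POSIX `os.system` returns an encoded wait status, while `subprocess.run(...).returncode` is the plain exit code. `wave` only parses PCM WAV files, so anything else is treated as needing conversion.

```python
import os
import subprocess
import tempfile
import wave


def needs_conversion(audio_path, rate=16000, channels=1):
    """Return True unless `audio_path` is a PCM WAV file at `rate` Hz with `channels` channels."""
    try:
        with wave.open(audio_path, "rb") as wav:
            return wav.getframerate() != rate or wav.getnchannels() != channels
    except (wave.Error, EOFError):
        # not a WAV file the standard-library `wave` module can parse
        return True


def ffmpeg_command(audio_path, target_path, rate=16000, channels=1):
    """Same ffmpeg invocation as convert_audio(), as an argument list (no shell quoting needed)."""
    return ["ffmpeg", "-i", audio_path, "-ac", str(channels), "-ar", str(rate), target_path]


def convert_audio_safe(audio_path, target_path):
    """Run ffmpeg without a shell; return its exit code (0 on success)."""
    return subprocess.run(ffmpeg_command(audio_path, target_path)).returncode


# usage: write 0.1 s of silent 16 kHz audio as mono and as stereo, then check both
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for n_channels in (1, 2):
        path = os.path.join(tmp, f"silence_{n_channels}ch.wav")
        with wave.open(path, "wb") as wav:
            wav.setnchannels(n_channels)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(16000)
            wav.writeframes(b"\x00\x00" * n_channels * 1600)
        results[n_channels] = needs_conversion(path)
print(results)  # {1: False, 2: True}
```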
@@ -69,18 +69,20 @@ def write_tess_ravdess_csv(emotions=["sad", "neutral", "happy"], train_name="tra
 
     for category in emotions:
         # for training speech directory
-        for i, path in enumerate(glob.glob(f"data/training/Actor_*/*_{category}.wav")):
+        total_files = glob.glob(f"data/training/Actor_*/*_{category}.wav")
+        for i, path in enumerate(total_files):
             train_target["path"].append(path)
             train_target["emotion"].append(category)
-        if verbose:
-            print(f"[TESS&RAVDESS] There are {i} training audio files for category:{category}")
+        if verbose and total_files:
+            print(f"[TESS&RAVDESS] There are {len(total_files)} training audio files for category:{category}")
 
         # for validation speech directory
-        for i, path in enumerate(glob.glob(f"data/validation/Actor_*/*_{category}.wav")):
+        total_files = glob.glob(f"data/validation/Actor_*/*_{category}.wav")
+        for i, path in enumerate(total_files):
             test_target["path"].append(path)
             test_target["emotion"].append(category)
-        if verbose:
-            print(f"[TESS&RAVDESS] There are {i} testing audio files for category:{category}")
+        if verbose and total_files:
+            print(f"[TESS&RAVDESS] There are {len(total_files)} testing audio files for category:{category}")
     pd.DataFrame(test_target).to_csv(test_name)
     pd.DataFrame(train_target).to_csv(train_name)
 
 
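The `create_csv.py` change replaces the last `enumerate` index `i` with `len(total_files)`. The old message under-reported by one, because the last index of n files is n - 1. For a category with no files, `i` was either unbound (`NameError`) or left over from an earlier loop. The new `if verbose and total_files` guard also skips the message for empty categories. A minimal sketch of the corrected counting, using a temporary directory laid out like `data/training/Actor_*/*_{category}.wav`; `count_category_files` is a hypothetical helper.

```python
import glob
import os
import tempfile


def count_category_files(root, partition, category):
    """Hypothetical helper: count <root>/<partition>/Actor_*/*_<category>.wav files."""
    pattern = os.path.join(root, partition, "Actor_*", f"*_{category}.wav")
    return len(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    actor_dir = os.path.join(root, "training", "Actor_01")
    os.makedirs(actor_dir)
    for name in ("01_sad.wav", "02_sad.wav", "03_happy.wav"):
        open(os.path.join(actor_dir, name), "w").close()  # empty placeholder files

    n_sad = count_category_files(root, "training", "sad")
    n_neutral = count_category_files(root, "training", "neutral")

    # what the old message printed: the last index produced by enumerate()
    last_index = None
    for last_index, _ in enumerate(glob.glob(os.path.join(actor_dir, "*_sad.wav"))):
        pass

print(n_sad, n_neutral, last_index)  # 2 0 1
```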
@@ -3,26 +3,13 @@
 import sys
 stderr = sys.stderr
 sys.stderr = open(os.devnull, 'w')
-import keras
-sys.stderr = stderr
-# to use CPU uncomment below code
-os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
-os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
-# disable tensorflow logs
-os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
 import tensorflow as tf
 
-config = tf.ConfigProto(intra_op_parallelism_threads=5,
-                        inter_op_parallelism_threads=5, 
-                        allow_soft_placement=True,
-                        device_count = {'CPU' : 1,
-                                        'GPU' : 0}
-                       )
-from keras.layers import LSTM, GRU, Dense, Activation, LeakyReLU, Dropout
-from keras.layers import Conv1D, MaxPool1D, GlobalAveragePooling1D
-from keras.models import Sequential
-from keras.callbacks import ModelCheckpoint, TensorBoard
-from keras.utils import to_categorical
+from tensorflow.keras.layers import LSTM, GRU, Dense, Activation, LeakyReLU, Dropout
+from tensorflow.keras.layers import Conv1D, MaxPool1D, GlobalAveragePooling1D
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
+from tensorflow.keras.utils import to_categorical
 
 from sklearn.metrics import accuracy_score, mean_absolute_error, confusion_matrix
 
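As far as this hunk shows, the `sys.stderr = stderr` line that restored the original stream was deleted together with `import keras`. As a result, `sys.stderr` may stay pointed at `os.devnull` after `import tensorflow as tf`. A hedged alternative is `contextlib.redirect_stderr`, which puts the previous stream back automatically. It only silences Python-level writes to `sys.stderr`. TensorFlow's C++ log lines are controlled separately by the `TF_CPP_MIN_LOG_LEVEL` environment variable, set before the import, and this commit also removes that setting. `import_quietly` is a hypothetical helper, demonstrated with `json` so it runs without TensorFlow.

```python
import contextlib
import importlib
import os
import sys


def import_quietly(module_name):
    """Import `module_name`, discarding anything written to sys.stderr meanwhile.

    redirect_stderr restores the previous sys.stderr when the block exits,
    even if the import raises, and the devnull handle is closed as well.
    """
    with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
        return importlib.import_module(module_name)


original_stderr = sys.stderr
json_module = import_quietly("json")  # e.g. import_quietly("tensorflow") in the real script
print(json_module.__name__, sys.stderr is original_stderr)  # json True
```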
@@ -82,7 +69,7 @@ def __init__(self, **kwargs):
                 regression.
         """
         # init EmotionRecognizer
-        super().__init__(None, **kwargs)
+        super().__init__(**kwargs)
 
         self.n_rnn_layers = kwargs.get("n_rnn_layers", 2)
         self.n_dense_layers = kwargs.get("n_dense_layers", 2)
@@ -103,7 +90,7 @@ def __init__(self, **kwargs):
 
         # training attributes
         self.batch_size = kwargs.get("batch_size", 64)
-        self.epochs = kwargs.get("epochs", 1000)
+        self.epochs = kwargs.get("epochs", 500)
 
         # the name of the model
         self.model_name = ""
@@ -264,7 +251,7 @@ def train(self, override=False):
         model_filename = self._get_model_filename()
 
         self.checkpointer = ModelCheckpoint(model_filename, save_best_only=True, verbose=1)
-        self.tensorboard = TensorBoard(log_dir=f"logs/{self.model_name}")
+        self.tensorboard = TensorBoard(log_dir=os.path.join("logs", self.model_name))
 
         self.history = self.model.fit(self.X_train, self.y_train,
                         batch_size=self.batch_size,
@@ -335,8 +322,8 @@ def confusion_matrix(self, percentage=True, labeled=True):
                                     columns=[ f"predicted_{e}" for e in self.emotions ])
         return matrix
 
-    def n_emotions(self, emotion, partition):
-        """Returns number of `emotion` data samples in a particular `partition`
+    def get_n_samples(self, emotion, partition):
+        """Returns number data samples of the `emotion` class in a particular `partition`
         ('test' or 'train')
         """
         if partition == "test":
@@ -361,8 +348,8 @@ def get_samples_by_class(self):
         test_samples = []
         total = []
         for emotion in self.emotions:
-            n_train = self.n_emotions(self.emotions2int[emotion]+1, "train")
-            n_test = self.n_emotions(self.emotions2int[emotion]+1, "test")
+            n_train = self.get_n_samples(self.emotions2int[emotion]+1, "train")
+            n_test = self.get_n_samples(self.emotions2int[emotion]+1, "test")
             train_samples.append(n_train)
             test_samples.append(n_test)
             total.append(n_train + n_test)
@@ -396,9 +383,10 @@ def get_random_emotion(self, emotion, partition="train"):
 
         return index
 
-    def determine_best_model(self, train=True):
+    def determine_best_model(self):
         # TODO
-        raise TypeError("This method isn't supported yet for deep nn")
+        # raise TypeError("This method isn't supported yet for deep nn")
+        pass
 
 
 if __name__ == "__main__":
 
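With `model=None` now the default in `EmotionRecognizer.__init__`, `DeepEmotionRecognizer` can call `super().__init__(**kwargs)` without passing a model. The base constructor then calls `self.determine_best_model()`, which dispatches to the subclass override. Replacing the `raise TypeError(...)` with `pass` is what keeps construction of the deep recognizer from failing. The classes below are simplified, hypothetical stand-ins for that call chain, not the repository's classes.

```python
class BaseRecognizer:
    """Stand-in for EmotionRecognizer: picks a model when none is given."""

    def __init__(self, model=None, **kwargs):
        self.calls = []
        if model is None:
            self.determine_best_model()  # dispatches to the subclass override, if any
        else:
            self.model = model

    def determine_best_model(self):
        self.calls.append("base search")
        self.model = "best sklearn estimator"


class DeepRecognizer(BaseRecognizer):
    """Stand-in for DeepEmotionRecognizer: builds its own network instead."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # no model passed, so the base calls determine_best_model()
        self.model = "keras model"  # the subclass sets its own model afterwards

    def determine_best_model(self):
        # no-op override; raising here would make every construction fail
        self.calls.append("deep no-op")


base = BaseRecognizer()
deep = DeepRecognizer()
print(base.calls, deep.calls, deep.model)  # ['base search'] ['deep no-op'] keras model
```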
@@ -19,10 +19,11 @@
 class EmotionRecognizer:
     """A class for training, testing and predicting emotions based on
     speech's features that are extracted and fed into `sklearn` or `keras` model"""
-    def __init__(self, model, **kwargs):
+    def __init__(self, model=None, **kwargs):
         """
         Params:
-            model (sklearn model): the model used to detect emotions.
+            model (sklearn model): the model used to detect emotions. If `model` is None, then self.determine_best_model()
+                will be called automatically.
             emotions (list): list of emotions to be used. Note that these emotions must be available in
                 RAVDESS_TESS & EMODB Datasets, available nine emotions are the following:
                     'neutral', 'calm', 'happy', 'sad', 'angry', 'fear', 'disgust', 'ps' ( pleasant surprised ), 'boredom'.
@@ -42,8 +43,6 @@ def __init__(self, model, **kwargs):
         Note that when `tess_ravdess`, `emodb` and `custom_db` are set to `False`, `tess_ravdess` will be set to True
         automatically.
         """
-        # model
-        self.model = model
         # emotions
         self.emotions = kwargs.get("emotions", ["sad", "neutral", "happy"])
         # make sure that there are only available emotions
@@ -79,6 +78,12 @@ def __init__(self, model, **kwargs):
         self.data_loaded = False
         self.model_trained = False
 
+        # model
+        if model is None:
+            self.determine_best_model()
+        else:
+            self.model = model
+
     def _set_metadata_filenames(self):
         """
         Protected method to get all CSV (metadata) filenames into two instance attributes:
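The `model` argument is safer to test with `is None` than with truthiness. Python's `bool()` falls back to `__len__` when a class has no `__bool__`. scikit-learn's ensemble estimators, for example `RandomForestClassifier`, define `__len__` as the number of fitted sub-estimators, so `not model` on an unfitted forest raises `AttributeError`. The stand-in class and the two chooser functions below are hypothetical and mimic that behaviour without requiring scikit-learn.

```python
class UnfittedEnsemble:
    """Hypothetical stand-in: __len__ needs an attribute that only exists after fit()."""

    def __len__(self):
        return len(self.estimators_)  # AttributeError until fitted


def choose_model_truthy(model):
    # bool(model) calls __len__ here, because the class defines no __bool__
    return "determine_best_model" if not model else "use given model"


def choose_model_is_none(model):
    # identity check: never calls __bool__ or __len__
    return "determine_best_model" if model is None else "use given model"


try:
    truthy_result = choose_model_truthy(UnfittedEnsemble())
except AttributeError as error:
    truthy_result = f"AttributeError: {error}"

is_none_result = choose_model_is_none(UnfittedEnsemble())
print(truthy_result)
print(is_none_result)  # use given model
```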
@@ -182,7 +187,7 @@ def predict_proba(self, audio_path):
             feature = extract_feature(audio_path, **self.audio_config).reshape(1, -1)
             proba = self.model.predict_proba(feature)[0]
             result = {}
-            for emotion, prob in zip(self.emotions, proba):
+            for emotion, prob in zip(self.model.classes_, proba):
                 result[emotion] = prob
             return result
         else:
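The `predict_proba` fix pairs probabilities with `self.model.classes_` instead of `self.emotions`. In scikit-learn, the columns of `predict_proba` follow the order of `classes_`, which for standard classifiers holds the sorted unique training labels. `self.emotions`, by contrast, keeps whatever order the user passed. The sketch below uses plain lists and a made-up probability row to show how the old pairing mislabels classes; `label_probabilities` is a hypothetical helper.

```python
def label_probabilities(classes, proba):
    """Map each label to its probability; `classes` must match the column order of `proba`."""
    return dict(zip(classes, proba))


emotions = ["sad", "neutral", "happy"]  # order given to EmotionRecognizer (the default)
classes_ = sorted(set(emotions))        # ['happy', 'neutral', 'sad'], the order classes_ would use
proba = [0.7, 0.2, 0.1]                 # made-up predict_proba row: columns follow classes_

correct = label_probabilities(classes_, proba)
wrong = label_probabilities(emotions, proba)
print(correct)  # {'happy': 0.7, 'neutral': 0.2, 'sad': 0.1}
print(wrong)    # {'sad': 0.7, 'neutral': 0.2, 'happy': 0.1}
```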
@@ -199,12 +204,10 @@ def grid_search(self, params, n_jobs=2, verbose=1):
         grid_result = grid.fit(self.X_train, self.y_train)
         return grid_result.best_estimator_, grid_result.best_params_, grid_result.best_score_
 
-    def determine_best_model(self, train=True):
+    def determine_best_model(self):
         """
         Loads best estimators and determine which is best for test data,
         and then set it to `self.model`.
-        if `train` is True, then train that model on train data, so the model
-        will be ready for inference.
         In case of regression, the metric used is MSE and accuracy for classification.
         Note that the execution of this method may take several minutes due
         to training all estimators (stored in `grid` folder) for determining the best possible one.
@@ -240,11 +243,9 @@ def determine_best_model(self, train=True):
             result.append((detector.model, accuracy))
 
         # sort the result
-        if self.classification:
-            result = sorted(result, key=lambda item: item[1], reverse=True)
-        else:
-            # regression, best is the lower, not the higher
-            result = sorted(result, key=lambda item: item[1], reverse=False)
+        # regression: best is the lower, not the higher
+        # classification: best is higher, not the lower
+        result = sorted(result, key=lambda item: item[1], reverse=self.classification)
         best_estimator = result[0][0]
         accuracy = result[0][1]
         self.model = best_estimator
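The refactored sort works because `reverse=self.classification` sorts descending (highest accuracy first) for classifiers and ascending (lowest error first) for regressors. A minimal sketch with made-up `(model, score)` pairs; `pick_best` is a hypothetical helper. `max` or `min` with the same key would return the same first element without sorting the whole list.

```python
def pick_best(result, classification):
    """Return the (model, score) pair with the best score.

    classification: higher accuracy is better, so sort descending.
    regression: lower error is better, so sort ascending.
    """
    return sorted(result, key=lambda item: item[1], reverse=classification)[0]


# made-up scores, for illustration only
accuracies = [("SVC", 0.80), ("RandomForestClassifier", 0.90), ("KNeighborsClassifier", 0.75)]
errors = [("SVR", 0.42), ("RandomForestRegressor", 0.35), ("KNeighborsRegressor", 0.51)]
print(pick_best(accuracies, classification=True))  # ('RandomForestClassifier', 0.9)
print(pick_best(errors, classification=False))     # ('RandomForestRegressor', 0.35)
```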
@@ -316,8 +317,8 @@ def draw_confusion_matrix(self):
         pl.imshow(matrix, cmap="binary")
         pl.show()
 
-    def n_emotions(self, emotion, partition):
-        """Returns number of `emotion` data samples in a particular `partition`
+    def get_n_samples(self, emotion, partition):
+        """Returns number data samples of the `emotion` class in a particular `partition`
         ('test' or 'train')
         """
         if partition == "test":
@@ -337,8 +338,8 @@ def get_samples_by_class(self):
         test_samples = []
         total = []
         for emotion in self.emotions:
-            n_train = self.n_emotions(emotion, "train")
-            n_test = self.n_emotions(emotion, "test")
+            n_train = self.get_n_samples(emotion, "train")
+            n_test = self.get_n_samples(emotion, "test")
             train_samples.append(n_train)
             test_samples.append(n_test)
             total.append(n_train + n_test)
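`get_samples_by_class` (now built on `get_n_samples`) counts each emotion in the train and test partitions and adds a total. The deep version passes integer labels (`self.emotions2int[emotion]+1`) rather than emotion names. The stdlib sketch below reproduces the same bookkeeping with label lists matching the counts in the README output (910/174 angry, 650/109 neutral, 862/160 sad). The name `samples_by_class` and its dict-of-tuples return value are assumptions; the repository presents the result as the table shown in the README.

```python
def samples_by_class(y_train, y_test, emotions):
    """Return {emotion: (train, test, total)} plus a 'total' row, like the README table."""
    table = {}
    for emotion in emotions:
        n_train = y_train.count(emotion)
        n_test = y_test.count(emotion)
        table[emotion] = (n_train, n_test, n_train + n_test)
    # column sums over the per-emotion rows (computed before the 'total' key is added)
    table["total"] = tuple(sum(row[i] for row in table.values()) for i in range(3))
    return table


train_counts = {"angry": 910, "neutral": 650, "sad": 862}
test_counts = {"angry": 174, "neutral": 109, "sad": 160}
y_train = [e for e, n in train_counts.items() for _ in range(n)]
y_test = [e for e, n in test_counts.items() for _ in range(n)]

table = samples_by_class(y_train, y_test, ["angry", "neutral", "sad"])
print(f"{'':<8}{'train':>6}{'test':>6}{'total':>7}")
for label, (train, test, total) in table.items():
    print(f"{label:<8}{train:>6}{test:>6}{total:>7}")
```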