Assume we have a training dataset with features X and an output label y. Here the data consists of people walking, and a series of images of each individual is used to train the model. For example, take a woman named "Alex": we record her features (her speed and walking style) and set the output to "Alex". An RCNN is used for the gait analysis. Because the inputs are images, a CNN is the natural choice, and because the images form a time series, a recurrent CNN (RCNN) was preferred.
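To make the "CNN per image, recurrence over time" idea concrete, here is a minimal pure-Python sketch of the data flow. It is an illustration under stated assumptions, not the project's model: the fixed `KERNEL`, the scalar weights `w_x` and `w_h`, and all function names are hypothetical, and a real network would learn many filters and feed the final state to a classifier over identities.

```python
import math
from typing import List

Frame = List[List[float]]  # one grayscale image as rows of pixel values


def conv2d_valid(image: Frame, kernel: Frame) -> Frame:
    """Plain 2D cross-correlation with no padding (the core CNN operation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def frame_feature(image: Frame, kernel: Frame) -> float:
    """Convolve, apply ReLU, then global-average-pool to a single number."""
    fmap = conv2d_valid(image, kernel)
    values = [max(0.0, v) for row in fmap for v in row]
    return sum(values) / len(values)


def encode_sequence(frames: List[Frame], kernel: Frame,
                    w_x: float = 1.0, w_h: float = 0.5) -> float:
    """Recurrent update over time: h_t = tanh(w_x * f_t + w_h * h_{t-1})."""
    h = 0.0
    for image in frames:
        h = math.tanh(w_x * frame_feature(image, kernel) + w_h * h)
    return h


# Toy training pair: X is an ordered series of 3x3 frames, y is the walker's name.
KERNEL = [[1.0, -1.0], [1.0, -1.0]]  # hypothetical fixed vertical-edge filter
X = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]
y = "Alex"
state = encode_sequence(X, KERNEL)  # summary of the whole walk, paired with y
```

The convolution looks at each image on its own, while the recurrent update carries information from earlier frames forward, which is why the combination suits image sequences such as a walk.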
This project combines facial and textual emotion recognition models to analyze a text and convert it into speech that carries the recognised emotion. The tone of typed text generally lacks emotion, so I have tried to combine facial expressions with the typed text to identify the emotion behind the text and voice it out using the recognised emotion. My focus is on real-time prediction of a person’s emotion while typing, with good accuracy. This is a mini project that I feel could be used in the real world to help make texting a lot more emotion-centred.
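The description does not say how the facial and textual predictions are combined. One simple option is late fusion, sketched below as a weighted average of the two models' probability outputs. Everything here is an assumption for illustration: the `EMOTIONS` label set, the function names, the `face_weight` parameter, and the example probabilities are all hypothetical.

```python
from typing import Dict

EMOTIONS = ("angry", "happy", "neutral", "sad")  # hypothetical label set


def fuse_emotions(face_probs: Dict[str, float],
                  text_probs: Dict[str, float],
                  face_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of two probability distributions over the same labels."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    fused = {
        e: face_weight * face_probs.get(e, 0.0)
        + (1.0 - face_weight) * text_probs.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("both inputs assign zero probability to every emotion")
    # Renormalise so the result is again a probability distribution.
    return {e: p / total for e, p in fused.items()}


def predict_emotion(face_probs: Dict[str, float],
                    text_probs: Dict[str, float],
                    face_weight: float = 0.5) -> str:
    """Return the emotion with the highest fused probability."""
    fused = fuse_emotions(face_probs, text_probs, face_weight)
    return max(fused, key=fused.get)


# Made-up outputs of a face model and a text model for one typed message.
face = {"angry": 0.1, "happy": 0.6, "neutral": 0.2, "sad": 0.1}
text = {"angry": 0.1, "happy": 0.2, "neutral": 0.6, "sad": 0.1}
emotion = predict_emotion(face, text, face_weight=0.7)
```

In this sketch the predicted label would then be passed to the speech step to choose the voice style. Raising `face_weight` trusts the facial expression more, which may suit text whose tone is ambiguous on its own.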