Multimodal recognition with deep learning: audio, image, and text

International Journal of Reconfigurable and Embedded Systems

Abstract

Emotion detection is essential in many domains, including affective computing, psychological assessment, and human-computer interaction (HCI). This study contrasts emotion detection across the text, image, and speech modalities to evaluate state-of-the-art approaches in each area and to identify their strengths and shortcomings. We surveyed current methods, datasets, and evaluation criteria through a comprehensive literature review. For our experiments, we collected data, cleaned it, extracted features, and then applied deep learning (DL) models: text-based emotion identification used a long short-term memory (LSTM) network with a term frequency-inverse document frequency (TF-IDF) vectorizer, and image-based emotion recognition used a convolutional neural network (CNN). Contributing to the body of knowledge in emotion recognition, our results shed light on the inner workings of the different modalities. Experimental findings validate the efficacy of the proposed methods while also highlighting areas for improvement.
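The TF-IDF feature-extraction step used in the text pipeline can be sketched in pure Python. This is a minimal illustration, not the paper's actual preprocessing: the tiny example corpus, the tokenization by whitespace, and the smoothed-IDF variant are all assumptions made here for clarity.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weight dictionaries for a tokenized corpus.

    `corpus` is a list of documents, each a list of tokens.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    # Smoothed inverse document frequency (the "+1" keeps weights positive).
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    # Term frequency (normalized by document length) scaled by IDF.
    vectors = []
    for doc in corpus:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * idf[term]
                        for term, count in tf.items()})
    return vectors

# Illustrative toy corpus (hypothetical, not the study's dataset).
corpus = [
    "i feel so happy today".split(),
    "this is sad and gloomy".split(),
    "happy and excited about today".split(),
]
vectors = tf_idf(corpus)
```

Terms that occur in fewer documents receive higher IDF, so a rarer emotion word such as "sad" in the toy corpus above ends up with a larger weight than the more common "happy"; the resulting vectors would then be fed to a classifier such as the LSTM described in the abstract.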
