Enhanced sentiment analysis based on improved word embeddings and XGboost
Institute of Advanced Engineering and Science
Amina Samih, Abderrahim Ghadi, Abdelhadi Fennan
International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 2, April 2023, pp. 1827-1836
Abstract
Sentiment analysis is a well-established and rapidly growing research topic in natural language processing (NLP) and text classification. It has become a critical component of many applications, including politics, business, advertising, and marketing. Most current research obtains sentiment features through lexical and syntactic analysis, and word embeddings express these characteristics explicitly. This article proposes a novel method, improved words vector for sentiments analysis (IWVS), which uses XGBoost to improve the F1-score of sentiment classification. The proposed method constructs sentiment vectors by averaging word embeddings (Sentiment2Vec). We also investigated a polarized lexicon for classifying positive and negative sentiments. The sentiment vectors form a feature space into which each analyzed text is mapped, and these features are then fed to the chosen classifier (XGBoost). We compared the F1-scores obtained with our method across different machine learning models and sentiment datasets, and we compared our approach against two baseline representations, term frequency-inverse document frequency (TF-IDF) and Doc2vec. The results show that IWVS achieves a higher F1-score for sentiment classification, and XGBoost with IWVS features was the best-performing model in our evaluation.
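The abstract does not give implementation details, so the following is only a minimal sketch, assuming pre-trained word vectors are available as a plain Python dictionary. It illustrates the averaging step behind Sentiment2Vec and the binary F1-score used for evaluation. The toy vectors, the function names `average_embedding` and `f1_score`, and the choice to skip out-of-vocabulary tokens are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings (dimension 3), for illustration only; real
# Word2Vec vectors are typically 100-300 dimensional and learned from a corpus.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.3],
    "movie": [0.2, 0.5, 0.1],
    "boring": [-0.8, 0.2, 0.4],
    "plot": [0.1, 0.6, 0.2],
}


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens.

    Assumption: unknown tokens are skipped, and a zero vector is returned
    when no token of the text is in the vocabulary.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = "great movie unknownword".split()
    # Mean of the "great" and "movie" vectors; "unknownword" is ignored.
    print(average_embedding(doc, TOY_EMBEDDINGS, dim=3))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a full pipeline, one averaged vector per document would form a row of the feature matrix passed to a classifier such as `xgboost.XGBClassifier` via its `fit` and `predict` methods. The polarized lexicon mentioned in the abstract is not modeled in this sketch.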
Keywords: machine learning; sentiment analysis; Sentiment2Vec; Word2Vec; XGBoost
Publisher: Institute of Advanced Engineering and Science
Publish Date: 2023-04-01
DOI: 10.11591/ijece.v13i2.pp1827-1836
Publish Year: 2023