Empowering low-resource languages: a machine learning approach to Tamil sentiment classification

International Journal of Informatics and Communication Technology

Abstract

Sentiment analysis is essential for deciphering public opinion, guiding decisions, and refining marketing strategies. By analyzing emotional tone and attitude in large volumes of text, it helps businesses monitor public sentiment, engage customers, and strengthen relationships with their target audiences. Its application remains very limited, however, in diverse linguistic contexts with fewer resources, particularly for languages like Tamil. Given the global impact of sentiment analysis and the linguistic diversity of India, addressing this gap is crucial for a more nuanced understanding of sentiment in India. The need is especially pressing for Tamil, a classical language spoken by millions. The cultural, social, and historical nuances embedded in Tamil usage call for tailored sentiment analysis approaches that can capture subtle expressions of sentiment. This paper introduces a novel method that evaluates various text embedding methods in combination with a range of machine learning (ML) algorithms to improve sentiment classification of Tamil text, with a specific focus on lyrics. In the experiments, FastText word embedding proved the most effective method, achieving 78% accuracy when coupled with the support vector classification (SVC) model.
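
As a rough illustration of the kind of pipeline the abstract describes (an embedding step followed by a classifier), the sketch below uses only the Python standard library. It is not the paper's implementation: hashed character n-grams stand in for pre-trained FastText vectors, a small linear SVM trained by subgradient descent stands in for SVC, and the six labelled words are made-up toy examples rather than the lyrics dataset. All function names and parameter values are assumptions chosen for illustration.

```python
import zlib

DIM = 64  # number of hash buckets (assumed; deliberately tiny for illustration)


def char_ngrams(word, n_min=2, n_max=4):
    """Character n-grams of a word wrapped in boundary markers, FastText-style."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def embed(text):
    """Map a text to an L2-normalised vector of hashed n-gram counts."""
    vec = [0.0] * DIM
    for word in text.split():
        for gram in char_ngrams(word):
            # crc32 is deterministic across runs, unlike the built-in hash()
            vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            violated = margin < 1  # example is inside the margin or misclassified
            for j in range(DIM):
                grad = lam * w[j] - (label * x[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * label
    return w, b


def predict(model, x):
    """Return +1 (positive) or -1 (negative) for an embedded text."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def accuracy(model, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(model, x) == label for x, label in zip(X, y)) / len(y)


# Hypothetical toy data: single Tamil words with assumed polarity labels.
texts = ["மகிழ்ச்சி", "அன்பு", "சிரிப்பு", "சோகம்", "வலி", "கண்ணீர்"]
labels = [1, 1, 1, -1, -1, -1]

X = [embed(t) for t in texts]
model = train_linear_svm(X, labels)
print(f"training accuracy: {accuracy(model, X, labels):.2f}")
```

Comparing embedding methods, as the paper does, would amount to swapping `embed` for another text-to-vector function and re-measuring accuracy on held-out data; the training accuracy printed here says nothing about generalisation.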
