Multilingual hate speech detection using deep learning

International Journal of Informatics and Communication Technology

Multilingual hate speech detection using deep learning

Abstract

The rise of social media has enabled public expression but also fueled the spread of hate speech, contributing to social tensions and potential violence. Natural language processing (NLP), particularly text classification, has become essential for detecting hate speech. This study develops a hate speech detection model on Twitter using FastText with bidirectional long short-term memory (Bi-LSTM) and explores multilingual bidirectional encoder representations from transformers (M-BERT) for handling diverse languages. Data augmentation techniques-including easy data augmentation (EDA) methods, back translation, and generative adversarial networks (GANs)-are employed to enhance classification, especially for imbalanced datasets. Results show that data augmentation significantly boosts performance. The highest F1-scores are achieved by random insertion for Indonesian (F1-score: 0.889, Accuracy: 0.879), synonym replacement for English (F1-score: 0.872, Accuracy: 0.831), and random deletion for German (F1-score: 0.853, Accuracy: 0.830) with the FastText + Bi-LSTM model. The M-BERT model performs best with random deletion for Indonesian (F1-score: 0.898, Accuracy: 0.880), random swap for English (F1 score: 0.870, Accuracy: 0.866), and random deletion for German (F1-score: 0.662, Accuracy: 0.858). These findings underscore that data augmentation effectiveness varies by language and model. This research supports efforts to mitigate hate speech’s impact on social media by advancing multilingual detection capabilities.

Discover Our Library

Embark on a journey through our expansive collection of articles and let curiosity lead your path to innovation.

Explore Now
Library 3D Ilustration