Audio-Based Hate Speech Detection in Malayalam Using Machine Learning

EasyChair Preprint 15586

13 pages. Date: December 16, 2024

Abstract

Detecting hate speech on social media is challenging, particularly in low-resource languages such as Malayalam, where annotated data is scarce. To address this challenge, we introduce a new multiclass hate speech dataset in Malayalam, sourced from YouTube. The study benchmarks machine learning classifiers on hate versus non-hate speech, in both binary and multiclass classification tasks, using audio features alone. The Random Forest Classifier performed strongly in binary classification, achieving a macro accuracy of 0.93 and an F1 score of 0.93. In ablation studies, the other classifiers (Logistic Regression, Support Vector Machines, and Naive Bayes) registered accuracies of around 0.85 and macro F1 scores of 0.85. In multiclass classification, the Random Forest model again led with an accuracy of 0.8289, a macro accuracy of 0.72, and an F1 score of 0.74, outperforming all other models tested in the ablation study. These results suggest that the Random Forest Classifier can reliably detect hate speech in Malayalam and so contribute to a safer online environment.
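
The abstract reports plain accuracy alongside macro-averaged scores, and the two can differ noticeably when classes are imbalanced. The sketch below shows one way such metrics are commonly computed. It is an illustration only, not the authors' evaluation code: the abstract does not define "macro accuracy", so the sketch assumes it means the unweighted mean of per-class recall (often called balanced accuracy). The labels `hate` and `non_hate` and the toy predictions are hypothetical and are not drawn from the paper's dataset.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall.

    ASSUMPTION: used here as a stand-in for the abstract's "macro accuracy".
    """
    tp, _, fn = per_class_counts(y_true, y_pred)
    labels = sorted(set(y_true))
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 3 hate clips and 5 non-hate clips.
y_true = ["hate", "hate", "hate",
          "non_hate", "non_hate", "non_hate", "non_hate", "non_hate"]
# One hate clip is misclassified as non-hate.
y_pred = ["hate", "hate", "non_hate",
          "non_hate", "non_hate", "non_hate", "non_hate", "non_hate"]

print(f"accuracy     = {accuracy(y_true, y_pred):.4f}")
print(f"macro recall = {macro_recall(y_true, y_pred):.4f}")
print(f"macro F1     = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case a single error on the minority class lowers the macro-averaged scores more than the plain accuracy, which is the same direction as the gap between 0.8289 and 0.72 reported for the multiclass task.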

Keyphrases: Ablation Analysis, Cross-lingual language model, Machine Learning Classifiers, Macro Accuracy, Multiclass Classification, Multimodal Hate Speech Detection, Non hate speech, Random Forest Classifier Model, Speech and Language Technologies, Training and Testing, abusive language detection, audio based hate speech detection, audio feature extraction, bayes and random forest classifier classifiers, data sourced from youtube, detecting hate speech, feature extraction, hate and non hate speech, hate speech detection, language on social media, logistic regression, naive bayes and random forest classifier, random forest classifier logistic regression, sourced from youtube videos, youtube s hate speech

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15586,
  author    = {R V Gayathri Devi and J K Mahanivetha and P Seetharaman and K Devika and G Jyothish Lal},
  title     = {Audio-Based Hate Speech Detection in Malayalam Using Machine Learning},
  howpublished = {EasyChair Preprint 15586},
  year      = {EasyChair, 2024}}