Offensive Text Detection: Exploring Traditional Classifiers, Ensemble Models, and Kolmogorov Arnold Networks in Code-Mixed Tamil-English Text

EasyChair Preprint 15581

13 pages • Date: December 16, 2024

K Jaidev, Munnangi Pranish Kumar, Jampala Sai Chandana, Charishma Chowdary and Sachin Kumar

Abstract

Offensive content has become more common in the digital era with the growth of social media and online communication, especially in languages like Tamil. Detecting such harmful content is difficult because large-scale labeled data is scarce and code-switching is intricate. This paper describes a hybrid architecture for offensive text identification that combines the strengths of Kolmogorov-Arnold Networks (KAN), traditional machine learning classifiers, and ensemble models. Our approach involves text preprocessing, extraction of several features, and hyperparameter tuning to improve model performance. We compare the performance of several classifiers: XGBoost, AdaBoost, Gradient Boosting, K-Nearest Neighbours (KNN), Random Forest, Support Vector Machine (SVM), and Logistic Regression. Extensive experiments show that the hybrid system, particularly when leveraging KAN, performs best at accurately identifying offensive material in code-mixed Tamil-English datasets. The results demonstrate the potential benefits of integrating conventional and cutting-edge machine learning techniques to address offensive content identification in multilingual and code-mixed contexts.
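The abstract names text preprocessing and feature extraction as the first stages of the pipeline but does not spell them out. The sketch below is one plausible reading, not the authors' implementation: the cleaning rules (lowercasing, dropping URLs and @-mentions, collapsing elongated characters, replacing ASCII punctuation) and the use of character n-gram counts are assumptions, and the function names `preprocess` and `char_ngrams` and the sample string are hypothetical.

```python
import re
import string
from collections import Counter

# Assumed cleaning rules; the preprint's abstract does not list the exact steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times
PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation})


def preprocess(text: str) -> str:
    """Normalize a code-mixed comment (hypothetical helper)."""
    text = text.lower()                  # affects Latin letters only
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @-mentions
    text = REPEAT_RE.sub(r"\1\1", text)  # "massss" -> "mass"
    text = text.translate(PUNCT_TABLE)   # ASCII punctuation -> space
    return " ".join(text.split())        # collapse whitespace


def char_ngrams(text: str, n_min: int = 2, n_max: int = 3) -> Counter:
    """Count character n-grams, a common feature for romanized text
    whose spelling varies from writer to writer (assumed feature choice)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Made-up, benign example comment for illustration only.
cleaned = preprocess("@user Sema MASSSS da!!! https://t.co/x")
features = char_ngrams(cleaned)
```

Counts like these could then be fed to any of the classifiers listed in the abstract. Only ASCII punctuation is replaced, so Tamil-script characters, including their combining vowel signs, pass through unchanged.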

Keyphrases: Code-Mixed Text, Key Attention Networks, Kolmogorov-Arnold Networks, Linear SVC, Offensive Content, Offensive Language Detection, Offensive Text Detection, Tamil-English, accuracy precision recall, code mixed communication, code mixed tamil, code mixed tamil english data, hate speech detection, kolmogorov arnold networks kan, machine learning classifiers and ensemble models, neural networks, preprocessing and feature extraction methods, sentiment analysis and offensive language identification dataset, social media, traditional classifiers

Links:

https://easychair.org/publications/preprint/hJXt

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:15581,
  author    = {K Jaidev and Munnangi Pranish Kumar and Jampala Sai Chandana and Charishma Chowdary and Sachin Kumar},
  title     = {Offensive Text Detection: Exploring Traditional Classifiers, Ensemble Models, and Kolmogorov Arnold Networks in Code-Mixed Tamil-English Text},
  howpublished = {EasyChair Preprint 15581},
  year      = {EasyChair, 2024}}
