
Multimodal Hate Speech Detection from Videos and Texts

EasyChair Preprint no. 10743

12 pages · Date: August 19, 2023


Since social media posts often consist of videos with associated comments, and many of these videos or their comments convey hate speech, detecting hate in this multimodal setting is crucial. We focus on the early detection of hate speech in videos by exploiting features from an initial set of comments. We devise the Text Video Classifier (TVC), a multimodal hate classifier built on four modalities: character, word, sentence, and video-frame features. We also develop a Cross Attention Fusion Mechanism (CA-FM) to learn global feature embeddings from the inter-modal features. We report the architectural details and the experiments performed. Using several sampling techniques, we train this architecture on a Vine dataset of videos and their comments. At an output probability threshold of 0.5, the proposed design improves on models previously built on this dataset, demonstrating the positive effect of the CA-FM and TVC.

Keyphrases: Cross Attention Fusion Mechanism, multimodal, TVC
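The abstract describes fusing four modality feature sets (character, word, sentence, video frame) via cross attention into a global embedding, but the page carries no code. Below is a minimal sketch of one plausible reading, assuming scaled dot-product cross attention in which each modality attends over the others and the attended vectors are averaged; the function names, feature dimensions, and the averaging step are illustrative assumptions, not the paper's actual CA-FM implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d_k):
    # Scaled dot-product attention: rows of `query` attend over rows of `context`.
    scores = query @ context.T / np.sqrt(d_k)   # (n_q, n_c)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context                    # (n_q, d)

def fuse_modalities(modalities):
    # Hypothetical fusion: each modality attends to the stack of all
    # other modalities; the attended outputs are mean-pooled into one
    # global feature embedding.
    fused = []
    for i, m in enumerate(modalities):
        others = np.vstack([x for j, x in enumerate(modalities) if j != i])
        fused.append(cross_attention(m, others, m.shape[-1]).mean(axis=0))
    return np.mean(fused, axis=0)

rng = np.random.default_rng(0)
d = 16  # illustrative shared feature dimension
# One toy feature matrix per modality: character, word, sentence, video frames.
char_f  = rng.normal(size=(8, d))
word_f  = rng.normal(size=(6, d))
sent_f  = rng.normal(size=(4, d))
frame_f = rng.normal(size=(10, d))

embedding = fuse_modalities([char_f, word_f, sent_f, frame_f])
```

The resulting `embedding` is a single d-dimensional vector that a downstream classifier head could map to a hate/non-hate probability, thresholded at 0.5 as in the paper's evaluation.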

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

  @booklet{EasyChair:10743,
  author = {Nishchal Prasad and Sriparna Saha and Pushpak Bhattacharyya},
  title = {Multimodal Hate Speech Detection from Videos and Texts},
  howpublished = {EasyChair Preprint no. 10743},
  year = {EasyChair, 2023}}