Detecting Hate Speech In Multimodal Memes Using Vision-Language Models

Published in https://pub.uni-bielefeld.de/, 2021

Abstract— Memes on the Internet are often harmless and sometimes amusing. An apparently innocent meme, however, becomes a multimodal form of hate speech when certain combinations of image and text are used: a hateful meme. The Hateful Memes Challenge is a first-of-its-kind competition that focuses on detecting hate speech in multimodal memes, and it introduces a new dataset with over 10,000 examples of multimodal content. We use VisualBERT, also known as "BERT for vision and language," together with ensemble learning to boost performance. Our solution achieved an AUROC of 0.811 and an accuracy of 0.765 on the challenge test set, placing us third out of 3,173 participants. The code is available on GitHub.
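To make the approach in the abstract concrete, here is a minimal sketch (not the thesis code) of the two ingredients: scoring a meme's caption plus pre-extracted image-region features with VisualBERT, and soft-voting over several models' probabilities. The checkpoint name follows the Hugging Face VisualBERT examples; the random region features, the untrained classification head, and the placeholder ensemble scores are illustrative assumptions — in practice the visual features would come from an object detector such as Faster R-CNN, and the head would be fine-tuned on the challenge data.

```python
import torch
from transformers import BertTokenizer, VisualBertModel

# Text encoder input: the meme's caption.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
inputs = tokenizer("meme caption goes here", return_tensors="pt")

# Dummy region features standing in for detector output:
# (batch, num_regions, feature_dim) — shapes are assumptions.
visual_embeds = torch.randn(1, 36, 2048)
inputs["visual_embeds"] = visual_embeds
inputs["visual_token_type_ids"] = torch.ones(visual_embeds.shape[:-1], dtype=torch.long)
inputs["visual_attention_mask"] = torch.ones(visual_embeds.shape[:-1])

with torch.no_grad():
    outputs = model(**inputs)

# A binary "hateful" head sits on the pooled output; this one is
# untrained and only shows the wiring.
head = torch.nn.Linear(model.config.hidden_size, 1)
p_hateful = torch.sigmoid(head(outputs.pooler_output)).item()

# Soft-voting ensemble: average P(hateful) across several trained
# models (the other two scores here are hypothetical).
scores = [p_hateful, 0.64, 0.80]
ensemble_score = sum(scores) / len(scores)
print(f"ensemble P(hateful) = {ensemble_score:.3f}")
```

Averaging calibrated probabilities from independently trained models is a simple way to reduce variance, which is one reason ensembling tends to lift AUROC on a held-out test set.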

Download the thesis here!

Cited as:

@mastersthesis{velioglu2021detecting,
  title   = "Detecting Hate Speech In Multimodal Memes Using Vision-Language Models",
  author  = "Velioglu, Riza",
  school  = "Bielefeld University",
  year    = "2021",
  url     = "https://pub.uni-bielefeld.de/record/2989295"
}