Lei Zhang

Abstract

The COVID-19 pandemic has made hate speech on online social networks a growing issue in recent years, affecting millions of people. Our work aims to improve automatic hate speech detection to help prevent escalation into hate crimes. Hate speech research faces three challenges. First, existing datasets suffer from severe class imbalance. Second, textual data carry sparse information. Third, it is difficult to balance the tradeoff between exploiting semantic similarity and coping with noisy network language. To address these challenges, we establish a framework for automatic short-text data augmentation, which we call SubDQE: a semi-supervised hybrid of Substitution-Based Augmentation and Dynamic Query Expansion (DQE) that extracts more data points for a specific class from Twitter. We also propose the HateNet model, which has two main components: a Graph Convolutional Network and a Weighted Drop-Edge. First, the Graph Convolutional Network (GCN) classifier operates on a graph constructed from thresholded cosine similarities between tweet embeddings, providing new insights into how ideas are connected. Second, the weighted Drop-Edge is a stochastic regularization technique that randomly removes edges with probabilities weighted by the semantic similarities between tweets. Using three different SubDQE-augmented datasets, we evaluate HateNet with eight tweet embedding methods and compare it against six baseline classification models and seven baseline data augmentation techniques previously used in hate speech detection. Our results show that HateNet matches or exceeds the performance of the baseline models in both accuracy and F1 score.
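The abstract names Dynamic Query Expansion (DQE) as one half of SubDQE but gives no details. The sketch below shows only the generic DQE loop and is not the authors' SubDQE. Starting from seed keywords, it retrieves texts that match the current query, counts co-occurring terms in those texts, and adds the most frequent new terms to the query. The substitution half and the semi-supervised labeling are not modeled. An in-memory list of strings stands in for Twitter search, and `dynamic_query_expansion`, `tokenize`, `rounds`, and `top_k` are hypothetical names chosen for illustration.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer, de-duplicated in order of first appearance."""
    return list(dict.fromkeys(text.lower().split()))


def dynamic_query_expansion(corpus, seed_terms, rounds=2, top_k=1, stopwords=frozenset()):
    """Hypothetical DQE sketch: grow a keyword query from co-occurring terms."""
    query = {t.lower() for t in seed_terms}
    matched = set()
    for _ in range(rounds):
        # Retrieve: every text that shares at least one token with the current query.
        hits = [i for i, text in enumerate(corpus) if query & set(tokenize(text))]
        matched.update(hits)
        # Expand: count candidate terms (not already in the query) across retrieved texts.
        counts = Counter(
            tok
            for i in hits
            for tok in tokenize(corpus[i])
            if tok not in query and tok not in stopwords
        )
        new_terms = [tok for tok, _ in counts.most_common(top_k)]
        if not new_terms:  # nothing left to add, so stop early
            break
        query.update(new_terms)
    return query, sorted(matched)


# Toy usage on a made-up four-tweet corpus.
tweets = [
    "covid virus spread",
    "virus blame group",
    "group hate post",
    "sunny day walk",
]
query, matched = dynamic_query_expansion(tweets, {"covid"}, rounds=2, top_k=1)
print(sorted(query), matched)
```

In a real pipeline, the retrieval step would call a search API and the matched texts would be treated as candidate samples for the target class. Both of those are assumptions beyond what the abstract states.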
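The abstract says the GCN classifier runs on a graph built from thresholded cosine similarities between tweet embeddings. Below is a minimal NumPy sketch of that idea, not the HateNet implementation. It assumes a transductive node-classification setup in which each tweet is a node. Edges are kept only where cosine similarity is at least a threshold in (0, 1], and the similarity value is kept as the edge weight. Propagation uses the common symmetric normalization D^-1/2 (A + I) D^-1/2 from Kipf and Welling. The threshold, layer sizes, and function names are hypothetical, the weights are random, and training is omitted.

```python
import numpy as np


def build_similarity_graph(embeddings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Weighted adjacency: keep cosine similarities >= threshold (assumes 0 < threshold <= 1)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # row-normalize, guarding zero vectors
    sim = unit @ unit.T                               # pairwise cosine similarities
    adj = np.where(sim >= threshold, sim, 0.0)        # drop edges below the threshold
    np.fill_diagonal(adj, 0.0)                        # self-loops are added during normalization
    return adj


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(x, adj_norm, w1, w2):
    """Two-layer GCN: H = ReLU(A X W1), then softmax(A H W2) row-wise."""
    h = np.maximum(adj_norm @ x @ w1, 0.0)
    logits = adj_norm @ h @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Toy usage: 6 random "tweet embeddings" of dimension 16, 2 classes (hate / not hate).
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 16))
adj = build_similarity_graph(emb, threshold=0.2)
w1 = rng.normal(scale=0.1, size=(16, 8))
w2 = rng.normal(scale=0.1, size=(8, 2))
probs = gcn_forward(emb, normalize_adjacency(adj), w1, w2)
print(probs.shape)  # (6, 2)
```

Keeping the similarity as the edge weight, rather than a binary 0/1 edge, is a design assumption. It lets more strongly related tweets contribute more during neighborhood aggregation.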
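The weighted Drop-Edge is described only as removing edges randomly with probabilities weighted by semantic similarity. The abstract does not state the exact rule or its direction. The sketch below assumes a hypothetical rule: an undirected edge with similarity s in (0, 1] is dropped with probability `max_drop * (1 - s)`, so weakly related tweets lose their connections more often and an edge with s = 1 is never dropped. `weighted_drop_edge` and `max_drop` are illustrative names, not the authors' API.

```python
import numpy as np


def weighted_drop_edge(adj: np.ndarray, max_drop: float = 0.5, rng=None) -> np.ndarray:
    """Randomly remove undirected edges, dropping low-similarity edges more often.

    Hypothetical rule: edge (i, j) with similarity s in (0, 1] is dropped with
    probability max_drop * (1 - s).
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.triu_indices(adj.shape[0], k=1)  # visit each undirected edge once
    weights = adj[rows, cols]
    present = weights > 0
    rows, cols, weights = rows[present], cols[present], weights[present]
    drop_prob = max_drop * (1.0 - np.clip(weights, 0.0, 1.0))
    keep = rng.random(weights.shape[0]) >= drop_prob  # kept with probability 1 - drop_prob
    out = np.zeros_like(adj)
    out[rows[keep], cols[keep]] = weights[keep]
    out[cols[keep], rows[keep]] = weights[keep]  # mirror to keep the graph symmetric
    return out


# Toy usage: draw a fresh thinned graph for each training epoch.
sim_adj = np.array([
    [0.0, 1.0, 0.3],
    [1.0, 0.0, 0.6],
    [0.3, 0.6, 0.0],
])
rng = np.random.default_rng(42)
for epoch in range(3):
    thinned = weighted_drop_edge(sim_adj, max_drop=0.5, rng=rng)
    print(epoch, int(np.count_nonzero(thinned)) // 2, "edges kept")
```

In standard DropEdge, a new subgraph is sampled at every training step and the full graph is used at inference. The sketch assumes the same convention applies here.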

Charles Duong, Lei Zhang, Chang-Tien Lu: HateNet: A Graph Convolutional Network Approach to Hate Speech Detection. IEEE Big Data 2022: 5698-5707

People

Lei Zhang


Publication Details

Date of publication: January 26, 2023
Conference: Big Data
Page number(s): 5698-5707