Detecting Media Self-Censorship without Explicit Training Data – Sanghani Center for Artificial Intelligence and Data Analytics

Rongrong Tao, Feng Chen, David Mares, Patrick Butler, Naren Ramakrishnan

Abstract

The motives and means of explicit state censorship have been well studied, both quantitatively and qualitatively. Self-censorship by media outlets, however, has not received nearly as much attention, mostly because it is difficult to detect systematically. We develop a novel approach to identifying news media self-censorship that uses social media as a sensor. We propose a hypothesis testing framework to identify and evaluate censored clusters of keywords, together with a near-linear-time algorithm (called GraphDPD) that identifies the highest-scoring clusters as indicators of censorship. We evaluate the accuracy of our framework against other state-of-the-art algorithms, using both semi-synthetic and real-world data from Mexico and Venezuela during 2014. These tests demonstrate the capacity of our framework to identify self-censorship and to provide an indicator of broader media freedom. The results of this study lay the foundation for the detection and study of, and policy response to, self-censorship.

Rongrong Tao, Baojian Zhou, Feng Chen, David Mares, Patrick Butler, Naren Ramakrishnan, Ryan Kennedy: Detecting Media Self-Censorship without Explicit Training Data. SDM 2020: 550-558

People

Patrick Butler

Senior Research Associate

Naren Ramakrishnan

Professor of Computer Science
Director

Feng Chen

Alumni

Rongrong Tao

Alumni

Publication Details

Date of publication:
Conference: SIAM International Conference on Data Mining
Page number(s): 550-558