Rongrong Tao, Feng Chen, David Mares, Patrick Butler, Naren Ramakrishnan


The motives and means of explicit state censorship have been well studied, both quantitatively and qualitatively. Self-censorship by media outlets, however, has not received nearly as much attention, mostly because it is difficult to detect systematically. We develop a novel approach to identify news media self-censorship by using social media as a sensor. We develop a hypothesis testing framework to identify and evaluate censored clusters of keywords, and a near-linear-time algorithm (called GraphDPD) to identify the highest-scoring clusters as indicators of censorship. We evaluate the accuracy of our framework against other state-of-the-art algorithms, using both semi-synthetic and real-world data from Mexico and Venezuela during 2014. These tests demonstrate the capacity of our framework to identify self-censorship and provide an indicator of broader media freedom. The results of this study lay the foundation for the detection and study of, and policy response to, self-censorship.

Rongrong Tao, Baojian Zhou, Feng Chen, David Mares, Patrick Butler, Naren Ramakrishnan, Ryan Kennedy: Detecting Media Self-Censorship without Explicit Training Data. SDM 2020: 550-558


Publication Details

Date of publication: 2020
Conference: SIAM International Conference on Data Mining (SDM)
Page number(s): 550-558