Combining Heterogeneous Data Sources for Civil Unrest Forecasting
Gizem Korkmaz, Jose Cadena, Chris Kuhlman, Achla Marathe, Anil Vullikanti, Naren Ramakrishnan
Detecting and forecasting civil unrest events (protests, strikes, etc.) is of key interest to social scientists and policy makers because these events can lead to significant societal and cultural changes. We analyze protest dynamics in six Latin American countries at a daily level, from November 2012 through August 2014, using multiple data sources that capture the social, political, and economic contexts within which civil unrest occurs. We use logistic regression models with Lasso to select a sparse feature set from our diverse datasets in order to predict the probability of occurrence of civil unrest events in these countries. The models contain predictors extracted from social media sites (Twitter and blogs) and news sources, in addition to the volume of requests to Tor, a widely used anonymity network. Two political event databases and country-specific exchange rates are also used. Our forecasting models are evaluated against a Gold Standard Report (GSR), which is compiled by an independent group of social scientists and experts on Latin America. The experimental results, measured by F1-score, range from 0.68 to 0.95 and demonstrate the efficacy of a multi-source approach for predicting civil unrest. Case studies illustrate the insights into unrest events that our methods provide.
- Date of publication: August 25, 2015
- IEEE/ACM Advances in Social Networks Analysis and Mining (ASONAM)
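
The modeling approach named in the abstract, logistic regression with a Lasso (L1) penalty scored by F1, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the feature names are hypothetical stand-ins for the kinds of sources the abstract lists, the penalty strength `lam` is arbitrary, and the solver (proximal gradient descent with soft-thresholding) is one common way to fit an L1-penalized model, not necessarily the one the authors used.

```python
import math
import random

# Hypothetical feature names; the paper's actual predictors are not listed here.
FEATURES = ["tweet_volume", "news_mentions", "tor_requests",
            "exchange_rate_change", "blog_mentions"]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    # Proximal gradient descent on mean log-loss + lam * ||w||_1.
    # The intercept b is not penalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam)
             for j in range(d)]
    return w, b

def predict(X, w, b, threshold=0.5):
    # Predict 1 (event) when the estimated probability reaches the threshold.
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold
            else 0 for xi in X]

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic "daily" data: only features 0 and 2 influence the label.
random.seed(0)
X = [[random.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
y = [1 if 2.0 * x[0] + 1.5 * x[2] + random.gauss(0, 0.5) > 0 else 0 for x in X]

w, b = fit_lasso_logistic(X, y)
score = f1_score(y, predict(X, w, b))  # in-sample, for illustration only

for name, weight in zip(FEATURES, w):
    print(f"{name}: {weight:.3f}")
print(f"in-sample F1: {score:.2f}")
```

The L1 penalty tends to drive the weights of uninformative features to exactly zero, which is what yields a sparse feature set. A real evaluation would score predictions on held-out days against ground-truth event records rather than on the training data as done here.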