Prithwish Chakraborty, Pejman Khadivi, Bryan Lewis, Aravindan Mahendiran, Jiangzhuo Chen, Patrick Butler, Elaine O. Nsoesie, Sumiko R. Mekaru, John S. Brownstein, Madhav Marathe, Naren Ramakrishnan
Modern epidemiological forecasts of common illnesses, such as the flu, rely on both traditional surveillance sources and digital surveillance data. However, most published studies have been retrospective. At the same time, official reports of flu activity generally lag by several weeks and, even once published, continue to be revised for several more weeks. We posit that effectively handling this uncertainty is one of the key challenges for a real-time prediction system in this sphere. In this paper, we present a detailed prospective analysis of the generation of robust quantitative predictions of temporal trends in flu activity, using several surrogate data sources for 15 Latin American countries. We present our findings on the limitations and possible advantages of correcting the uncertainty associated with official flu estimates. We also compare the prediction accuracy of model-level fusion of different surrogate data sources with that of data-level fusion. Finally, we present a novel matrix factorization approach using neighborhood embedding to predict flu case counts. Comparing our proposed ensemble method against several baseline methods helps us demarcate the importance of different data sources for the countries under consideration.
- Date of publication: October 13, 2014
- SIAM International Conference on Data Mining