Evangelos E. Papalexakis, Tudor Dumitras, Duen Horng Chau, Christos Faloutsos
How does malware propagate? Does it form spikes over time? Does it resemble the propagation pattern of benign files, such as software patches? Does it spread uniformly over countries? How long does it take for a URL that distributes malware to be detected and shut down? In this work, we answer these questions by analyzing patterns from 22 million malicious (and benign) files, found on 1.6 million hosts worldwide during the month of June 2011. We conduct this study using the WINE database available at Symantec Research Labs. Additionally, we explore the research questions raised by sampling on such large databases of executables; the importance of studying the implications of sampling is twofold: First, sampling is a means of reducing the size of the database hence making it more accessible to researchers; second, because every such data collection can be perceived as a sample of the real world. We discover the SharkFin temporal propagation pattern of executable files, the GeoSplit pattern in the geographical spread of machines that report executables to Symantec’s servers, the Periodic Power Law (Ppl) distribution of the lifetime of URLs, and we show how to efficiently extrapolate crucial properties of the data from a small sample. We further investigate the propagation pattern of benign and malicious executables, unveiling latent structures in the way these files spread. To the best of our knowledge, our work represents the largest study of propagation patterns of executables.
- Date of publication:
- November 27, 2014
- Social Network Analysis and Mining
- Springer Science + Business Media
- Issue Number: