Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG Model
Danai Koutra, Vasileios Koutras, Christos Faloutsos
Abstract
If Alice has double the friends of Bob, will she also have double the phone-calls (or wall-postings, or tweets)? Our first contribution is the discovery that the relative frequencies obey a power law (sub-linear or super-linear) across a wide variety of settings. These include tasks in a phone-call network, such as count of friends, count of phone-calls, and total count of minutes, and tasks in a Twitter-like network, such as count of tweets and count of followees. Our second contribution is a full, digitized 2-d distribution, which we call the Almond-DG model after the shape of its iso-surfaces. The Almond-DG model matches all our empirical observations: super-linear relationships among variables, and (provably) log-logistic marginals. We illustrate our observations on two large, real network datasets, spanning ~2.2M and ~3.1M individuals with 5 features each. We show how to use our observations to spot clusters and outliers, such as telemarketers in our phone-call network.
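The abstract does not say how the power-law exponent is estimated. Below is a minimal sketch that assumes ordinary least squares on log10-transformed counts. This is a common choice but not necessarily the authors' procedure. Under this assumption, fitting log10(y) = log10(c) + b·log10(x) gives y ≈ c·x^b, where b > 1 means super-linear growth and b < 1 means sub-linear growth. The function names, the tolerance band around 1, and all numbers are hypothetical and chosen for illustration only.

```python
import math


def fit_power_law(x, y):
    """Ordinary least-squares fit of log10(y) = log10(c) + b * log10(x).

    Assumes every x and y is strictly positive (zero counts must be dropped
    or offset before taking logs). Returns (b, c) such that y ~= c * x**b.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, 10 ** intercept


def classify(slope, tol=0.05):
    """Label the exponent; the tolerance band around 1 is an arbitrary choice."""
    if slope > 1 + tol:
        return "super-linear"
    if slope < 1 - tol:
        return "sub-linear"
    return "linear"


def log_residual(x, y, slope, c):
    """Distance (in decades) of one point above (+) or below (-) the fitted line."""
    return math.log10(y) - (math.log10(c) + slope * math.log10(x))


# Hypothetical synthetic data: calls grow as 3 * friends**1.3.
friends = [1, 2, 4, 8, 16, 32, 64, 128]
calls = [3 * f ** 1.3 for f in friends]

b, c = fit_power_law(friends, calls)
print(round(b, 3), round(c, 3), classify(b))  # 1.3 3.0 super-linear

# A hypothetical "telemarketer": very few friends but a huge number of calls.
print(round(log_residual(2, 5000, b, c), 2))  # 2.83
```

In this sketch, the telemarketer-like point sits almost three decades above the fitted line, which is the kind of deviation one might flag as an outlier. Ordinary least squares is sensitive to such extreme points. Fitting on the bulk of the data, or using a robust estimator such as Theil-Sen, prevents a handful of outliers from distorting the exponent.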
Publication Details
- Date of publication: January 1, 2013
- Conference: PAKDD
- Publisher: Springer Science+Business Media
- Page number(s): 201–212