Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG Model

Danai Koutra, Vasileios Koutras, Christos Faloutsos

Abstract

If Alice has double the friends of Bob, will she also have double the phone-calls (or wall-postings, or tweets)? Our first contribution is the discovery that the relative frequencies obey a power-law (sub-linear or super-linear) across a wide variety of settings: tasks in a phone-call network, such as count of friends, count of phone-calls, and total count of minutes; and tasks in a Twitter-like network, such as count of tweets and count of followees. Our second contribution is a full, digitized 2-d distribution, which we call the Almond-DG model because of the shape of its iso-surfaces. The Almond-DG model matches all our empirical observations: super-linear relationships among variables, and (provably) log-logistic marginals. We illustrate our observations on two large, real network datasets, spanning ~2.2M and ~3.1M individuals with 5 features each. We show how to use our observations to spot clusters and outliers, e.g., telemarketers in our phone-call network.
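
To make the power-law claim concrete, the sketch below estimates an exponent alpha in y ≈ c · x^alpha by ordinary least squares in log-log space. It is an illustration only and not necessarily the fitting procedure used in the paper. The function name `fit_power_law`, the synthetic data, and the chosen exponent 1.2 are all assumptions made for the example; none of them come from the paper's datasets.

```python
import math
import random


def fit_power_law(xs, ys):
    """Fit log(y) = log(c) + alpha * log(x) by least squares.

    Returns (alpha, c). Pairs with a non-positive x or y are skipped,
    since their logarithm is undefined.
    """
    pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive (x, y) pairs")
    mean_lx = sum(lx for lx, _ in pairs) / n
    mean_ly = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_lx) ** 2 for lx, _ in pairs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in pairs)
    alpha = sxy / sxx                          # slope in log-log space
    c = math.exp(mean_ly - alpha * mean_lx)    # intercept, mapped back
    return alpha, c


# Synthetic (hypothetical) data: calls grow super-linearly with friends,
# with multiplicative log-normal noise. Exponent 1.2 is an arbitrary choice.
rng = random.Random(0)
friends = [rng.randint(1, 500) for _ in range(2000)]
calls = [3.0 * f ** 1.2 * math.exp(rng.gauss(0, 0.3)) for f in friends]

alpha, c = fit_power_law(friends, calls)
print(f"estimated exponent alpha = {alpha:.3f}, constant c = {c:.3f}")
print(f"doubling x multiplies y by about {2 ** alpha:.2f}")
```

Under such a fit, an exponent above 1 means the relationship is super-linear: with alpha = 1.2, doubling the number of friends multiplies the expected number of calls by 2^1.2, roughly 2.3, rather than 2.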

Publication Details

Date of publication: December 31, 2012

Conference: Springer PAKDD

Page number(s): 201–212
