Zhiqian Chen, Weisheng Zhong, Arnold Boedihardjo, Chang-Tien Lu
Connecting the dots between diverse entities such as people and organizations is a vital task for forming hypotheses and uncovering latent relationships in large, complex datasets. Most existing approaches are designed to address the relationships among entities in news reports, documents and abstracts, but such approaches are not suitable for Twitter data streams because of their unstructured language, short messages, heterogeneous features and massive size. The sheer size of Twitter data calls for more efficient algorithms that can connect the dots within a short period of time. We present a system that automatically constructs stories by connecting entities in Twitter datasets. We design an entity similarity model that combines traditional entity-related features with social network attributes, and we propose a novel story generation algorithm, applied on top of the similarity model, to cope with massive Twitter datasets. Extensive experimental evaluations were conducted to demonstrate the effectiveness of this new approach.
- Date of publication: December 5, 2016
- IEEE International Conference on Big Data