Linking topics to specific experts in technical documents, and finding connections between those experts, is crucial for detecting how emerging topics evolve and how their influencers relate in state-of-the-art research. Current techniques for making such connections are limited to similarity measures. Typically, weighting schemes such as TF-IDF and term frequency are used to identify important topics, and self-joins between topics and experts are used to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts, since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins, at scale, on connected topics and the topics associated with the authors. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins, which are categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. A comparison of our graph join and pruning technique against multiple graph and join methods in MapReduce showed a significant improvement in performance with our approach.
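The conventional baseline the abstract describes (TF-IDF weighting followed by a self-join of topics and experts) can be illustrated with a minimal in-memory sketch. This is not DIGDUG's pruning or join method and does not use MapReduce; the toy corpus, the `min_weight` threshold, and the function names `tfidf_weights` and `expert_links` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: one (author, tokens) pair per document.
docs = [
    ("alice", ["graph", "pruning", "mapreduce", "join"]),
    ("bob", ["graph", "join", "patent", "mining"]),
    ("carol", ["protein", "folding", "simulation"]),
]


def tfidf_weights(docs):
    """Return (author, {term: tf-idf weight}) for each document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for _, tokens in docs for term in set(tokens))
    weighted = []
    for author, tokens in docs:
        term_freq = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        weighted.append((author, weights))
    return weighted


def expert_links(docs, min_weight=0.05):
    """Self-join the (author, topic) relation on topic.

    Returns {(author_a, author_b): set of shared topics}, keeping only
    topics whose TF-IDF weight reaches min_weight (an assumed threshold).
    """
    topic_to_authors = defaultdict(set)
    for author, weights in tfidf_weights(docs):
        for term, score in weights.items():
            if score >= min_weight:
                topic_to_authors[term].add(author)

    links = defaultdict(set)
    for term, authors in topic_to_authors.items():
        # Every pair of authors sharing a topic becomes a connection.
        for a, b in combinations(sorted(authors), 2):
            links[(a, b)].add(term)
    return dict(links)


if __name__ == "__main__":
    for pair, topics in expert_links(docs).items():
        print(pair, sorted(topics))
```

In this sketch a topic shared by m authors produces m(m-1)/2 author pairs, so the join output grows quadratically on popular topics. That growth is one way to see why joins over large dense graphs are costly.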
- Date of publication: December 1, 2021
- Journal: IEEE Transactions on Big Data
- Page number(s): 930-951
- Issue Number: 6 (Volume 7)
- Publication note: Manu Shukla, Dinesh Dharme, Pallavi Ramnarain, Ray Dos Santos, Chang-Tien Lu: DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce. IEEE Trans. Big Data 7(6): 930-951 (2021)