A Framework for Exploiting Local Information to Enhance Density Estimation of Data Streams
Arnold Boediardjo, Chang-Tien Lu, Bingsheng Wang
Abstract
The Probability Density Function (PDF) is the fundamental data model for a variety of stream mining algorithms. Existing works apply the standard nonparametric Kernel Density Estimator (KDE) to approximate the PDF of data streams. As a result, the stream-based KDEs cannot accurately capture complex local density features. In this article, we propose the use of Local Regions (LRs) to model local density information in univariate data streams. In-depth theoretical analyses are presented to justify the effectiveness of the LR-based KDE. Based on these analyses, we develop the General Local rEgion AlgorithM (GLEAM) to enhance the estimation quality of structurally complex univariate distributions for existing stream-based KDEs. A set of algorithmic optimizations is designed to improve the query throughput of GLEAM and to achieve linear-order computation. Additionally, a comprehensive suite of experiments was conducted to test the effectiveness and efficiency of GLEAM.
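For orientation, the standard fixed-bandwidth KDE that the abstract refers to can be sketched as below. This is not GLEAM or the LR-based estimator, neither of which is specified on this page. The function names, the toy sample, and the normal-reference bandwidth rule are illustrative assumptions. The toy sample mixes a tight cluster with a broad one, so that a single global bandwidth visibly flattens the narrow peak.

```python
import math

def gaussian_kde(samples, x, bandwidth):
    """Fixed-bandwidth Gaussian KDE of `samples`, evaluated at the point x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

def normal_reference_bandwidth(samples):
    """Global bandwidth h = 1.06 * std * n^(-1/5) (an assumed, common rule of thumb)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)

# Hypothetical toy data: a tight cluster around 0 and a broad cluster around 10.
narrow = [0.01 * i for i in range(-10, 11)]
wide = [10.0 + 0.5 * i for i in range(-10, 11)]
samples = narrow + wide

# One global bandwidth for the whole sample versus a small, hand-picked one.
h_global = normal_reference_bandwidth(samples)
h_small = 0.05

density_global = gaussian_kde(samples, 0.0, h_global)
density_small = gaussian_kde(samples, 0.0, h_small)

print("global bandwidth:", h_global)
print("density at 0 with global bandwidth:", density_global)
print("density at 0 with small bandwidth:", density_small)
```

The global bandwidth is driven by the spread of the whole sample, so the estimate at 0 comes out far lower than the one obtained with the small bandwidth. This is the kind of local density feature the abstract says a standard KDE cannot capture accurately.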
People
Publication Details
- Date of publication:
- Journal: ACM Transactions on Knowledge Discovery from Data
- Volume: 9
- Issue Number: 1