Naren Ramakrishnan, Ananth Y Grama

Abstract

The idea of unsupervised learning from basic facts (axioms) or from data has fascinated researchers for decades. Knowledge discovery engines try to extract general inferences from facts or training data. Statistical methods take a more structured approach, attempting to quantify data by known and intuitively understood models. The problem of gleaning knowledge from existing data sources poses a significant paradigm shift from these traditional approaches. The size, noise, diversity, dimensionality, and distributed nature of typical data sets make even formal problem specification difficult. Moreover, you typically do not have control over data generation. This lack of control opens up a Pandora's box filled with issues such as overfitting, limited coverage, and missing/incorrect data with high dimensionality. Once specified, solution techniques must deal with complexity, scalability (to meaningful data sizes), and presentation. This entire process is where data mining makes its transition from serendipity to science.

People

Naren Ramakrishnan


Publication Details

Date of publication:
August 5, 1999
Journal:
Computer
Page number(s):
34--37
Volume:
32
Issue Number:
8