Naren Ramakrishnan, Deept Kumar, Bud Mishra, Malcolm Potts, Richard F Helm

Abstract

We present an unusual algorithm involving classification trees---CARTwheels---where two trees are grown in opposite directions so that they are joined at their leaves. This approach finds application in a new data mining task we formulate, called re-description mining. A re-description is a shift-of-vocabulary, or a different way of communicating information about a given subset of data; the goal of re-description mining is to find subsets of data that afford multiple descriptions. We highlight the importance of this problem in domains such as bioinformatics, which exhibit an underlying richness and diversity of data descriptors (e.g., genes can be studied in a variety of ways). CARTwheels exploits the duality between class partitions and path partitions in an induced classification tree to model and mine re-descriptions. It helps integrate multiple forms of characterizing data-sets, situates the knowledge gained from one data-set in the context of others, and harnesses high-level abstractions for uncovering cryptic and subtle features of data. Algorithm design decisions, implementation details, and experimental results are presented.
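The alternation described in the abstract can be illustrated with a deliberately simplified sketch. It is not the CARTwheels implementation: each "tree" is reduced to a single descriptor test (a depth-one stump), the match between two descriptions is scored with Jaccard overlap (an assumption made here for illustration), and the gene identifiers and descriptor names are hypothetical toy data. The idea it tries to convey is that the set of objects selected by a description in one vocabulary becomes the target for the other vocabulary, and the two sides alternate until they stop changing.

```python
def jaccard(x, y):
    """Overlap between two object sets; 1.0 means the descriptions coincide."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def best_match(target, vocabulary):
    """Return the descriptor name whose object set best matches `target`.

    Stands in for growing a tree toward a fixed class partition; sorting
    the names makes tie-breaking deterministic.
    """
    return max(sorted(vocabulary), key=lambda name: jaccard(vocabulary[name], target))


def alternate(vocab_a, vocab_b, start, max_rounds=10):
    """Alternate between two vocabularies until the matched pair is stable."""
    a_name, b_name = start, None
    for _ in range(max_rounds):
        # Fix the A-side description, fit the B side to it ...
        new_b = best_match(vocab_a[a_name], vocab_b)
        # ... then fix the B-side description and refit the A side.
        new_a = best_match(vocab_b[new_b], vocab_a)
        if (new_a, new_b) == (a_name, b_name):
            break
        a_name, b_name = new_a, new_b
    return a_name, b_name, jaccard(vocab_a[a_name], vocab_b[b_name])


# Hypothetical toy data: six genes described in two different vocabularies.
vocab_a = {
    "up_in_heat": {1, 2, 3},
    "up_in_cold": {4, 5, 6},
    "up_in_starve": {1, 4},
}
vocab_b = {
    "stress_response": {1, 2, 3},
    "membrane": {3, 4, 5, 6},
    "ribosomal": {2, 5},
}

print(alternate(vocab_a, vocab_b, "up_in_starve"))
```

On this toy data the search starts from a poorly matched descriptor and settles on a pair that selects exactly the same genes, which is the kind of result a re-description expresses: one subset of data, two descriptions.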

People

Naren Ramakrishnan


Publication Details

Date of publication:
August 22, 2004
Conference:
SIGKDD International Conference on Knowledge Discovery and Data Mining
Page number(s):
266--275