Mohammed J Zaki, Naren Ramakrishnan
Re-description mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of re-description mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) re-descriptions underlying a data-set using notions of minimal generators of closed item sets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
- Date of publication:
- August 21, 2005
- Page number(s):