M. Shahriar Hossain, Manish Marwah, Amip Shah, Layne T. Watson, Naren Ramakrishnan

Abstract

Exploratory data analysis aims to study datasets through the use of iterative, investigative, and visual analytic algorithms. Due to the difficulty of managing and accessing the growing volume of unstructured data, exploratory analysis of datasets has become harder than ever and of increasing interest to data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities, whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation specifically focuses on five different research aspects to demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided to obtain alternative results using flexible constraints as "scatter-gather" operations. We demonstrate these ideas in many application domains, including studying the bat biosonar system and designing sustainable products. In the area of storytelling, we develop algorithms that can generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and intelligence analysis domains.
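The abstract describes a story as a chain that connects dissimilar objects under a distance constraint. The sketch below illustrates that idea under stated assumptions: documents are represented as sets of terms, the distance between two documents is Jaccard distance, and a story is the shortest chain (found by breadth-first search) in which every consecutive pair of documents is within a threshold `theta`. The function names, the threshold, and the toy documents are hypothetical; this is not the authors' algorithm, which also supports clique and syntactic constraints.

```python
from __future__ import annotations

from collections import deque


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def find_story(docs: dict[str, set[str]], start: str, end: str,
               theta: float) -> list[str] | None:
    """Breadth-first search for the shortest chain start -> ... -> end in which
    each consecutive pair of documents has Jaccard distance <= theta
    (a hypothetical stand-in for a storytelling distance constraint)."""
    parent: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            # Walk parent links back to the start, then reverse the chain.
            path = []
            node = current
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for other, terms in docs.items():
            # Only unvisited documents close enough to the current one extend the chain.
            if other not in parent and jaccard_distance(docs[current], terms) <= theta:
                parent[other] = current
                queue.append(other)
    return None  # No chain satisfies the distance constraint.


# Toy documents (made up for illustration): d1 and d4 share no terms,
# but overlapping intermediate documents can connect them.
toy_docs = {
    "d1": {"bat", "sonar", "echo"},
    "d2": {"sonar", "echo", "signal"},
    "d3": {"echo", "signal", "noise"},
    "d4": {"noise", "filter", "signal"},
}

if __name__ == "__main__":
    print(find_story(toy_docs, "d1", "d4", theta=0.5))
    print(find_story(toy_docs, "d1", "d4", theta=0.4))
```

With `theta=0.5` each hop d1→d2→d3→d4 has distance exactly 0.5, so the chain `['d1', 'd2', 'd3', 'd4']` is found; tightening the threshold to 0.4 leaves d1 with no neighbors, and the search returns `None`. The threshold thus trades off how dissimilar consecutive documents may be against whether any story exists at all.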

People

Naren Ramakrishnan
Layne T. Watson

Publication Details

Date of publication: April 1, 2014
Journal: ACM Transactions on Intelligent Systems and Technology (TIST)
Publisher: Association for Computing Machinery (ACM)
Page number(s): 1–21
Volume: 5
Issue Number: 2