Towards insight-driven sampling for big data visualisation – Sanghani Center for Artificial Intelligence and Data Analytics

Moeti Masiane, John Wenskovitch, Chris North

Abstract

Creating an interactive, accurate, and low-latency big data visualisation is challenging due to the volume, variety, and velocity of the data. Visualisation options range from visualising the entire big dataset, which can be slow and taxing on the system, to visualising a small subset of the dataset, which is faster and less taxing on the system but may yield a less informative visualisation because of information loss. This work investigates two main research questions: what effect sampling has on visualisation insight, and how to guide users in navigating this trade-off. To investigate these issues, we study an initial case of simple estimation tasks on histogram visualisations of sampled big data, in the hope that the results generalise. Leveraging sampling, we generate subsets of large datasets and create visualisations for a crowd-sourced study involving a simple cognitive visualisation task. Using the results of this study, we quantify insight, sampling, visualisation, and perception error relative to the full dataset. We use these results to model the relationship between sample size and insight error, and we propose using our model to guide big data visualisation sampling.
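The abstract does not give the paper's exact error definitions or its fitted model. As a rough illustration of the sampling-error component alone, the sketch below draws uniform random samples without replacement from a synthetic dataset and measures how far each sample's histogram bin proportions drift from those of the full data. The function names `bin_proportions` and `histogram_sampling_error`, the equal-width binning, and the mean-absolute-difference metric are hypothetical choices made for illustration. They are not the authors' method.

```python
import random
from collections import Counter


def bin_proportions(values, lo, hi, n_bins):
    """Fraction of values in each of n_bins equal-width bins over [lo, hi].

    Values outside the range are clamped into the first or last bin
    (an illustrative assumption, not taken from the paper).
    """
    width = (hi - lo) / n_bins
    counts = Counter(
        max(0, min(int((v - lo) / width), n_bins - 1)) for v in values
    )
    return [counts.get(b, 0) / len(values) for b in range(n_bins)]


def histogram_sampling_error(population, sample_size, lo, hi,
                             n_bins=10, trials=20, seed=0):
    """Hypothetical sampling-error metric: mean absolute difference between
    full-data and sampled bin proportions, averaged over several random samples."""
    rng = random.Random(seed)
    full = bin_proportions(population, lo, hi, n_bins)
    total = 0.0
    for _ in range(trials):
        # Uniform random sample without replacement.
        sample = rng.sample(population, sample_size)
        sampled = bin_proportions(sample, lo, hi, n_bins)
        total += sum(abs(f - s) for f, s in zip(full, sampled)) / n_bins
    return total / trials


if __name__ == "__main__":
    # Synthetic stand-in for a "big" dataset (assumption for illustration).
    data_rng = random.Random(42)
    population = [data_rng.gauss(0.0, 1.0) for _ in range(100_000)]
    for n in (100, 1_000, 10_000, 100_000):
        err = histogram_sampling_error(population, n, -4.0, 4.0, trials=5)
        print(f"sample size {n:>7}: mean bin error {err:.5f}")
```

Plotting these errors against sample size gives the kind of curve a sampling-guidance model could be fitted to. This sketch covers only the data-side sampling error. It does not capture the perception or insight error that the study measured with crowd-sourced participants.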

People

Chris North

Professor of Computer Science
Associate Director

John Wenskovitch

Adjunct Professor

Moeti Masiane

Alumni

Publication Details

Date of publication: May 16, 2019
Journal: Behaviour & Information Technology
Page number(s): 788-807
Volume: 39
Issue number: 7
Publication note: Moeti Masiane, Anne Driscoll, Wu-chun Feng, John E. Wenskovitch, Chris North: Towards insight-driven sampling for big data visualisation. Behav. Inf. Technol. 39(7): 788-807 (2020)