Moeti Masiane, John Wenskovitch, Chris North


Creating an interactive, accurate, and low-latency big data visualisation is challenging due to the volume, variety, and velocity of the data. Visualisation options range from visualising the entire big dataset, which could take a long time and be taxing to the system, to visualising a small subset of the dataset, which could be fast and less taxing to the system but could also lead to a less-beneficial visualisation as a result of information loss. The main research questions investigated by this work are what effect sampling has on visualisation insight and how to provide guidance to users in navigating this trade-off. To investigate these issues, we study an initial case of simple estimation tasks on histogram visualisations of sampled big data, in hopes that these results may generalise. Leveraging sampling, we generate subsets of large datasets and create visualisations for a crowd-sourced study involving a simple cognitive visualisation task. Using the results of this study, we quantify insight, sampling, visualisation, and perception error in comparison to the full dataset. We use these results to model the relationship between sample size and insight error, and we propose the use of our model to guide big data visualisation sampling.
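The sample-size trade-off described in the abstract can be illustrated with a minimal sketch. It covers only the sampling component of the error: how far a sample's histogram drifts from the full dataset's histogram as the sample shrinks. The synthetic Gaussian data, the use of total variation distance as the error measure, and the names `bin_fractions` and `error_by_sample_size` are all assumptions made for illustration; they are not the metric, data, or model used in the paper. Perception and insight error depend on human responses and cannot be reproduced this way.

```python
import random


def bin_fractions(values, lo, hi, bins):
    """Fraction of values in each of `bins` equal-width bins spanning [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def error_by_sample_size(full, sizes, bins=20, trials=30, seed=0):
    """Mean total variation distance between sample and full histograms.

    Illustrative error measure only (an assumption, not the paper's metric).
    Samples are drawn uniformly without replacement from `full`.
    """
    rng = random.Random(seed)
    lo, hi = min(full), max(full)
    reference = bin_fractions(full, lo, hi, bins)
    result = {}
    for n in sizes:
        total = 0.0
        for _ in range(trials):
            sample = rng.sample(full, n)
            fractions = bin_fractions(sample, lo, hi, bins)
            total += 0.5 * sum(abs(a - b) for a, b in zip(reference, fractions))
        result[n] = total / trials
    return result


if __name__ == "__main__":
    # Synthetic stand-in for a large dataset (hypothetical values).
    data_rng = random.Random(42)
    data = [data_rng.gauss(50.0, 15.0) for _ in range(50_000)]
    for n, err in error_by_sample_size(data, [100, 1_000, 10_000]).items():
        print(f"n={n:>6}  mean histogram error={err:.4f}")
```

In a sketch like this, the error typically falls as the sample grows. A curve of that kind is the sort of input one could fit a sample-size model to, though the paper's model relates sample size to insight error measured in a crowd-sourced study rather than to histogram distance alone.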


Publication Details

Date of publication: May 16, 2019
Journal: Behaviour & Information Technology
Page number(s): 788-807
Issue Number: 7 (volume 39)

Moeti Masiane, Anne Driscoll, Wu-chun Feng, John E. Wenskovitch, Chris North: Towards insight-driven sampling for big data visualisation. Behav. Inf. Technol. 39(7): 788-807 (2020)