Xavier Pleimling, Vedant Shah, Ismini Lourentzou
As large-scale machine learning models become more prevalent in assistive and pervasive technologies, the research community has started examining limitations and challenges that arise from training data, e.g., fairness, bias, and interpretability issues. To this end, data-centric approaches are gaining ground, showing that high-quality data is a critical component in many applications. Several studies explore methods to define and improve data quality; however, no uniform definition exists. In this work, we present an empirical analysis of the multifaceted problem of evaluating data quality. Our work aims to identify the data quality challenges most commonly observed by data users and practitioners. Inspired by the need for generally applicable methods, we select a representative set of quality indicators that covers a broad spectrum of issues, and we investigate the utility of these indicators on a broad range of datasets through inter-annotator agreement analysis. Our work provides insights and presents open challenges in designing improved data life cycles.
Xavier Pleimling, Vedant Shah, Ismini Lourentzou: [Data] Quality Lies In The Eyes Of The Beholder. PETRA 2022: 118-124
- Date of publication: July 11, 2022
- Venue: PErvasive Technologies Related to Assistive Environments (PETRA 2022)
- Page number(s): 118-124