Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite for selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species classification experiment involving a corpus of 1935 fish specimen images annotated with 22 metadata quality properties. A small subset of high-quality images produced an F1 score of 0.41, compared to 0.35 for a taxonomically matched subset of low-quality images, when used with a convolutional neural network approach to species identification. Analysis of the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found that anatomical feature visibility was the most important quality feature for classification accuracy. We suggest that biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.
Jeremy Leipzig, Yasin Bakis, Xiaojun Wang, Mohannad Elhamod, Kelly Diamond, Wasila M. Dahdul, Anuj Karpatne, A. Murat Maga, Paula M. Mabee, Henry L. Bart, Jane Greenberg: Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species. MTSR 2020: 3-12
- Date of publication: March 18, 2021
- Published in: Metadata and Semantic Research
- Page number(s): 3-12