Explainable Driver Activity Recognition Using Video Transformer in Highly Automated Vehicle

Abhijit Sarkar, Hirva Bhagat

Abstract

Distracted driving is one of the leading causes of road accidents. With the recent introduction of advanced driver assistance systems and L2 vehicles, the role of driver attention has gained renewed interest. It is imperative for vehicle manufacturers to develop robust systems that can identify distractions and help prevent such accidents in highly automated vehicles. This paper focuses on studying secondary behaviors and their relative complexity in order to develop a guide for auto manufacturers. In recent years, a few driver secondary action datasets and deep learning algorithms have been created to address this problem. Despite their success in many domains, Convolutional Neural Network-based deep learning methods struggle to fully consider the overall context of an image and instead focus on specific image features. We present the use of Video Transformers on two challenging datasets, one of them being a grayscale low-quality dataset. We also demonstrate how the novel concept of a Visual Dictionary can be used to understand the structural components of any secondary behavior. Finally, we validate different components of the visual dictionary by studying the attention modules of the transformer-based model and incorporating explainability in the computer vision model. An activity is decomposed into multiple small actions and attributes, and the corresponding attention patches are highlighted in the input frame. Our code is available at github.com/VTTI/driver-secondary-action-recognition.

Publication Details

Date of publication: July 26, 2023

Conference: IEEE Intelligent Vehicles Symposium

Page number(s): 1-8

Publication Note: Akash Sonth, Abhijit Sarkar, Hirva Bhagat, A. Lynn Abbott: Explainable Driver Activity Recognition Using Video Transformer in Highly Automated Vehicle. IV 2023: 1-8