A. Lynn Abbott


We present a multi-modal feature fusion framework for Kinect-based Facial Expression Recognition (FER). The framework extracts and pre-processes 2D and 3D features separately. The feature types are selected to maximize system accuracy: Histogram of Oriented Gradients (HOG) features for 2D data and statistically selected angles for 3D data give the best performance. The 2D and 3D feature sets are each reduced and then combined using a novel Dual Kernel Discriminant Analysis (DKDA) approach. Final classification is performed with Support Vector Machines (SVMs). The framework is benchmarked on a public Kinect-based FER dataset that covers 32 subjects, in frontal and non-frontal poses and at two expression intensities, and six basic expressions plus neutral: happiness, sadness, anger, disgust, fear, and surprise. The results show that the proposed combination of 2D and 3D features outperforms simpler existing combinations of 2D and 3D features, as well as systems that use only 2D or only 3D features. The proposed system also outperforms Linear Discriminant Analysis (LDA)-transformed and traditional Kernel Discriminant Analysis (KDA)-transformed systems, with an average accuracy improvement of 10%. It also outperforms the state of the art by more than 13% in frontal poses.
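To make the idea of angle-based 3D features concrete, the following is a minimal sketch of how an angle can be computed from three 3D facial landmarks. It is an illustration only, not the authors' implementation: the function names `landmark_angle` and `angle_features`, the landmark coordinates, and the choice of landmark triplets are all hypothetical, and the statistical selection of angles and the DKDA fusion step described above are not shown.

```python
import math


def landmark_angle(a, b, c):
    """Return the angle in degrees at vertex b formed by the 3D points a, b, c."""
    # Vectors from the vertex b to the two other landmarks.
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]

    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("degenerate triplet: a landmark coincides with the vertex")

    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def angle_features(landmarks, triplets):
    """Build a feature vector of angles, one per (i, j, k) index triplet.

    The triplets are assumed to have been chosen beforehand (hypothetical here).
    """
    return [
        landmark_angle(landmarks[i], landmarks[j], landmarks[k])
        for i, j, k in triplets
    ]


# Hypothetical landmark coordinates, for illustration only.
example_landmarks = [
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (-1.0, 0.0, 0.0),
]
example_features = angle_features(example_landmarks, [(0, 1, 2), (0, 1, 3)])
```

Angles of this kind are invariant to translation, rotation, and uniform scaling of the landmark set, which is one plausible reason they suit data captured in both frontal and non-frontal poses.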


Publication Details

Date of publication: March 7, 2016