Visualization and Intelligent Systems Laboratory
VISLab

Contact Information

VISLab
Winston Chung Hall Room 216
University of California, Riverside
900 University Avenue
Riverside, CA 92521-0425


Tel: (951) 827-3954

CRIS
Bourns College of Engineering
UCR
NSF IGERT on Video Bioinformatics

UCR Collaborators:
CSE
ECE
ME
STAT
PSYC
ENTM
BIOL
BPSC
ECON
MATH
BIOENG
MGNT

Other Collaborators:
Keio University

Other Activities:
IEEE Biometrics Workshop 2014
IEEE Biometrics Workshop 2013
Workshop on DVSN 2009
Multibiometrics Book

Webmaster Contact Information:
Alex Shin
wshin@ece.ucr.edu

Last updated: July 1, 2017

Face

A Reference-Based Framework for Pose Invariant Face Recognition

The similarity between a face image and a set of reference individuals defines the reference-based descriptor for that image. Recognition is performed using the reference-based descriptors of probe and gallery images. The dimensionality of the face descriptor generated by the accompanying face recognition algorithm is reduced to the number of individuals in the reference set. The proposed framework is a generalization of previous recognition methods that use indirect similarity and reference-based descriptors.
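A minimal sketch of the framework's central idea, with cosine similarity and random features standing in for a real face matcher (both are assumptions, not the paper's exact choices): each face is re-described by its similarities to the reference individuals, so the descriptor's dimensionality equals the size of the reference set.

```python
import numpy as np

def reference_descriptor(face_feat, reference_feats):
    """Similarity of one face feature to each of N reference individuals."""
    refs = reference_feats / np.linalg.norm(reference_feats, axis=1, keepdims=True)
    f = face_feat / np.linalg.norm(face_feat)
    return refs @ f  # length-N reference-based descriptor

rng = np.random.default_rng(0)
references = rng.normal(size=(50, 128))  # 50 reference individuals, 128-D features
gallery = rng.normal(size=(10, 128))     # 10 gallery faces
probe = rng.normal(size=128)

# Recognition happens in the 50-D reference space, not the 128-D feature space.
probe_desc = reference_descriptor(probe, references)
gallery_descs = np.stack([reference_descriptor(g, references) for g in gallery])
match = int(np.argmin(np.linalg.norm(gallery_descs - probe_desc, axis=1)))
```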

Reference Face Graph for Face Recognition

[Figure: Angelina Jolie]
Face recognition has been studied extensively; however, real-world face recognition remains a challenging task. We approach face recognition in the context of graph theory: an unknown face is recognized using an external Reference Face Graph (RFG). An RFG is generated, and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. The results show that the proposed approach outperforms state-of-the-art methods.

Discrete Cosine Transform Locality-Sensitive Hashes for Face Retrieval

[Figure: Sample images from LFW [44], FERET [40], RaFD [42], BioID [43], FEI [41], and Multi-PIE [39]]
Searching large face databases using local binary patterns has been problematic due to the cost of linear search and the inadequate performance of existing indexing methods. We present Discrete Cosine Transform (DCT) hashing for creating index structures for face descriptors. Hashes play the role of keywords: an index is created and then queried to find the images most similar to the query image. We show that DCT hashing has significantly better retrieval accuracy and is more efficient than other popular state-of-the-art hash algorithms.
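An illustrative sketch of DCT-based hash indexing, not the paper's exact construction: the signs of a descriptor's low-frequency DCT coefficients form a short binary key, and a query only scans the bucket sharing its key instead of the whole database.

```python
import numpy as np
from scipy.fftpack import dct

def dct_hash(descriptor, n_bits=16):
    # Low-frequency DCT coefficients (DC term skipped), quantized by sign.
    coeffs = dct(descriptor, norm='ortho')[1:n_bits + 1]
    return (coeffs > 0).astype(np.uint8).tobytes()  # usable as a dict key

# Build an index over a toy descriptor database, then query it.
rng = np.random.default_rng(1)
database = rng.normal(size=(1000, 256))
index = {}
for i, d in enumerate(database):
    index.setdefault(dct_hash(d), []).append(i)

query = database[42] + 0.01 * rng.normal(size=256)
candidates = index.get(dct_hash(query), [])  # linear search only inside the bucket
```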

Unconstrained Face Recognition in Surveillance Camera Networks

[Figure: A subject's face captured by three cameras from different views in a typical surveillance camera system setup]
Achieving good performance on unconstrained faces in surveillance videos is inherently difficult. We tackle this unconstrained face recognition problem and utilize multiple cameras to improve recognition accuracy using a probabilistic approach. The proposed method is tested on a public surveillance video dataset with a three-camera setup. We compare our method to different benchmark classifiers with various feature descriptors. The results demonstrate that by modeling the face in a dynamic manner, the recognition performance in a multi-camera network is improved and the recognition result is better than that of any single camera.

Facial Emotion Recognition with Anisotropic Inhibited Gabor Energy Histograms

State-of-the-art approaches have yet to deliver a feature representation for facial emotion recognition that can be applied to non-trivial, unconstrained, continuous video data sets. Early research advanced with the use of Gabor energy filters, but recent work has given more attention to other features because Gabor energy filters lack the generalization needed in unconstrained situations and result in an undesirably high feature vector dimensionality. Non-trivial data sets have millions of samples, so feature vectors must be as low dimensional as possible. We propose a novel texture feature based on Gabor energy filters that generalizes through a background texture suppression component and stays compact through a maximal-response representation and local histograms. We improve performance on the non-trivial Audio/Visual Emotion Challenge 2012 grand challenge data set.
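A hedged sketch of the compactness side of this feature (the anisotropic surround-inhibition step is omitted for brevity, and skimage's generic Gabor filter stands in for the paper's filter bank): Gabor energy is the magnitude of a quadrature filter pair, the maximal response over orientations collapses the bank to one map, and local histograms keep the final vector small.

```python
import numpy as np
from skimage.filters import gabor

def max_gabor_energy(image, frequency=0.25, n_orientations=4):
    energies = []
    for k in range(n_orientations):
        real, imag = gabor(image, frequency=frequency,
                           theta=k * np.pi / n_orientations)
        energies.append(np.hypot(real, imag))  # Gabor energy per orientation
    return np.max(energies, axis=0)            # maximal-response representation

def local_histograms(feature_map, grid=4, bins=8):
    h, w = feature_map.shape
    hists = []
    for i in range(grid):
        for j in range(grid):
            cell = feature_map[i * h // grid:(i + 1) * h // grid,
                               j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(cell, bins=bins,
                                   range=(0.0, float(feature_map.max()) + 1e-12))
            hists.append(hist / max(cell.size, 1))
    return np.concatenate(hists)  # grid*grid*bins values, e.g. 128 for 4x4x8

rng = np.random.default_rng(0)
face = rng.random((64, 64))
feature = local_histograms(max_gabor_energy(face))
```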

Improving Action Units Recognition in Video

[Figure: Results using the SOFAIT registration technique]
We have developed a novel automatic video-based face registration architecture for facial expression recognition. We start by estimating the transformation of an arbitrary face to a generic reference face with canonical pose. This initialization establishes a head-pose- and person-independent face model. The affine transformation computed from the initialization is then propagated by affine transformations estimated from the dense optical flow to guarantee the temporal smoothness of the non-rigid facial appearance. We call this method SIFT and optical flow affine image transform (SOFAIT). Visual and quantitative results demonstrate that the proposed automatic video-based face registration technique captures the appearance changes in spontaneous expressions and outperforms the state-of-the-art technique.

Continuous Facial Emotion Recognition

[Figure: Two different segments of an AVEC development video]
Affective computing -- the emergent field in which computers detect emotions and project appropriate expressions of their own -- has reached a bottleneck where algorithms are not able to infer a person's emotions from natural and spontaneous facial expressions captured in video. We propose a principled method that addresses the temporal dynamics of facial emotions and expressions in video with a sampling approach inspired by human perceptual psychology. The method shows an average improvement of 9.8% over the baseline for weighted accuracy on the Audio/Visual Emotion Challenge 2011 video-based frame-level sub-challenge testing set.

Improving Action Units Recognition Using Dense Flow-based Face Registration in Video

Aligning faces with non-rigid muscle motion in real-world streaming video is a challenging problem. We propose a novel automatic video-based face registration architecture for facial expression recognition. The registration process is formulated as a dense SIFT-flow- and optical-flow-based affine warping problem. We start by estimating the transformation of an arbitrary face to a generic reference face with canonical pose. This initialization establishes a head-pose- and person-independent face model. The affine transformation computed from the initialization is then propagated by affine transformations estimated from the dense optical flow to guarantee the temporal smoothness of the non-rigid facial appearance. We call this method SIFT and optical flow affine image transform (SOFAIT). This real-time algorithm is designed for realistic streaming data, allowing us to analyze the facial muscle dynamics in a meaningful manner. Visual and quantitative results demonstrate that the proposed automatic video-based face registration technique captures the appearance changes in spontaneous expressions and outperforms the state-of-the-art technique.
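A sketch of the propagation step under stated assumptions: given a dense flow field between consecutive frames, a frame-to-frame affine transform is fitted by least squares and composed with the previous face-to-reference affine. Flow estimation itself is stubbed out, and the composition convention noted at the end is one plausible choice, not necessarily the paper's.

```python
import numpy as np

def affine_from_flow(flow_u, flow_v):
    """Least-squares affine M (3x3, homogeneous) mapping (x, y) to (x+u, y+v)."""
    h, w = flow_u.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    px, *_ = np.linalg.lstsq(src, xs.ravel() + flow_u.ravel(), rcond=None)
    py, *_ = np.linalg.lstsq(src, ys.ravel() + flow_v.ravel(), rcond=None)
    return np.vstack([px, py, [0.0, 0.0, 1.0]])

# Synthetic flow: a pure translation by (2, 1) is recovered exactly.
u, v = np.full((8, 8), 2.0), np.full((8, 8), 1.0)
M_flow = affine_from_flow(u, v)  # ~[[1, 0, 2], [0, 1, 1], [0, 0, 1]]

# Propagation (one plausible convention): if M_ref maps frame t to the
# reference and M_flow maps frame t to frame t+1, then for frame t+1:
M_ref = np.eye(3)
M_ref_next = M_ref @ np.linalg.inv(M_flow)
```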

Face Image Super-Resolution using 2D CCA

[Figure: Some extracted LR faces (small images) and the super-resolved faces (large images)]
We have developed a face super-resolution method using two-dimensional canonical correlation analysis (2D CCA). A detail compensation step follows to add high-frequency components to the reconstructed high-resolution face. In our approach the relationship between the high-resolution and the low-resolution face images is maintained in their original 2D representation. Different parts of a face image are super-resolved separately to better preserve the local structure. The proposed method is compared with various state-of-the-art super-resolution algorithms and is very efficient in both the training and testing phases compared to the other approaches.

Understanding Discrete Facial Expressions in Video Using an EAI

[Figure: Sample sequence for posed data; very little head motion is involved]
Existing video-based facial expression recognition techniques analyze geometry-based and appearance-based information in every frame as well as the temporal relation among frames. In contrast, we present a new image-based representation, the emotion avatar image (EAI), together with an associated reference face called the avatar reference. The approach to facial expression analysis consists of the following steps: 1) face detection; 2) face registration of video frames with the avatar reference to form the EAI representation; 3) computation of features from EAIs using both local binary patterns and local phase quantization; and 4) classification of the features as one of the emotion types using a linear support vector machine classifier. The experimental results demonstrate that the information captured in an EAI for a facial expression is a very strong cue for emotion inference.
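A hedged end-to-end sketch of the four steps above, with face detection assumed done, registration stubbed out (the method aligns frames to the avatar reference), and a plain intensity histogram standing in for the LBP/LPQ features:

```python
import numpy as np
from sklearn.svm import LinearSVC

def register_to_avatar(frame, avatar):
    return frame  # stub: the method aligns each frame to the avatar reference

def emotion_avatar_image(frames, avatar):
    # One EAI per video: the per-pixel average of the registered frames.
    return np.mean([register_to_avatar(f, avatar) for f in frames], axis=0)

def texture_histogram(img, bins=32):
    # Stand-in for the LBP / LPQ histograms used in step 3.
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

rng = np.random.default_rng(0)
avatar = rng.random((48, 48))
videos = [rng.random((30, 48, 48)) for _ in range(20)]  # 20 toy "videos"
labels = rng.integers(0, 5, size=20)                    # 5 emotion classes
X = np.stack([texture_histogram(emotion_avatar_image(v, avatar)) for v in videos])
clf = LinearSVC().fit(X, labels)                        # step 4: linear SVM
```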

Facial Emotion Recognition With Expression Energy

[Figure: FPLBP encodes large edges]
Facial emotion recognition in unconstrained settings is a typical case where algorithms perform poorly. A property of the AVEC 2012 data set is that individuals in the testing data are not encountered in the training data. In these situations, conventional approaches suffer because models developed from training data cannot properly discriminate unforeseen testing samples. We propose two similarity metrics that address these problems: neutral similarity, measuring the intensity of an expression, and temporal similarity, measuring changes in an expression over time. These similarities are taken to be the energy of facial expressions. Our method improves correlation by 35.5% over the baseline approach on the frame-level sub-challenge.
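A minimal sketch of the two similarities, with Euclidean distance on generic frame features as an assumed stand-in for the paper's actual measures:

```python
import numpy as np

def neutral_similarity(frame_feat, neutral_feat):
    # Expression intensity: how far the current frame is from the neutral face.
    return -float(np.linalg.norm(frame_feat - neutral_feat))

def temporal_similarity(frame_feat, prev_frame_feat):
    # Expression change: how far the current frame is from the previous frame.
    return -float(np.linalg.norm(frame_feat - prev_frame_feat))
```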

Cluster-Classification Bayesian Networks for Head Pose Estimation

Head pose estimation is critical in many applications such as face recognition and human-computer interaction. Various classifiers such as LDA, SVM, and nearest neighbor are widely used for this purpose; however, recognition rates are limited by the discriminative power of these classifiers for discretized pose estimation. We propose a head pose estimation method using a Cluster-Classification Bayesian Network (CCBN), specifically designed for classification after clustering. A pose layout is defined in which similar poses are assigned to the same block. This increases the discriminative power within a block when similar yet different poses are present. We achieve the highest recognition accuracy on two public databases (CAS-PEAL and FEI) compared to the state-of-the-art methods.
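A coarse-to-fine sketch of the "cluster, then classify" layout, with a nearest-centroid rule standing in for the Bayesian network (an assumption made to keep the example self-contained): stage 1 picks the block of similar poses, stage 2 discriminates only among the poses inside that block.

```python
import numpy as np

rng = np.random.default_rng(0)
n_poses, per_pose = 9, 20
pose_means = 3.0 * rng.normal(size=(n_poses, 64))
X = np.repeat(pose_means, per_pose, axis=0) + rng.normal(size=(n_poses * per_pose, 64))
pose = np.repeat(np.arange(n_poses), per_pose)
block = pose // 3  # pose layout: 3 similar poses share a block

def centroids(feats, labels):
    return np.stack([feats[labels == c].mean(axis=0) for c in np.unique(labels)])

x = X[0]                                  # query sample
block_c = centroids(X, block)
b = int(np.argmin(np.linalg.norm(block_c - x, axis=1)))        # stage 1: block
mask = block == b
pose_c = centroids(X[mask], pose[mask])
local = np.unique(pose[mask])
p = int(local[np.argmin(np.linalg.norm(pose_c - x, axis=1))])  # stage 2: pose
```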

Face Recognition in Multi-Camera Surveillance Videos

Recognizing faces in surveillance videos is difficult due to the poor quality of the probe data in terms of resolution, noise, blurriness, and varying lighting conditions. In addition, the poses of the probe data are usually not frontal, contrary to the standard format of the gallery data. The discrepancy between the two types of data makes existing recognition algorithms less accurate on real-world data. We propose a multi-camera video-based face recognition framework using a novel image representation called the Unified Face Image (UFI), which is synthesized from multiple camera feeds. Within a temporal window, the probe frames from different cameras are warped towards a template frontal face and then averaged. The generated UFI is a frontal view of the subject that incorporates information from the different cameras. We use SIFT flow as a high-level alignment tool to warp the faces. Experimental results show that by using the fused face, the recognition performance is better than the result of any single camera. The proposed framework can be adapted to any multi-camera video-based recognition method using any feature descriptors or classifiers.

Face Recognition in Multi-Camera Surveillance Videos using Dynamic Bayesian Network

Face recognition in surveillance videos is inherently difficult due to the limitation of the camera hardware as well as the image acquisition process in which non-cooperative subjects are recorded in arbitrary poses and resolutions in different lighting conditions with noise and blurriness. Furthermore, as multiple cameras are usually distributed in a camera network and the subjects are moving, different cameras often capture the subject in different views. We propose a probabilistic approach for face recognition suitable for a multi-camera video surveillance network. A Dynamic Bayesian Network (DBN) is used to incorporate the information from different cameras as well as the temporal clues from consecutive frames. The proposed method is tested on a public surveillance video dataset. We compare our method to different well-known classifiers with various feature descriptors. The results demonstrate that by modeling the face in a dynamic manner the recognition performance in a multi-camera network can be improved.

A Biologically Inspired Approach for Fusing Facial Expression and Appearance for Emotion Recognition

Facial emotion recognition from video is an exemplar case where both humans and computers underperform. In recent emotion recognition competitions, the top approaches used either geometric relationships that best captured facial dynamics or an accurate registration technique to develop appearance features. These two methods capture two different types of facial information, similarly to how the human visual system divides information when perceiving faces. We propose a biologically inspired fusion approach that emulates this process. The efficacy of the approach is tested with the Audio/Visual Emotion Challenge 2011 data set, a non-trivial data set where state-of-the-art approaches perform below chance. The proposed approach increases classification rates by 18.5% on publicly available data.

A Psychologically-Inspired Match-Score Fusion Model for Video-Based Facial Expression Recognition

Communication between humans is rich in complexity and is not limited to verbal signals; emotions are conveyed with gesture, pose, and facial expression. Facial Emotion Recognition and Analysis (FERA), the set of techniques by which non-verbal communication is quantified, is an exemplar case where humans consistently outperform computer methods. While the field of FERA has seen many advances, no system has been proposed that scales well to very large data sets. The challenge for computer vision is how to automatically and non-heuristically downsample the data while maintaining a minimum representational power that does not sacrifice accuracy. We propose a method inspired by human vision and attention theory: video is segmented into temporal partitions with a dynamic sampling rate based on the frequency of visual information, and regions are homogenized by an experimentally selected match-score fusion technique. The approach is shown to increase classification rates over the baseline on the AVEC 2011 video sub-challenge.
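A sketch of attention-style dynamic temporal sampling under an assumed change measure (mean absolute frame difference; the paper derives its rate from the frequency of visual information): sampling uniformly along the cumulative-change curve places many samples where the face moves and few where it is static.

```python
import numpy as np

def dynamic_sample(frames, budget):
    diffs = np.array([np.abs(frames[i + 1] - frames[i]).mean()
                      for i in range(len(frames) - 1)])
    activity = np.concatenate([[0.0], np.cumsum(diffs)])  # cumulative change
    targets = np.linspace(0.0, activity[-1], budget)
    return np.unique(np.searchsorted(activity, targets))  # chosen frame indices

rng = np.random.default_rng(0)
video = np.concatenate([np.zeros((40, 16, 16)),      # static segment
                        rng.random((20, 16, 16))])   # active segment
idx = dynamic_sample(video, budget=10)  # indices concentrate in the active part
```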

Facial Expression Recognition Using Emotion Avatar Image

Existing facial expression recognition techniques analyze the spatial and temporal information of every frame in a human emotion video. In contrast, we create the Emotion Avatar Image (EAI) as a single good representation of each video or image sequence for emotion recognition. We adopt the recently introduced SIFT flow algorithm to register every frame with respect to an avatar reference face model. An iterative algorithm is then used not only to super-resolve the EAI representation for each video and the avatar reference, but also to improve the recognition performance. Subsequently, we extract features from EAIs using both Local Binary Patterns (LBP) and Local Phase Quantization (LPQ). The results from both texture descriptors are tested on the Facial Expression Recognition and Analysis Challenge (FERA2011) data, the GEMEP-FERA dataset. To evaluate this simple yet powerful idea, we train our algorithm using only the 155 given training videos from the GEMEP-FERA dataset. The results show that our algorithm eliminates person-specific information for emotion and performs well on unseen data.

Face Recognition in Video with Closed-Loop Super-resolution

Video-based face recognition has received significant attention in the past few years. However, the facial images in a video sequence acquired from a distance are usually small in size and their visual quality is low. Enhancing low-resolution (LR) facial images from a video sequence is of importance for performing face recognition. Registration is a critical step in super-resolution (SR) of facial images from a video which requires precise pose alignment and illumination normalization. Unlike traditional approaches that perform tracking for each frame before using a SR method, we present an incremental super-resolution technique in which SR and tracking are linked together in a closed-loop system. An incoming video frame is first registered in pose and normalized for illumination, and then combined with the existing super-resolved texture. This super-resolved texture, in turn, is used to improve the estimate of illumination and motion parameters for the next frame. This process passes on the benefits of the SR result to the tracking module and allows the entire system to reach its potential. We show results on a low-resolution facial video. We demonstrate a significant improvement in face recognition rates with the super-resolved images over the images without super-resolution.
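A schematic of the closed loop, with every component stubbed (the real system estimates pose and illumination parameters and fuses with a proper SR model): the key point is that registration of each new frame is performed against the current super-resolved texture, whose improvement then benefits the next frame's registration.

```python
import numpy as np

def upsample(img, s=2):
    return np.kron(img, np.ones((s, s)))  # naive zero-order upsampling (stub)

def register(frame, sr_texture):
    return None  # stub: estimate pose + illumination against the SR texture

def warp(frame, params):
    return upsample(frame)  # stub: map the frame into the SR grid

def closed_loop_sr(frames):
    texture = upsample(frames[0])
    for frame in frames[1:]:
        params = register(frame, texture)                    # tracking uses the SR result...
        texture = 0.5 * texture + 0.5 * warp(frame, params)  # ...and SR uses tracking
    return texture

frames = [np.random.default_rng(i).random((16, 16)) for i in range(5)]
sr = closed_loop_sr(frames)  # 32x32 fused texture
```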

Evaluating the Quality of Super-resolved Images for Face Recognition

The widespread use of super-resolution methods in a variety of applications, such as surveillance, has led to an increasing need for quality assessment measures. Current quality measures aim to compare different fusion methods by assessing the quality of the fused images, and they consider only the information transferred between the super-resolved image and the input images. We propose an objective quality evaluation algorithm for super-resolved images, which focuses on evaluating the quality of super-resolved images constructed from different conditions of input images. The proposed quality evaluation method combines both the relationship between the super-resolved image and the input images and the relationship among the input images. Using the proposed measure, the quality of super-resolved face images constructed from videos is evaluated under different conditions, including variation of pose, lighting, facial expressions, and the number of input images.
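A hedged sketch of the idea: the quality score combines (a) how much each input agrees with the super-resolved result and (b) how consistent the inputs are with each other. Plain correlation and equal weighting are assumed stand-ins for the paper's actual measures.

```python
import numpy as np

def corr(a, b):
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sr_quality(sr_image, inputs_upsampled):
    # (a) SR-to-input transfer, (b) input-to-input consistency.
    transfer = np.mean([corr(sr_image, x) for x in inputs_upsampled])
    n = len(inputs_upsampled)
    consistency = np.mean([corr(inputs_upsampled[i], inputs_upsampled[j])
                           for i in range(n) for j in range(i + 1, n)])
    return 0.5 * transfer + 0.5 * consistency  # assumed equal weighting

rng = np.random.default_rng(0)
sr = rng.random((32, 32))
inputs = [sr + 0.05 * rng.normal(size=(32, 32)) for _ in range(4)]
score = sr_quality(sr, inputs)  # near 1.0 for consistent, well-used inputs
```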

Super-resolution of Facial Images in Video with Expression Changes

Super-resolution (SR) of facial images from video suffers from facial expression changes. Most existing SR algorithms for facial images make the unrealistic assumption that “perfect” registration has been done prior to the SR process; however, registration is a challenging task for SR with expression changes. This research proposes a new method for enhancing the resolution of a low-resolution (LR) facial image by handling the facial image in a non-rigid manner. It consists of global tracking, local alignment for precise registration, and SR algorithms. A B-spline based Resolution Aware Incremental Free Form Deformation (RAIFFD) model is used to recover a dense local non-rigid flow field. In this scheme, the low-resolution image model is explicitly embedded in the formulation of the optimization function to simulate the formation of the low-resolution image. The results achieved by the proposed approach are significantly better than SR approaches applied to the whole face image without considering local deformations. The results are also compared with two state-of-the-art SR algorithms to show the effectiveness of the approach in super-resolving facial images with local expression changes.

Super-Resolution of Deformed Facial Images in Video

Super-resolution (SR) of facial images from video suffers from facial expression changes. Most existing SR algorithms for facial images make the unrealistic assumption that “perfect” registration has been done prior to the SR process; however, registration is a challenging task for SR with expression changes. This research proposes a new method for enhancing the resolution of a low-resolution (LR) facial image by handling the facial image in a non-rigid manner. It consists of global tracking, local alignment for precise registration, and SR algorithms. A B-spline based Resolution Aware Incremental Free Form Deformation (RAIFFD) model is used to recover a dense local non-rigid flow field. In this scheme, the low-resolution image model is explicitly embedded in the formulation of the optimization function to simulate the formation of the low-resolution image. The results achieved by the proposed approach are significantly better than SR approaches applied to the whole face image without considering local deformations.

Feature fusion of side face and gait for video-based human identification

Video-based human recognition at a distance remains a challenging problem for the fusion of multimodal biometrics. We present a new approach that utilizes and integrates information from side face and gait at the feature level. The features of face and gait are obtained separately using principal component analysis (PCA) from the enhanced side face image (ESFI) and the gait energy image (GEI), respectively. Multiple discriminant analysis (MDA) is employed on the concatenated features of face and gait to obtain discriminating synthetic features. The experimental results demonstrate that the synthetic features, encoding both side face and gait information, carry more discriminating power than the individual biometric features, and the proposed feature-level fusion scheme outperforms the match score level and another feature-level fusion scheme.
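A compact sketch of this fusion pipeline, with random arrays standing in for real ESFI and GEI data and sklearn's LinearDiscriminantAnalysis used as a stand-in for MDA: PCA features are extracted separately from each modality, concatenated, and projected to discriminating synthetic features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 60
labels = np.repeat(np.arange(6), 10)  # 6 subjects, 10 sequences each
esfi = rng.normal(size=(n, 400))      # flattened enhanced side face images
gei = rng.normal(size=(n, 300))       # flattened gait energy images

face_feat = PCA(n_components=20).fit_transform(esfi)  # PCA per modality
gait_feat = PCA(n_components=20).fit_transform(gei)
fused = np.hstack([face_feat, gait_feat])             # feature-level fusion
synthetic = LinearDiscriminantAnalysis(n_components=5).fit_transform(fused, labels)
```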

Integrating Face and Gait for Human Recognition at a Distance in Video

We have introduced a new video-based recognition method to recognize non-cooperating individuals at a distance in video who expose side views to the camera. Information from two biometric sources, side face and gait, is utilized and integrated for recognition. For the side face, an enhanced side-face image (ESFI), a higher-resolution image compared with the image directly obtained from a single video frame, is constructed. For gait, the gait energy image (GEI), a spatio-temporal compact representation of gait in video, is used to characterize human walking properties. The experimental results show that the idea of constructing ESFI from multiple frames is promising for human recognition in video, and better face features are extracted from ESFI compared to those from the original side-face images (OSFIs).

Evolutionary Feature Synthesis for Facial Expression Recognition

We present a novel genetically inspired learning method for facial expression recognition (FER). Our learning method can select visually meaningful features automatically in a genetic-programming-based approach that uses Gabor wavelet representations for primitive features and linear/nonlinear operators to synthesize new features. To make use of the random nature of genetic programming, we design a multi-agent scheme to boost performance. We compare the performance of our approach with several approaches in the literature and show that our approach performs the task of facial expression recognition effectively.

Feature Fusion of Face and Gait for Human Recognition at a Distance in Video

A new video-based recognition method is presented to recognize non-cooperating individuals at a distance in video who expose side views to the camera. Information from two biometric sources, side face and gait, is utilized and integrated at the feature level. For face, a high-resolution side-face image is constructed from multiple video frames. For gait, the Gait Energy Image (GEI), a spatio-temporal compact representation of gait in video, is used to characterize human walking properties. Face features and gait features are obtained separately from the high-resolution side-face image and the GEI, respectively, using a combined Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA) method. The system is tested on a database of video sequences corresponding to 46 people. The results show that the integrated face and gait features carry the most discriminating power compared to any individual biometric.

Super-resolution Restoration of Facial Images in Video

Reconstruction-based super-resolution has been widely studied in computer vision; however, super-resolution of facial images has received very little attention. Since different parts of a face may have different motions in normal videos, we propose a new method for enhancing the resolution of a low-resolution facial image by handling the facial image non-uniformly. We divide the low-resolution face image into different regions based on facial features and estimate the motion of each of these regions using different motion models. Our experimental results show that we achieve better results than applying super-resolution to the whole face image uniformly.

Integrating Face and Gait for Human Recognition

We introduce a new video-based recognition method to recognize non-cooperating individuals at a distance in video who expose side views to the camera. Information from two biometric sources, side face and gait, is utilized and integrated for recognition. For the side face, we construct the Enhanced Side Face Image (ESFI), a higher-resolution image compared with the image directly obtained from a single video frame, to fuse face information from multiple video frames. For gait, we use the Gait Energy Image (GEI), a spatio-temporal compact representation of gait in video, to characterize human walking properties. The features of face and gait are obtained separately from ESFI and GEI, respectively, using a combined Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA) method. They are then integrated at the match score level. Our approach is tested on a database of video sequences corresponding to 46 people. The different fusion methods are compared and analyzed. The experimental results show that (a) integrated information from side face and gait is effective for human recognition in video, and (b) the idea of constructing ESFI from multiple frames is promising for human recognition in video, as better face features are extracted from ESFI compared to those from the original face images.
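For contrast with the feature-level scheme sketched earlier, a minimal match-score-level fusion example (min-max normalization and equal weights are assumptions; the paper compares several fusion methods): each modality produces its own gallery score vector, and the normalized scores are combined before ranking.

```python
import numpy as np

def minmax(scores):
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)

face_scores = np.array([0.9, 0.4, 0.7])  # similarity to 3 gallery subjects
gait_scores = np.array([0.2, 0.8, 0.6])
fused = 0.5 * minmax(face_scores) + 0.5 * minmax(gait_scores)
identity = int(np.argmax(fused))  # decision from the fused score vector
```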

Human Recognition Based on Face Profiles in Video

The face profile is an important aspect of face recognition: it provides a complementary structure of the face that is seen in the non-frontal view. In the past, several methods have been proposed to recognize face profiles in still images. However, face profile images captured at a distance by surveillance cameras are usually low-resolution video sequences. It is difficult to extract an accurate face profile directly from a low-resolution video frame, which does not have many pixels on the face profile. The emphasis of this research is a practical approach for human recognition using high-resolution face profile images constructed from low-resolution videos. We use both the spatial and temporal information present in a number of adjacent low-resolution frames of a video sequence to construct high-resolution face profile images. As the quality of the high-resolution images relies on the correctness of image alignment between consecutive frames, an elastic registration algorithm is used for face profile image alignment. A match statistic is designed to detect and discard poorly aligned images which may degrade the quality of the high-resolution face profile image. After obtaining high-resolution face profile images, we use a dynamic time warping method for face profile recognition. A number of dynamic video sequences are tested to demonstrate the applicability and reliability of our method.

Face Recognition from Face Profile Using Dynamic Time Warping

Most current profile recognition algorithms depend on the correct detection of fiducial points and the determination of relationships among these points. Unfortunately, some features, such as a concave nose, protruding lips, or a flat chin, make detection of such points difficult and unreliable. Also, the number and position of fiducial points vary when expression changes, even for the same person. A curvature-based matching approach is presented in this research, which does not require the extraction of all the fiducial points but uses the information contained in the profile. Scale space filtering is used to smooth the profile, and then the curvature of the filtered profile is computed. Using the curvature values, fiducial points such as the nasion and throat can be reliably extracted using a fast and simple method. A dynamic time warping method is then applied to match the face profile portion from nasion to throat based on the curvature values. Experiments are performed on two profile face image databases; recognition rates and conclusions are presented and discussed.
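A sketch of the matching stage under stated assumptions (Gaussian smoothing stands in for scale space filtering at a single scale, and the profile is given as x/y coordinate arrays): curvature is computed from smoothed derivatives, and two curvature sequences are aligned with the classic O(nm) DTW recursion.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def profile_curvature(x, y, sigma=3.0):
    # Smooth the profile contour, then compute signed curvature from derivatives.
    xs, ys = gaussian_filter1d(x, sigma), gaussian_filter1d(y, sigma)
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

def dtw_distance(a, b):
    # Classic dynamic time warping over two 1-D curvature sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0.0, np.pi, 200)
a = profile_curvature(np.cos(t), np.sin(t))          # unit arc: curvature ~ 1
b = profile_curvature(2 * np.cos(t), 2 * np.sin(t))  # radius-2 arc: ~ 0.5
dist = dtw_distance(a, b)  # smaller for profiles of similar shape
```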

Face Recognition from Face Profile Using Dynamic Time Warping

[Figure: Profile images, University of Stirling: speaking, smiling, and neutral expression]
We present a curvature-based matching approach that uses the information contained in the facial profile. Scale space filtering is used to smooth the profile, and then the curvature of the filtered profile is computed. Using the curvature values, fiducial points such as the nasion and throat can be reliably extracted using a fast and simple method. A dynamic time warping method is then applied. Experiments are performed on two profile face image databases.

Feature Synthesis Using Genetic Programming for Face Expression Recognition

We introduce a novel genetically inspired learning method for face expression recognition (FER) in visible images. Unlike current FER research, which generally uses visually meaningful features, we propose a Genetic Programming based technique that learns to discover composite operators and features evolved from combinations of primitive image processing operations. In this approach, the output of the learned composite operator is a feature vector that is used for FER. The experimental results show that our approach can find good composite operators to effectively extract useful features.