Visualization and Intelligent Systems Laboratory



Contact Information

Winston Chung Hall Room 216
University of California, Riverside
900 University Avenue
Riverside, CA 92521-0425

Tel: (951)-827-3954

Last updated: June 15, 2016



Registration and Fusion

Improving Action Units Recognition Using Dense Flow-based Face Registration in Video

Aligning faces with non-rigid muscle motion in real-world streaming video is a challenging problem. We propose a novel automatic video-based face registration architecture for facial expression recognition. The registration process is formulated as a dense SIFT-flow- and optical-flow-based affine warping problem. We start by estimating the transformation of an arbitrary face to a generic reference face with canonical pose. This initialization establishes a face model that is independent of head pose and person. The affine transformation computed from the initialization is then propagated by the affine transformation estimated from the dense optical flow to guarantee the temporal smoothness of the non-rigid facial appearance. We call this method SIFT and optical flow affine image transform (SOFAIT). This real-time algorithm is designed for realistic streaming data, allowing us to analyze facial muscle dynamics in a meaningful manner. Visual and quantitative results demonstrate that the proposed automatic video-based face registration technique captures the appearance changes in spontaneous expressions and outperforms the state-of-the-art technique.
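
The propagation step can be illustrated with a minimal sketch. It is not the SOFAIT implementation: it assumes the dense flow is already available as per-pixel displacement vectors, that the flow is measured from the current frame back to the previous frame, and that a plain least-squares fit is an acceptable way to turn the flow into an affine matrix. The function names `solve3`, `fit_affine`, and `compose` are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(points, flow):
    """Least-squares affine map sending each (x, y) to (x + u, y + v).

    Returns a 3x3 homogeneous matrix. Uses the normal equations
    (P^T P) w = P^T t, where each row of P is [x, y, 1].
    """
    ptp = [[0.0] * 3 for _ in range(3)]
    ptx = [0.0] * 3
    pty = [0.0] * 3
    for (x, y), (u, v) in zip(points, flow):
        p = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ptp[i][j] += p[i] * p[j]
            ptx[i] += p[i] * (x + u)
            pty[i] += p[i] * (y + v)
    return [solve3(ptp, ptx), solve3(ptp, pty), [0.0, 0.0, 1.0]]


def compose(b, a):
    """Return the matrix product b @ a (apply a first, then b)."""
    return [[sum(b[i][k] * a[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Under the stated assumption about flow direction, the registration of frame t would be `compose(previous_registration, fit_affine(points, flow))`, where `previous_registration` maps the previous frame to the reference face. If the flow runs forward in time instead, the fitted step would need to be inverted before composing.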

Fusion of Multiple Trackers in Video Networks

We address the camera selection problem by fusing the performance of multiple trackers. Current camera selection/hand-off approaches depend largely on the performance of the deployed tracker to decide when to hand off from one camera to another. However, a slight inaccuracy in the tracker may pass wrong information to the system, so that the wrong camera is selected and the error propagates. We present a novel approach that uses multiple state-of-the-art trackers based on different features and principles to generate multiple hypotheses, and fuses the performance of these trackers for camera selection. The proposed approach has very low computational overhead and can achieve real-time performance. We perform experiments with different numbers of cameras and persons on different datasets to show the superior results of the proposed approach. We also compare results with a single tracker to show the merits of integrating results from multiple trackers.
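
A minimal sketch of the idea follows. The fusion rule is not specified above, so this example assumes each tracker reports a confidence in [0, 1] for each camera and that the confidences are fused by a simple mean; `fuse_and_select` and the camera names are hypothetical.

```python
def fuse_and_select(scores):
    """Pick the camera with the highest mean tracker confidence.

    scores maps a camera name to a list of confidences, one per tracker.
    Mean fusion is an assumption made for this sketch.
    """
    fused = {cam: sum(vals) / len(vals) for cam, vals in scores.items()}
    best = max(fused, key=fused.get)
    return best, fused


# Tracker 0 alone would favor cam_a, while the fused score favors cam_b.
example = {"cam_a": [0.9, 0.2, 0.3], "cam_b": [0.6, 0.7, 0.65]}
best_camera, fused_scores = fuse_and_select(example)
```

The example input is chosen so that one over-confident tracker would select a different camera than the fused result, which is the failure mode that motivates using several trackers.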

Ethnicity Classification Based on Gait Using Multi-view Fusion

The determination of the ethnicity of an individual, as a soft biometric, can be very useful in a video-based surveillance system. Currently, the face is commonly used to determine the ethnicity of a person. Up to now, gait has been used for individual recognition and gender classification but not for ethnicity determination. This research focuses on ethnicity determination based on the fusion of multi-view gait. Gait Energy Image (GEI) is used to analyze the recognition power of gait for ethnicity. Feature fusion, score fusion, and decision fusion from multiple views of gait are explored. For feature fusion, GEI images and camera views are put together to render a third-order tensor (x, y, view). Multilinear principal component analysis (MPCA) is used to extract features from the tensor objects, which integrate all views. For score fusion, the similarity scores measured from single views are combined with a weighted SUM rule. For decision fusion, ethnicity classification is first performed on each individual view. The classification results are then combined to make the final determination with a majority vote rule. A database of 36 walking people (East Asian and South American) was acquired from 7 different camera views. The experimental results show that ethnicity can be determined from human gait in video automatically. The classification rate is improved by fusing multiple camera views, and a comparison among the fusion schemes shows that MPCA-based feature fusion performs best.
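
The score-level and decision-level rules can be sketched in a few lines. This is an illustration only: the weights, the class labels "A" and "B", and the function names are hypothetical, and ties in the vote are resolved by whichever label appears first.

```python
from collections import Counter


def weighted_sum_fusion(view_scores, weights):
    """Score fusion: combine per-view similarity scores with a weighted SUM rule.

    view_scores holds one dict per camera view, mapping class label to score.
    Returns the label with the highest weighted total.
    """
    total = {}
    for scores, w in zip(view_scores, weights):
        for label, s in scores.items():
            total[label] = total.get(label, 0.0) + w * s
    return max(total, key=total.get)


def majority_vote(view_decisions):
    """Decision fusion: return the label chosen by the most views."""
    return Counter(view_decisions).most_common(1)[0][0]


views = [{"A": 0.8, "B": 0.3}, {"A": 0.4, "B": 0.5}, {"A": 0.45, "B": 0.5}]
score_level = weighted_sum_fusion(views, [0.5, 0.25, 0.25])
decision_level = majority_vote([max(v, key=v.get) for v in views])
```

In this made-up example the two schemes disagree: the first view is confident and heavily weighted, so score fusion follows it, while the vote follows the two weaker views. This is one reason to compare fusion schemes empirically.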

Feature Fusion of Face and Gait for Human Recognition at a Distance in Video

A new video-based method is presented to recognize non-cooperating individuals at a distance in video who expose side views to the camera. Information from two biometric sources, side face and gait, is utilized and integrated at the feature level. For face, a high-resolution side face image is constructed from multiple video frames. For gait, the Gait Energy Image (GEI), a compact spatio-temporal representation of gait in video, is used to characterize human walking properties. Face features and gait features are obtained separately from the high-resolution side face image and the GEI, respectively, using a combined Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA) method. The system is tested on a database of video sequences corresponding to 46 people. The results show that the integrated face and gait features carry more discriminating power than either individual biometric.
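
As a rough illustration, a GEI is the pixel-wise average of aligned, size-normalized binary silhouettes over a gait sequence. The sketch below computes that average and then fuses two feature vectors. The fusion step is an assumption: it L2-normalizes each vector and concatenates them, which is one common form of feature-level fusion and not necessarily the one used here. Both function names are hypothetical.

```python
import math


def gait_energy_image(silhouettes):
    """Average aligned binary silhouette frames (lists of rows of 0/1) pixel-wise."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(frame[r][c] for frame in silhouettes) / n for c in range(cols)]
            for r in range(rows)]


def fuse_features(face_features, gait_features):
    """Assumed fusion rule: L2-normalize each vector, then concatenate."""
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else list(vec)
    return normalize(face_features) + normalize(gait_features)
```

Normalizing before concatenation keeps one modality from dominating only because its features have a larger numeric range.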

Global-to-Local Non-Rigid Shape Registration

Non-rigid shape registration is an important issue in computer vision. We present a novel global-to-local procedure for aligning non-rigid shapes. The global similarity transformation is obtained from the corresponding pairs found by matching shape context descriptors. The local deformation is performed within an optimization formulation in which the bending energy of the thin plate spline transformation is incorporated as a regularization term, so that the structure of the model shape is preserved under deformation. The optimization procedure drives the initial global registration towards the target shape, resulting in a one-to-one correspondence between the model and target shapes. Experimental results demonstrate the effectiveness of the proposed approach.
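
The global step can be sketched as a least-squares similarity fit. The example assumes the point correspondences are already given (the shape context matching is not shown) and represents 2D points as complex numbers, so that a similarity transform is z -> a*z + b, with |a| the scale and the phase of a the rotation. The names `fit_similarity` and `apply_similarity` are hypothetical.

```python
def fit_similarity(model_pts, target_pts):
    """Least-squares similarity transform from matched 2D point pairs.

    Returns complex (a, b) such that a * z + b maps model points onto targets.
    Reflections are not modeled.
    """
    p = [complex(x, y) for x, y in model_pts]
    q = [complex(x, y) for x, y in target_pts]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a, b, pts):
    """Apply z -> a * z + b to a list of (x, y) points."""
    out = []
    for x, y in pts:
        z = a * complex(x, y) + b
        out.append((z.real, z.imag))
    return out
```

The result of this fit would serve as the initial alignment that the local, thin-plate-spline-regularized optimization then refines.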

Moving Humans Detection Based on Multi-modal Sensor Fusion

Moving object detection plays an important role in automated surveillance systems. However, it is challenging to detect moving objects robustly in a cluttered environment. We propose an approach for detecting humans using multi-modal measurements. The approach uses a Time-Delay Neural Network (TDNN) to fuse audio and video data at the feature level in order to detect the walker when multiple persons are in the scene. The main contribution of this research is the introduction of the Time-Delay Neural Network for learning the relation between the visual motion and the step sounds of a walking person. Experimental results are presented.
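
To make the time-delay idea concrete, the sketch below shows the forward pass of a single time-delay unit over fused audio-video frames. It is a simplified illustration with hand-set weights, not the trained network: the per-frame feature vectors, the sigmoid activation, and the names `fuse_frames` and `time_delay_unit` are assumptions.

```python
import math


def fuse_frames(audio_frames, video_frames):
    """Feature-level fusion: concatenate audio and video features frame by frame."""
    return [a + v for a, v in zip(audio_frames, video_frames)]


def time_delay_unit(frames, weights, bias):
    """Forward pass of one time-delay unit.

    weights[d][k] weights feature k of the frame at delay d inside the window,
    so the unit sees a short temporal context. One sigmoid output is produced
    per valid window position.
    """
    window = len(weights)
    outputs = []
    for t in range(len(frames) - window + 1):
        s = bias
        for d in range(window):
            s += sum(w * x for w, x in zip(weights[d], frames[t + d]))
        outputs.append(1.0 / (1.0 + math.exp(-s)))
    return outputs
```

With weights that respond to a sound in one frame followed by motion in the next, the unit scores windows containing that pattern higher than windows without it. In a real TDNN these weights would be learned from data.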

Adaptive Fusion for Diurnal Moving Object Detection

Fusing different sensor types (e.g., video, thermal infrared) and choosing a sensor selection strategy at the signal or pixel level is a non-trivial task that requires a well-defined structure. We provide a novel fusion architecture that is flexible and can be adapted to different types of sensors. The new fusion architecture provides an elegant approach to integrating different sensing phenomenology, sensor readings, and contextual information. A cooperative coevolutionary method is introduced for optimally selecting fusion strategies. We provide results in the context of a moving object detection system for a full 24-hour diurnal cycle in an outdoor environment. The results indicate that our architecture is robust to adverse illumination conditions and that the evolutionary paradigm can provide an adaptable and flexible method for combining signals of different modalities.
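
The general pattern of cooperative coevolution can be shown on a toy problem. The sketch below is not the lab's architecture: it assumes the fusion strategy reduces to one weight per sensor, evolves each weight in its own population, and scores an individual by pairing it with the current best representative of the other population. The fitness function, mutation scheme, and sample data are all hypothetical.

```python
import random


def fitness(w_video, w_thermal, samples):
    """Negative squared error of the fused score against the label."""
    return -sum((w_video * v + w_thermal * t - y) ** 2 for v, t, y in samples)


def coevolve(samples, generations=40, pop_size=12, seed=0):
    """Evolve one weight per sensor in two cooperating populations."""
    rng = random.Random(seed)
    pops = [[rng.random() for _ in range(pop_size)] for _ in range(2)]
    best = [pops[0][0], pops[1][0]]  # current representative of each species
    for _ in range(generations):
        for s in (0, 1):
            def score(w, s=s):
                # Pair the candidate with the other species' representative.
                if s == 0:
                    return fitness(w, best[1], samples)
                return fitness(best[0], w, samples)
            # Keep the old representative in the ranking so the joint
            # fitness never gets worse from one update to the next.
            ranked = sorted(pops[s] + [best[s]], key=score, reverse=True)
            best[s] = ranked[0]
            parents = ranked[: pop_size // 2]
            children = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.1)))
                        for p in parents]
            pops[s] = parents + children
    return best


# Hypothetical (video_score, thermal_score, label) samples.
samples = [(0.9, 0.8, 1.0), (0.1, 0.2, 0.0), (0.2, 0.9, 1.0), (0.1, 0.1, 0.0)]
video_weight, thermal_weight = coevolve(samples)
```

Each population only searches its own part of the solution, and the two cooperate through the shared fitness evaluation, which is what distinguishes this scheme from running a single evolutionary search over both weights at once.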

Physics-based Cooperative Sensor Fusion for Moving Object Detection

A robust moving object detection system for an outdoor scene must be able to handle adverse illumination conditions such as sudden illumination changes or a lack of illumination in the scene. This is of particular importance in scenarios where active illumination cannot be relied upon. Utilizing infrared and video sensors, we propose a novel sensor fusion algorithm that automatically adapts to the environmental changes that affect sensor measurements. The adaptation is done through a new cooperative coevolutionary algorithm that fuses the scene contextual and statistical information through a physics-based method. Our sensor fusion algorithm maintains high detection rates under a variety of conditions, including sensor failure. The results are shown for a full 24-hour diurnal cycle.