A dense flow-based framework for real-time object registration under compound motion
Tracking and measuring surface deformation while the object itself is also moving is a challenging yet important problem in many video analysis tasks. For example, video-based facial expression recognition requires tracking non-rigid motions of facial features without being affected by rigid motions of the head. Presented is a generic video alignment framework to extract and characterize surface deformations accompanied by rigid-body motions with respect to a fixed reference (a canonical form). Also proposed is a generic model for object alignment in a Bayesian framework, and it is rigorously shown that a special case of the model reduces to a least-squares problem based on SIFT flow and optical flow. The proposed algorithm is evaluated on three applications: the analysis of subtle facial muscle dynamics in spontaneous expressions, face image super-resolution, and generic object registration.
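The separation of rigid motion from deformation that underlies such alignment can be sketched as a least-squares rigid fit (the classical Kabsch/Procrustes solution), where the residual after removing the best rigid transform is attributed to deformation. The synthetic point sets below are illustrative stand-ins; the paper's SIFT-flow and optical-flow fields are not modeled here.

```python
import numpy as np

def rigid_fit(P, Q):
    # least-squares rotation/translation mapping point set P onto Q (Kabsch)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(0)
P = rng.standard_normal((50, 2))                   # synthetic 2-D points
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Q = P @ R_true.T + np.array([1.0, -2.0])           # rigidly moved copy
R, t = rigid_fit(P, Q)
```

With a deforming surface, `Q - (P @ R.T + t)` would carry the non-rigid residual flow.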
Integrating Social Grouping for Multi-target Tracking Across Cameras in a CRF Model
Tracking multiple targets across non-overlapping cameras aims at estimating the trajectories of all targets and keeping their identity labels consistent as they move from one camera to another. Matching targets from different cameras can be very challenging, as there may be significant appearance variation, and the blind area between cameras makes a target's motion less predictable. Unlike most existing methods, which focus only on modeling appearance and spatio-temporal cues for inter-camera tracking, presented is a novel online learning approach that integrates high-level contextual information into the tracking system. The tracking problem is formulated using an online learned Conditional Random Field (CRF) model that minimizes a global energy cost.
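A toy sketch of the CRF formulation: candidate associations are labels, and the best assignment minimizes a global energy built from unary costs (appearance, motion) plus pairwise costs that encourage grouped targets to agree. The costs and labels below are made-up numbers for illustration, not the paper's learned potentials.

```python
import itertools

unary = {0: {"A": 0.2, "B": 0.8},   # per-target cost of each label
         1: {"A": 0.6, "B": 0.3}}
pairwise = {("A", "A"): 0.0, ("B", "B"): 0.0,   # contextual term: targets in a
            ("A", "B"): 0.5, ("B", "A"): 0.5}   # group prefer matching labels

def energy(assign):
    # global energy = sum of unary costs + pairwise cost between targets 0 and 1
    e = sum(unary[t][l] for t, l in assign.items())
    return e + pairwise[(assign[0], assign[1])]

# exhaustive minimization is feasible only for this tiny example;
# real CRF inference would use graph-based or message-passing methods
best = min(({0: a, 1: b} for a, b in itertools.product("AB", repeat=2)),
           key=energy)
```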
Selective experience replay in reinforcement learning for re-identification
Person re-identification is the problem of recognizing a person across non-overlapping camera views. Pose variations, illumination conditions, low-resolution images, and occlusion are the main challenges in re-identification. Due to the uncontrolled environments in which the videos are captured, people can appear in different poses and the appearance of a person can vary significantly. The walking direction of a person provides a good estimate of their pose. Therefore, proposed is a re-identification system which adaptively selects an appropriate distance metric based on the context of walking direction using reinforcement learning. Through experiments, it is shown that such a dynamic strategy outperforms static strategies learned or designed offline.
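The idea of context-dependent metric selection can be sketched as an epsilon-greedy agent that learns, per walking direction, which of two candidate metrics yields better matches. The directions, metric names, and reward model below are illustrative assumptions, not the paper's actual setup.

```python
import random

DIRECTIONS = ["left", "right", "toward", "away"]
METRICS = ["euclidean", "learned_mahalanobis"]

def run_agent(true_best, episodes=2000, eps=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    q = {(d, m): 0.0 for d in DIRECTIONS for m in METRICS}
    for _ in range(episodes):
        d = rng.choice(DIRECTIONS)                 # context: walking direction
        if rng.random() < eps:                     # explore occasionally
            m = rng.choice(METRICS)
        else:                                      # otherwise pick best-known
            m = max(METRICS, key=lambda a: q[(d, a)])
        # reward stands in for re-id match success: the (hidden) best metric
        # for this direction succeeds 90% of the time, the other never does
        reward = 1.0 if (m == true_best[d] and rng.random() < 0.9) else 0.0
        q[(d, m)] += lr * (reward - q[(d, m)])     # incremental value update
    return {d: max(METRICS, key=lambda a: q[(d, a)]) for d in DIRECTIONS}

true_best = {"left": "euclidean", "right": "euclidean",
             "toward": "learned_mahalanobis", "away": "learned_mahalanobis"}
policy = run_agent(true_best)
```

After enough episodes the learned policy matches the hidden best metric per direction, mirroring the dynamic strategy described above.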
Group structure preserving pedestrian tracking in a multi-camera Video network
Pedestrian tracking in video has been a popular research topic with many practical applications. To improve tracking performance, many ideas have been proposed, among which the use of geometric information is one of the most popular directions in recent research. Proposed is a novel multi-camera pedestrian tracking framework which incorporates the structural information of pedestrian groups in the crowd. In this framework, first, a new cross-camera model is proposed which enables the fusion of confidence information from all camera views. Second, the group structures on the ground plane provide extra constraints between pedestrians. Third, a structured SVM is adopted to update the cross-camera model for each pedestrian according to the most recently tracked location.
Sparse representation matching for person re-identification
Person re-identification aims at matching people in non-overlapping cameras at different times and locations. To address this multi-view matching problem, we first learn a subspace using canonical correlation analysis (CCA) in which the goal is to maximize the correlation between data from different cameras that correspond to the same people. Given a probe from one camera view, we represent it using a sparse representation from a jointly learned coupled dictionary in the CCA subspace. The ℓ1-induced sparse representation is regularized by an ℓ2 regularization term. The introduction of ℓ2 regularization allows learning a sparse representation while maintaining the stability of the sparse coefficients. To compute the matching scores between the probe and the gallery, their ℓ2-regularized sparse representations are matched using a modified cosine similarity measure.
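The combined ℓ1 + ℓ2 (elastic-net) coding step can be sketched with iterative soft-thresholding (ISTA). The dictionary below is a random stand-in, not the jointly learned coupled dictionary, and plain cosine similarity replaces the paper's modified measure.

```python
import numpy as np

def elastic_net_ista(D, y, lam1=0.1, lam2=0.1, iters=500):
    # minimize 0.5*||D x - y||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2 via ISTA
    L = np.linalg.norm(D, 2) ** 2 + lam2           # Lipschitz constant
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ x - y) + lam2 * x        # gradient of smooth part
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft threshold
    return x

def cosine_score(a, b):
    # plain cosine similarity between coefficient vectors (simplified)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))                  # stand-in dictionary
probe = D[:, 3] + 0.01 * rng.standard_normal(20)   # probe near atom 3
x = elastic_net_ista(D, probe)
```

The ℓ2 term keeps the coefficients stable under correlated dictionary atoms, which is the stability property the abstract refers to.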
Grouping model for people tracking in surveillance camera
Person tracking and analysis from images or videos captured by closed-circuit television (CCTV) plays an important role in forensics applications. Described is a tracking model based on group analysis. This framework improves over state-of-the-art tracking algorithms by leveraging an online learned social grouping behavior model. The group model is practical in real-world applications, where group changes (e.g., merges and splits) are natural among pedestrians.
Tracking People by Evolving Social Groups: An Approach with Social Network Perspective
We address the problem of multi-people tracking in unconstrained and semi-crowded scenes. Instead of seeking more robust appearance or motion models to track each person as an isolated moving entity, we pose the multi-people tracking problem as a group-based tracklet association problem, using the discovered social groups of tracklets as contextual cues. We formulate tracking the evolution of social groups of tracklets as detecting closely connected communities in a “tracklet interaction network” (TIN), with nodes standing for tracklets and edge weights denoting spatio-temporal co-occurrence correlations. We incorporate the detected social groups in the tracklet interaction network to improve multi-people tracking performance.
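A minimal sketch of grouping in such a network: threshold the co-occurrence weights and take connected components as groups. The tracklet IDs, weights, and threshold are illustrative; actual community detection (e.g., modularity-based) would replace the simple threshold.

```python
def groups_from_tin(n, edges, threshold=0.5):
    # union-find over edges whose co-occurrence weight passes the threshold
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]          # path compression
            a = parent[a]
        return a
    for i, j, w in edges:
        if w >= threshold:
            parent[find(i)] = find(j)              # merge the two components
    comps = {}
    for k in range(n):
        comps.setdefault(find(k), set()).add(k)
    return sorted(map(frozenset, comps.values()), key=min)

# 5 tracklets; strong co-occurrence within {0,1,2} and {3,4}, weak across
edges = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (2, 3, 0.1)]
grps = groups_from_tin(5, edges)
```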
Multi-camera Pedestrian Tracking using Group Structure
Proposed is a novel multi-camera pedestrian tracking system which incorporates a pedestrian grouping strategy and an online cross-camera model. The new cross-camera model is able to take advantage of the information from all camera views as well as the group structure in the inference stage, and can be updated using a structured SVM learning approach. The experimental results demonstrate the improvement in tracking performance when the grouping stage is integrated.
Person Re-Identification with Reference Descriptor
A reference-based method is proposed for person re-identification across different cameras. The matching is conducted in a reference space, where the descriptor for a person is translated from the original color or texture descriptors to similarity measures between this person and the exemplars in the reference set. A subspace is learned in which the correlations of the reference data from different cameras are maximized using Regularized Canonical Correlation Analysis (RCCA). For re-identification, the gallery data and the probe data are projected into this RCCA subspace, and the reference descriptors (RDs) of the gallery and probe are generated by computing the similarity between them and the reference data.
Person Re-Identification by Robust Canonical Correlation Analysis
Due to significant view and pose changes across non-overlapping cameras, directly matching data from different views is challenging. Proposed is a robust canonical correlation analysis (ROCCA) to match people from different views in a coherent subspace. Given a small training set, direct application of canonical correlation analysis (CCA) may lead to poor performance due to inaccuracy in estimating the data covariance matrices. The proposed ROCCA, with shrinkage estimation and a smoothing technique, is simple to implement and can robustly estimate the data covariance matrices with limited training samples.
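The covariance problem can be illustrated directly: with fewer samples than dimensions the empirical covariance is singular, while shrinking it toward a scaled identity restores full rank. The fixed mixing weight `alpha` is a hypothetical choice; ROCCA's actual shrinkage and smoothing are data-driven.

```python
import numpy as np

def shrink_cov(X, alpha=0.2):
    # convex combination of empirical covariance and a scaled-identity target
    C = np.cov(X, rowvar=False)
    target = np.trace(C) / C.shape[0] * np.eye(C.shape[0])
    return (1 - alpha) * C + alpha * target

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))                  # 10 samples, 50 dimensions
C_emp = np.cov(X, rowvar=False)                    # rank-deficient
C_shr = shrink_cov(X)                              # full rank, invertible
rank_emp = np.linalg.matrix_rank(C_emp)
```

A full-rank, well-conditioned covariance is what makes the subsequent CCA generalized eigenproblem solvable with a small training set.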
Multitarget Tracking in Nonoverlapping Cameras Using a Reference Set
Tracking multiple targets in nonoverlapping cameras is challenging since the observations of the same targets are often separated by time and space. There might be significant appearance change of a target across camera views caused by variations in illumination conditions, poses, and camera imaging characteristics. Consequently, the same target may appear very different in two cameras, therefore, associating tracks in different camera views directly based on their appearance similarity is difficult and prone to error. In this paper, a novel reference set based appearance model is proposed to improve multitarget tracking in a network of nonoverlapping cameras.
Understanding Dynamic Social Grouping Behaviors of Pedestrians
Inspired by sociological models of human collective behavior, presented is a framework for characterizing hierarchical social groups based on an evolving tracklet interaction network (ETIN), where the tracklets of pedestrians are represented as nodes and their grouping behaviors are captured by the edges with associated weights. We use non-overlapping snapshots of the interaction network and develop a unified framework for dynamic group identification and tracklet association. The approach is evaluated quantitatively and qualitatively on videos of pedestrian scenes for which manually labeled ground truth is given.
Analysis-by-synthesis: Pedestrian tracking with crowd simulation models in a multi-camera video network
A multi-camera tracking system with integrated crowd simulation is proposed in order to explore whether homography information can be made more helpful. Two crowd simulators with different simulation strategies are used to investigate the influence of the simulation strategy on the final tracking performance. The performance is evaluated by the multiple object tracking precision and accuracy (MOTP and MOTA) metrics, for all camera views as well as for results obtained in real-world coordinates.
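The two CLEAR MOT metrics mentioned above reduce to simple ratios over per-frame counts; the tiny numbers below are made up for illustration.

```python
def mota(misses, false_positives, id_switches, total_gt):
    # MOTA: 1 - (total errors) / (total ground-truth objects over all frames)
    return 1.0 - (misses + false_positives + id_switches) / total_gt

def motp(total_distance, total_matches):
    # MOTP: mean localization error over all matched target-hypothesis pairs
    return total_distance / total_matches

score_a = mota(misses=10, false_positives=5, id_switches=2, total_gt=100)
score_p = motp(total_distance=45.0, total_matches=90)
```

Note that MOTA aggregates all error types into one accuracy score, while MOTP measures only how precisely matched targets are localized.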
Reference-Based Person Re-Identification
Person re-identification refers to recognizing people across non-overlapping cameras at different times and locations. Due to variations in pose, illumination condition, background, and occlusion, person re-identification is inherently difficult. We propose a reference-based method for across-camera person re-identification. In training, we learn a subspace in which the correlations of the reference data from different cameras are maximized using Regularized Canonical Correlation Analysis (RCCA). For re-identification, the gallery data and the probe data are projected into the RCCA subspace, and the reference descriptors (RDs) of the gallery and probe are constructed by measuring the similarity between them and the reference data. The identity of the probe is determined by comparing the RD of the probe with the RDs of the gallery. Experiments on a benchmark dataset show that the proposed method outperforms state-of-the-art approaches.
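The reference-descriptor idea can be sketched as re-describing each sample by its similarities to a shared reference set and matching in that space. The features here are random stand-ins, and cosine similarity on raw features replaces the RCCA-subspace projection, which is omitted for brevity.

```python
import numpy as np

def reference_descriptor(x, refs):
    # RD of x: cosine similarity to every reference exemplar
    return refs @ x / (np.linalg.norm(refs, axis=1) * np.linalg.norm(x) + 1e-12)

rng = np.random.default_rng(1)
refs = rng.standard_normal((30, 16))               # 30 reference exemplars
gallery = rng.standard_normal((5, 16))             # 5 gallery identities
probe = gallery[2] + 0.05 * rng.standard_normal(16)  # noisy view of identity 2

rd_probe = reference_descriptor(probe, refs)
rd_gallery = np.array([reference_descriptor(g, refs) for g in gallery])
# identity decided by the nearest reference descriptor (Euclidean here)
match = int(np.argmin(np.linalg.norm(rd_gallery - rd_probe, axis=1)))
```

Because both probe and gallery are described relative to the same reference set, the comparison sidesteps direct cross-camera feature matching.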
CAOS: A Hierarchical Robot Control System
Control systems which enable robots to behave intelligently were a major issue in the process of automating factories. A hierarchical robot control system, termed CAOS (Control using Action Oriented Schemas), with ideas taken from the neurosciences, is presented. We used action-oriented schemas (called neuroschemas) as the basic building blocks in a hierarchical control structure which was implemented on a BBN Butterfly Parallel Processor. Serial versions in C and LISP are presented, with examples showing how CAOS achieved the goal of recognizing three-dimensional polyhedral objects.
Knowledge Based Robot Control on a Multiprocessor in a Multisensor Environment
Knowledge-based robot control for automatic inspection, manipulation, and assembly of objects was projected to be a common denominator in highly automated factories. These tasks were to be handled routinely by intelligent, computer-controlled robots with multiprocessing and multi-sensor features which contribute to flexibility and adaptability. Discussed is the work on CAOS, a knowledge-based robot control system. The structure and components of CAOS were modeled after the human brain, using neuroschemata as the basic building blocks and incorporating parallel processing as well as hierarchical and heterarchical control.
Hierarchical Robot Control in a Multi-Sensor Environment
Automatic recognition, inspection, manipulation, and assembly of objects was projected to be a common denominator in highly automated factories. These tasks were to be handled routinely by intelligent, computer-controlled robots with multiprocessing and multi-sensor features which contribute to flexibility and adaptability. The control of a robot in such a multi-sensor environment became of crucial importance, as the complexity of the problem grew exponentially with the number of sensors, tasks, commands, and objects. An approach which uses CAD (Computer Aided Design) based geometric and functional models of objects together with action-oriented neuroschemas to recognize and manipulate objects by a robot in a multi-sensor environment is presented.
A Framework for Distributed Sensing and Control
Logical Sensor Specification (LSS) was introduced as a convenient means for specifying multi-sensor systems and their implementations. We demonstrated how control issues could be handled in the context of LSS. In particular, the Logical Sensor Specification was extended to include a control mechanism which permitted control information to flow from more centralized processing to more peripheral processes, and to be generated locally in the logical sensor by means of a micro-expert system specific to the interface represented by the given logical sensor.
The Synthesis of Logical Sensor Specifications
A coherent automated manufacturing system needed to include CAD/CAM, computer vision, and object manipulation, but most systems which supported CAD/CAM did not provide for vision or manipulation; similarly, vision and manipulation systems incorporated no explicit relation to CAD/CAM models. CAD/CAM systems emerged which allowed the designer to conceive and model an object and automatically manufacture it to the prescribed specifications. If recognition or manipulation was to be performed, existing vision systems relied on models generated in an ad hoc manner. Although both vision and CAD/CAM systems relied on models of the objects involved, different modeling schemes were used in each case. A more unified system allowed vision models to be generated from the CAD database. The model generation was guided by the class of objects being constructed and by the constraints imposed by the robotic workcell environment (fixtures, sensors, manipulators, and effectors). We proposed a framework in which objects were designed using an existing CAGD system and logical sensor specifications were automatically synthesized and used for visual recognition and manipulation.
ASP: An Algorithm and Sensor Performance Evaluation System
Described is a methodology which permitted the precise characterization of sensors, the specification of algorithms which transformed the sensor data, and the quantitative analysis of combinations of algorithms and sensors. Such analysis made it possible to determine appropriate sensor/algorithm combinations subject to a wide variety of criteria including: performance, computational complexity (both space and time), possibility for concurrency, modularization, and the use of multi-sensor systems for greater fault tolerance and reliability.
Distributed Control in the Multi-Sensor Kernel System
The Multi-Sensor Kernel System (MKS) has been introduced as a convenient mechanism for specifying multi-sensor systems and their implementations. We demonstrated how control issues could be handled in the context of MKS. In particular, the Logical Sensor Specification was extended to include a control mechanism which permitted control information to flow from more centralized processing to more peripheral processes and be generated locally in the logical sensor by means of a micro-expert system specific to the interface represented by the given logical sensor.