Active Appearance Models (AAMs) are generative parametric models commonly used to track faces non-rigidly in video. AAMs are normally constructed by applying Procrustes analysis followed by Principal Components Analysis (PCA) to a collection of training images of faces, each annotated with a mesh of canonical feature points (usually hand-marked). The AAM is then fit frame-by-frame to an input video to track the face through it, and the best-fit model parameters are passed on to the chosen application. A variety of video applications are possible, including dynamic head pose and gaze estimation for real-time user interfaces, expression recognition, and lip-reading.
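The construction step described above (Procrustes analysis followed by PCA) can be sketched for the shape part of the model as follows. This is a minimal illustration, not a definitive implementation: it assumes each training shape is given as an (N, 2) array of hand-marked points, it omits the appearance (texture) part of an AAM entirely, and the function names `align_to`, `generalized_procrustes` and `build_shape_model` are hypothetical names chosen for this example.

```python
import numpy as np


def align_to(shape, ref):
    """Align one (N, 2) shape to a reference, removing translation, scale and rotation."""
    a = shape - shape.mean(axis=0)          # remove translation
    b = ref - ref.mean(axis=0)
    a = a / np.linalg.norm(a)               # remove scale
    b = b / np.linalg.norm(b)
    # Orthogonal Procrustes: the rotation r minimizing ||a @ r - b||_F
    u, _, vt = np.linalg.svd(a.T @ b)
    if np.linalg.det(u @ vt) < 0:           # exclude reflections
        u[:, -1] *= -1
    return a @ (u @ vt)


def generalized_procrustes(shapes, n_iter=10):
    """Iteratively align all shapes to their evolving mean (assumed fixed iteration count)."""
    ref = shapes[0]
    for _ in range(n_iter):
        aligned = np.stack([align_to(s, ref) for s in shapes])
        ref = aligned.mean(axis=0)
    return aligned, ref


def build_shape_model(shapes, n_modes):
    """PCA on the aligned shapes: returns mean shape, principal modes and their variances."""
    aligned, _ = generalized_procrustes(shapes)
    x = aligned.reshape(len(aligned), -1)   # one flattened shape per row
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = s[:n_modes] ** 2 / (len(x) - 1)
    return mean, vt[:n_modes], variances
```

Under these assumptions, a new shape is approximated as `mean + p @ modes`, where `p` is the vector of shape parameters that the fitting step would estimate for each frame.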
In many scenarios occlusion can occur. It may be present in the training data used to construct the AAM, in the input videos to which the AAM is fit, or in both. Perhaps the most common cause of occlusion is 3D pose variation, which often causes self-occlusion. Other causes include sunglasses and other objects placed in front of the face. Since occlusion is so common, it is important to be able to:
(1) Construct AAMs from occluded training images, and
(2) Efficiently fit AAMs to novel videos containing occlusion.
Image analysis is a general problem that can be tackled in various ways. It is fundamental to many tasks, such as industrial inspection, motion analysis, face recognition and medical image understanding. What makes the problem intrinsically hard is that single pixels cannot be considered independently when inferring the structure they form together. The goal of such analysis is not only to solve the problem correctly, but also to do so efficiently, in a way that is not overly affected by the size of the image, i.e. the scale of the problem.
Analysis often involves the measurement of meaningful structures in an image and possibly some explanation of the form of these structures. In order to derive any useful information about a particular meaningful structure, image segmentation must first take place. Segmentation is concerned with the identification of regions of interest, each of which may be characterized as belonging to the same object. By dividing the image into such regions, an understanding of the nature of its constituent components can be gained.
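As a simple illustration of what segmentation produces, the sketch below labels connected regions of bright pixels in a grayscale image. It is a purely bottom-up example under stated assumptions: regions are defined by a fixed intensity threshold and 4-connectivity, and `label_regions` is a hypothetical helper written for this example rather than a method taken from this report.

```python
import numpy as np
from collections import deque


def label_regions(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.

    Returns an integer label map (0 = background) and the number of regions.
    """
    mask = image > threshold
    labels = np.zeros(image.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                        # pixel already belongs to a region
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:                        # breadth-first flood fill
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current
```

Each labeled region can then be measured on its own, for example by counting its pixels with `np.sum(labels == k)`.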
This report concentrates on a top-down approach to analysis. This approach relies on a high-level abstraction of the visual attributes of one structure. Alternatively, and often more usefully, this abstraction can represent a collection of structures that together form another structure. Such an approach is referred to as top-down because it carries some prior information that it attempts to fit to the problem posed. It makes assumptions about the problem and, in some sense, takes a preliminary overview of the structures in an image. The rest of this section will describe popular methods of top-down image analysis, but will focus on active appearance models at the expense of other, less relevant methods.