Probabilistic Expression Recognition on Manifolds
Ya Chang, Matthew Turk
We present a probabilistic video-to-video facial expression recognition method based on the manifold of facial expression. The manifold of facial expression is embedded in a high-dimensional image space. In the embedded space, images with different expressions can be clustered and classified by a probabilistic model learned on the expression manifold.
We propose the concept of the Manifold of Facial Expression, based on the observation that images of a subject's facial expressions define a smooth manifold in the high-dimensional image space, as illustrated in Fig. 2. Such a manifold representation can provide a unified framework for facial expression analysis. To learn the structure of the manifold in the image space, we investigated two types of embeddings from a high-dimensional space to a low-dimensional space: locally linear embedding (LLE) and Lipschitz embedding. To reduce the variation due to scaling, illumination conditions, and face pose, we first apply Active Wavelet Networks to the image sequences for face registration and facial feature localization. Typical facial feature localization results for different expressions are shown in Fig. 3.
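As a rough illustration of the Lipschitz embedding idea, the sketch below maps each feature vector to its distances from a handful of reference sets, so that each reference set contributes one coordinate of the embedded space. It is a minimal sketch under stated assumptions, not the procedure used in this work: the function name `lipschitz_embedding`, the toy random data, the choice of reference sets, and the use of plain Euclidean distance (rather than a distance measured along the manifold) are all hypothetical.

```python
import numpy as np

def lipschitz_embedding(points, reference_sets):
    """Map each point to its distances from a list of reference sets.

    points: (n, d) array of feature vectors.
    reference_sets: list of (m_i, d) arrays, one per embedding coordinate.
    Returns an (n, k) array whose entry [j, i] is the smallest Euclidean
    distance from points[j] to any member of reference_sets[i].
    (Assumption: Euclidean distance stands in for whatever distance the
    real system uses.)
    """
    points = np.asarray(points, dtype=float)
    coords = []
    for ref in reference_sets:
        ref = np.asarray(ref, dtype=float)
        # Pairwise distances between every point and every reference member.
        diffs = points[:, None, :] - ref[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        # Distance from a point to a set is the distance to its nearest member.
        coords.append(dists.min(axis=1))
    return np.stack(coords, axis=1)

# Toy data standing in for registered face features (hypothetical).
rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 5))
# Three hypothetical reference sets, e.g. a few frames per expression.
reference_sets = [frames[0:3], frames[3:6], frames[6:9]]
embedded = lipschitz_embedding(frames, reference_sets)
print(embedded.shape)
```

Each coordinate defined this way cannot change by more than the distance between two inputs, which is the property that gives the embedding its name.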
Our experiments show that LLE is suitable for visualizing expression manifolds (Fig. 4). After applying Lipschitz embedding, the expression manifold can be approximately considered as a super-spherical surface in the embedding space (Fig. 1 and 5). The training image sequences are represented as "paths" on the expression manifold. The likelihood of one kind of facial expression is modeled as a mixture density with exemplars as mixture centers. After training the probabilistic model on the manifold of facial expression, we can perform facial expression recognition and prediction on a probe set with few restrictions: a sequence need not begin or end with a neutral expression, and the transition between different expressions need not pass through a neutral expression.
For manifolds derived from different subjects, we propose a nonlinear alignment algorithm that preserves the semantic similarity of facial expressions across subjects on one generalized manifold. We also show that nonlinear alignment outperforms linear alignment in expression classification.
In the embedded space, a complete expression sequence becomes a path on the expression manifold, emanating from a center that corresponds to the neutral expression.
Each path consists of several clusters. A probabilistic model of transition between the clusters and paths is learned through training videos in the embedded space. The likelihood of one kind of facial expression is modeled as a mixture density with the clusters as mixture centers. The transition between different expressions is represented as the evolution of the posterior probability of the six basic paths. The experimental results demonstrate that the probabilistic approach can recognize expression transitions effectively.
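The mixture likelihood and the evolving posterior described above can be sketched as a simple recursive Bayesian update. This is only an illustrative sketch under stated assumptions: the isotropic Gaussian components, the shared `sigma`, the uniform mixture weights, the 2D toy centers, the transition matrix values, and the use of two paths instead of six are all hypothetical choices made to keep the example small, and the function names are not taken from the original work.

```python
import numpy as np

def cluster_likelihoods(x, centers, sigma):
    """Isotropic Gaussian density of x under each cluster center."""
    d = centers.shape[1]
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq_dist / (2.0 * sigma ** 2))

def path_likelihood(x, centers, weights, sigma):
    """Mixture density of x for one path, with clusters as mixture centers."""
    return float(np.dot(weights, cluster_likelihoods(x, centers, sigma)))

def update_posterior(prior, transition, likelihoods):
    """One filtering step over paths.

    transition[i, j] is the assumed probability of moving from path i
    at the previous frame to path j at the current frame.
    """
    predicted = transition.T @ prior
    unnormalized = predicted * likelihoods
    return unnormalized / unnormalized.sum()

# Two hypothetical expression paths, each with two clusters in a 2D space.
paths = [
    np.array([[1.0, 0.0], [2.0, 0.0]]),
    np.array([[0.0, 1.0], [0.0, 2.0]]),
]
weights = np.array([0.5, 0.5])          # uniform mixture weights (assumption)
transition = np.array([[0.9, 0.1],
                       [0.1, 0.9]])     # paths tend to persist (assumption)
sigma = 0.5

# A toy probe sequence that follows path 0, then switches to path 1.
observations = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]])

posterior = np.array([0.5, 0.5])        # uniform prior over paths
history = []
for x in observations:
    liks = np.array([path_likelihood(x, c, weights, sigma) for c in paths])
    posterior = update_posterior(posterior, transition, liks)
    history.append(posterior.copy())

for t, p in enumerate(history):
    print(t, np.round(p, 3))
```

In this toy run the posterior mass starts on the first path and moves to the second once the observations switch, which is the kind of evolution the text uses to represent a transition between expressions.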
We further extend our work to 3D data. We first build a 3D expression database to learn the expression space of a human face. The real-time 3D video data were captured by a camera/projector scanning system (by Marcelo Bernardes Vieira at IMPA). From this database, we extract the geometric deformation independent of pose and illumination changes. All possible facial deformations of an individual form a nonlinear manifold embedded in a high-dimensional space. To combine the manifolds of different subjects, which vary significantly and are usually hard to align, we transfer the facial deformations in all training videos to one standard model. To edit a facial expression of a new subject in 3D videos, the system searches over this generalized manifold for an optimal replacement with the ‘target’ expression, which is then blended with the deformation in the previous frames to synthesize images of the new expression with the current head pose.