Probabilistic Expression Recognition on Manifolds

Ya Chang, Matthew Turk

Fig. 1: Lipschitz embedding of 3027 images of a male subject


We present a probabilistic method for video-to-video facial expression recognition based on the manifold of facial expression. This manifold lies in a high-dimensional image space, and we embed it into a low-dimensional space. In the embedded space, images with different expressions can be clustered and classified by a probabilistic model learned on the expression manifold.


We propose the concept of the manifold of facial expression, based on the observation that images of a subject's facial expressions define a smooth manifold in the high-dimensional image space, as illustrated in Fig. 2. Such a manifold representation can provide a unified framework for facial expression analysis. To reduce the variation due to scale, illumination conditions, and face pose, we first apply Active Wavelet Networks to the image sequences for face registration and facial feature localization; typical localization results for different expressions are shown in Fig. 3. To learn the structure of the manifold in the image space, we investigated two types of embedding from the high-dimensional space to a low-dimensional space: locally linear embedding (LLE) and Lipschitz embedding.
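For readers unfamiliar with LLE, the numpy sketch below shows the standard algorithm (Roweis and Saul, 2000): each sample is reconstructed from its k nearest neighbors, and the low-dimensional coordinates are chosen to preserve those reconstruction weights. It assumes each row of X is one registered, vectorized face image; the function name and the values of n_neighbors and reg are illustrative assumptions, not the settings used in this work. Dense n x n matrices are used for clarity, so it is only practical for a few thousand images.

```python
import numpy as np

def lle(X, n_neighbors=8, n_components=2, reg=1e-3):
    """Minimal locally linear embedding of the rows of X (illustrative sketch)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :n_neighbors]

    # Step 1: weights that best reconstruct each sample from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
        C = Z @ Z.T                        # local Gram matrix (k x k)
        C += np.eye(n_neighbors) * reg * np.trace(C)  # regularize; C is singular when k > dim
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()        # reconstruction weights sum to one

    # Step 2: bottom eigenvectors of M = (I - W)^T (I - W), skipping the constant one.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues returned in ascending order
    return eigvecs[:, 1:n_components + 1]
```

Plotting the two returned columns against each other gives a visualization of the kind shown in Fig. 4.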

Our experiments show that LLE is suitable for visualizing expression manifolds (Fig. 4). After Lipschitz embedding, the expression manifold can be approximately considered a super-spherical surface in the embedding space (Fig. 1 and Fig. 5). Training image sequences are represented as "paths" on the expression manifold, and the likelihood of each kind of facial expression is modeled as a mixture density with exemplars as mixture centers. After training the probabilistic model on the expression manifold, we can perform facial expression recognition and prediction on a probe set with few restrictions: a probe sequence need not begin or end with a neutral expression, and a transition between different expressions need not pass through the neutral expression.
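A Lipschitz embedding maps a sample x to the vector of its distances to k reference sets, f(x) = (d(x, A_1), ..., d(x, A_k)), where d(x, A) is the distance from x to the nearest member of A. The sketch below assumes Euclidean distance between vectorized images and takes each reference set as an array of samples (for instance, one set per basic expression, which is an assumption here, not a detail given above); a geodesic distance over a neighborhood graph could be substituted for the Euclidean one.

```python
import numpy as np

def lipschitz_embed(X, reference_sets):
    """Map each row of X to (d(x, A_1), ..., d(x, A_k)), with d(x, A) = min over a in A of ||x - a||."""
    coords = []
    for A in reference_sets:
        # Euclidean distance from every sample (rows of X) to every member of A.
        dists = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)
        coords.append(dists.min(axis=1))   # distance to the nearest member of A
    return np.stack(coords, axis=1)        # shape (n_samples, k)
```

Each output coordinate measures how close a frame is to one reference set, so the embedding has as many dimensions as there are reference sets.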



Fig. 2: From "The Manifold Ways of Perception", H. S. Seung and D. D. Lee, Science, 2000


Fig. 3: Some sample images with feature points


Fig. 4: The first two LLE coordinates of 478 images, with representative images

For manifolds derived from different subjects, we propose a nonlinear alignment algorithm that preserves the semantic similarity of facial expressions across subjects on one generalized manifold. We also show that nonlinear alignment outperforms linear alignment in expression classification.
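For comparison, one plausible form of linear alignment is a least-squares affine map fitted to corresponding points on two subjects' embedded manifolds, for example cluster centers of the same expression (the correspondences are an assumption). This sketch illustrates only such a linear baseline; it is not the nonlinear alignment algorithm proposed here, and the function names are hypothetical.

```python
import numpy as np

def fit_affine_alignment(src, dst):
    """Least-squares affine map dst ~= src @ A + b from corresponding rows of src and dst."""
    ones = np.ones((src.shape[0], 1))
    S = np.hstack([src, ones])                    # homogeneous coordinates
    P, *_ = np.linalg.lstsq(S, dst, rcond=None)   # P stacks A (d x d) above b (1 x d)
    return P[:-1], P[-1]

def apply_alignment(points, A, b):
    """Map points from the source manifold's embedding into the target's coordinates."""
    return points @ A + b
```

A single global affine map cannot bend one manifold onto another, which is one intuition for why a nonlinear alignment can preserve expression similarity better.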


Fig. 5: Lipschitz embedding of 1824 images of a female subject


Fig. 6: Nonlinear alignment result of two manifolds. Circles are from manifold of the female subject, filled points are from manifold of the male subject.


In the embedded space, a complete expression sequence becomes a path on the expression manifold, emanating from a center that corresponds to the neutral expression.

Each path consists of several clusters. A probabilistic model of transitions between the clusters and paths is learned from training videos in the embedded space. The likelihood of each kind of facial expression is modeled as a mixture density with the clusters as mixture centers. A transition between expressions is represented as the evolution of the posterior probability over the six basic paths. The experimental results demonstrate that the probabilistic approach can recognize expression transitions effectively.
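The recognition step can be sketched as a recursive Bayesian filter over paths. This minimal version makes simplifying assumptions that are not stated in the text: each path's likelihood is an isotropic Gaussian mixture centered on its cluster centers with one shared sigma, and transitions are modeled only between paths (a single matrix T[c_prev, c]), not between individual clusters. Each frame x_t updates the posterior as p(c_t | x_1..t), proportional to p(x_t | c_t) times the sum over c' of T[c', c_t] p(c' | x_1..t-1). The function names are hypothetical.

```python
import numpy as np

def log_mixture_likelihood(x, centers, weights, sigma):
    """log p(x | c) for an isotropic Gaussian mixture whose means are the cluster centers of path c."""
    d = centers.shape[1]
    sq = np.sum((centers - x) ** 2, axis=1)
    log_comp = np.log(weights) - 0.5 * sq / sigma ** 2 - 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())  # log-sum-exp over mixture components

def track_posterior(frames, paths, transition, prior, sigma=1.0):
    """Filter the posterior over paths frame by frame; paths is a list of (centers, weights)."""
    post = np.asarray(prior, dtype=float)
    history = []
    for x in frames:
        # Prediction: p(c_t | x_1..t-1) = sum_c' T[c', c_t] p(c' | x_1..t-1).
        predicted = transition.T @ post
        loglik = np.array([log_mixture_likelihood(x, C, w, sigma) for C, w in paths])
        unnorm = np.log(predicted) + loglik
        unnorm -= unnorm.max()             # stabilize before exponentiating
        post = np.exp(unnorm)
        post /= post.sum()                 # normalize to a probability vector
        history.append(post)
    return np.array(history)               # one posterior row per frame
```

Plotting each column of the returned array over time gives a curve of the kind used to show an expression transition, with the dominant path switching as the face moves from one expression to another.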



Fig. 7: Facial expression recognition result with manifold visualization


We further extend our work to 3D data. We first build a 3D expression database to learn the expression space of a human face. The real-time 3D video data were captured by a camera/projector scanning system (by Marcelo Bernardes Vieira at IMPA). From this database, we extract the geometric deformation independently of pose and illumination changes. All possible facial deformations of an individual form a nonlinear manifold embedded in a high-dimensional space. Because the manifolds of different subjects vary significantly and are usually hard to align, we transfer the facial deformations in all training videos to one standard model, which yields a generalized manifold. To edit a facial expression of a new subject in 3D videos, the system searches this generalized manifold for the optimal replacement with the ‘target’ expression, which is then blended with the deformation in the previous frames to synthesize images of the new expression with the current head pose.
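The replacement-and-blending step can be illustrated with a deliberately simple, hypothetical sketch. It assumes that deformations are flattened vertex-displacement vectors on the standard model, that the "optimal replacement" is the nearest target-expression deformation in Euclidean distance, and that blending is a fixed linear interpolation with the previous output frame. The function name, the alpha parameter, and these criteria are illustrative assumptions, not the system's actual search or blending rules.

```python
import numpy as np

def edit_expression(current_deform, library, labels, target, prev_output, alpha=0.7):
    """Hypothetical editing step on the generalized manifold.

    library: (n, m) training deformations transferred to the standard model (flattened)
    labels:  (n,) expression label of each training deformation
    """
    candidates = library[labels == target]           # deformations showing the target expression
    idx = np.argmin(np.linalg.norm(candidates - current_deform, axis=1))
    replacement = candidates[idx]                     # nearest target-expression deformation
    # Linear blend with the previous output frame for temporal smoothness.
    return alpha * replacement + (1 - alpha) * prev_output
```

A larger alpha follows the replacement more closely; a smaller alpha favors smoothness with the previous frames.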


Fig. 8: System Diagram


Fig. 9: An example of 3D data viewer with fitted mesh


Fig. 10: Deformation transfer from training videos. The first row shows images of anger; the second row shows the corresponding deformed standard mesh model. Columns 1 to 3 show one style of anger at frames 1, 6, and 29; columns 4 to 6 show another style of anger at frames 1, 20, and 48. The motions of the training videos are well retargeted on the standard model.






Y. Chang, C. Hu, M. Turk,
Manifold of Facial Expression,
IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Oct. 17, 2003. Nice, France.

Y. Chang, C. Hu, M. Turk,
Probabilistic expression analysis on manifolds,
IEEE Conference on Computer Vision and Pattern Recognition, Washington, D.C., June 2004.

C. Hu, Y. Chang, R. Feris, M. Turk,
Learning facial deformation tracking and recognition on manifold,
IEEE Workshop on Face Processing in Video, Washington, D.C., June 2004.

Y. Chang, M. Vieira, M. Turk and L. Velho,
Automatic 3D Facial Expression Analysis in Videos,
IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Beijing, 2005.

Y. Chang, C. Hu, R. Feris, M. Turk,
Manifold based analysis of facial expression,
Image and Vision Computing, In press, 2006.