3D Hand Pose Reconstruction With ISOSOM

Haiying Guan, Matthew Turk


We present an appearance-based 3D hand posture estimation method that determines a ranked set of possible hand posture candidates from an unmarked hand image, based on an analysis-by-synthesis method and an image retrieval algorithm. We formulate the posture estimation problem as a nonlinear, many-to-many mapping problem in a high-dimensional space. A general algorithm called ISOSOM is proposed for nonlinear dimension reduction and applied to 3D hand pose reconstruction to establish the mapping relationships between the hand poses and the image features. To interpolate intermediate posture values from sparsely sampled ground-truth training data, we generate the geometric map structure of the samples' manifold. The experimental results show that the ISOSOM algorithm performs better than traditional image retrieval algorithms for hand pose estimation.


1.1 Introduction

In this research, we take an image retrieval approach based on the analysis-by-synthesis method. It uses a realistic 3D hand model and renders it from different viewpoints to generate synthetic hand images. A set of possible candidates is found by comparing the real hand image with the synthetic images. The ground-truth labels of the retrieved matches are used as hand pose candidates.
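The retrieval step described above can be sketched as a ranked nearest-neighbor search. This is only an illustration under stated assumptions: the two-dimensional feature vectors, the Euclidean comparison, the pose labels, and the function name retrieve_candidates are all hypothetical and are not taken from the paper.

```python
import math

def retrieve_candidates(query, database, k=3):
    """Return the pose labels of the k database entries closest to the query.

    `database` is a list of (feature_vector, pose_label) pairs, standing in
    for synthetic images and their ground-truth labels (assumed layout).
    """
    ranked = sorted(database, key=lambda entry: math.dist(query, entry[0]))
    return [label for _, label in ranked[:k]]

# Hypothetical toy database of (feature vector, ground-truth pose label).
database = [
    ((0.0, 0.0), "pose_A"),
    ((1.0, 0.0), "pose_B"),
    ((0.0, 2.0), "pose_C"),
    ((5.0, 5.0), "pose_D"),
]

# Features of a "real" query image; the ranked labels are the pose candidates.
candidates = retrieve_candidates((0.9, 0.1), database, k=2)
print(candidates)
```

The output is a ranked list rather than a single answer, which matches the idea of returning a set of pose candidates for one query image.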

The hand is modeled as a 3D articulated object with 21 DOF for the joint angles (the hand configuration) and 6 DOF for global rotation and translation. A hand pose is defined by a hand configuration augmented by the 3 DOF global rotation parameters. The main problem of analysis by synthesis is the complexity of such a high-dimensional space. The size of the synthetic database grows exponentially with the accuracy required of the parameters. Even though the articulation of the hand is highly constrained, the complexity is still intractable for both database processing and image retrieval.
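A rough calculation illustrates the exponential growth. The sketch below assumes, purely for illustration, that every one of the 24 pose parameters (21 joint angles plus 3 global rotations) is sampled independently on a uniform grid; it ignores the joint constraints mentioned above, and the function name database_size is hypothetical.

```python
def database_size(samples_per_dof, dof):
    """Number of rendered images if each DOF is sampled on a uniform grid."""
    return samples_per_dof ** dof

# 21 joint-angle DOF plus 3 global rotation DOF, as defined in the text.
POSE_DOF = 21 + 3

# Even very coarse grids (2, 3, or 5 values per parameter) explode quickly.
for samples in (2, 3, 5):
    print(samples, "samples per DOF ->", database_size(samples, POSE_DOF), "images")
```

With only two values per parameter the grid already contains about 16.8 million poses, which is why exhaustive sampling is not practical and a sparse set of samples must be used instead.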

We formulate hand pose reconstruction as a nonlinear mapping problem between the angle vectors (hand configurations) and the images. In general, this is a many-to-many mapping in a high-dimensional space. Due to occlusions, different hand poses can be rendered to the same image. Conversely, the same hand configuration rendered from different viewpoints generates many images. To simplify the problem, we eliminate the second case by augmenting the hand configuration vector with the 3 global rotation parameters. The mapping from the images to the augmented hand configurations then becomes a one-to-many mapping between the image space and the augmented hand configuration space (the hand pose space). We establish the one-to-many mapping between the feature space and the hand pose space with the proposed ISOSOM algorithm. The experimental results show that our algorithm performs better than traditional image retrieval algorithms.


Instead of representing each synthetic image as an isolated item in the database, we cluster the similar vectors generated by similar poses together and use the ground-truth samples to generate an organized structure in a low-dimensional space. With such a structure, we can interpolate the intermediate vectors, which greatly reduces the complexity. Based on Kohonen's Self-Organizing Map (SOM) and Tenenbaum's ISOMAP algorithm, we propose an ISOmetric Self-Organizing Mapping algorithm (ISOSOM). Instead of organizing the samples on a 2D grid by Euclidean distance, it uses the topological graph and geometric distance of the samples' manifold to define the metric relationships between samples, enabling the SOM to better follow the topology of the underlying data set. The ISOSOM algorithm compresses information and automatically and efficiently clusters the training samples in a low-dimensional space. Figure 1 gives an intuitive depiction of the ISOSOM map.
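The difference between Euclidean distance and distance measured along the samples' manifold can be illustrated with an ISOMAP-style graph distance. The sketch below is not the ISOSOM implementation: it only shows the general idea of building a neighborhood graph and taking shortest-path lengths as manifold distances. The k-nearest-neighbor graph, the Floyd-Warshall shortest-path step, the semicircle test data, and the function names are all assumptions made for illustration.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a dense distance matrix.

    Entries are Euclidean edge lengths; unconnected pairs are infinity.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in neighbors:
            d = math.dist(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d  # keep the graph undirected
    return dist

def graph_distances(dist):
    """All-pairs shortest-path lengths (Floyd-Warshall) over the graph."""
    n = len(dist)
    d = [row[:] for row in dist]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

# Toy manifold: nine samples on a semicircle of radius 1.
points = [(math.cos(i * math.pi / 8), math.sin(i * math.pi / 8)) for i in range(9)]

geo = graph_distances(knn_graph(points, k=2))
straight = math.dist(points[0], points[-1])  # chord between the two end points
along = geo[0][-1]                           # path that follows the samples
print(straight, along)
```

For the two end points of the arc, the straight-line distance cuts across the semicircle, while the graph distance follows the samples and is therefore longer. A map organized by the second kind of distance keeps samples that are far apart along the manifold from being treated as neighbors.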

Figure 1. The ISOSOM for Hand Pose Reconstruction

1.3 Experimental Results

We compare the performance of the traditional image retrieval algorithm (IR), SOM, and ISOSOM on synthetic images of the hand pose in Table 1. ISOSOM performs best among the three algorithms. Compared to the traditional image retrieval algorithm for the top 40 matches, ISOSOM increases the hit rate by around 13%-31%. This also indicates that the ISOSOM algorithm has not only clustering ability but also interpolation ability. The ISOSOM retrieval results are shown in Figure 2, where the first image is the query image and the remaining 20 images are the retrieval results from the ISOSOM neurons.
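A hit rate over the top matches can be computed as in the sketch below. The scoring rule is an assumption: this summary does not state how a hit is decided, so the example counts a query as a hit when its ground-truth label appears anywhere among the top N retrieved labels. The labels, the data, and the function name hit_rate are hypothetical.

```python
def hit_rate(retrieved_lists, true_labels, top_n):
    """Fraction of queries whose true label appears in the top_n results.

    Assumption: a hit is an exact label match; the actual evaluation
    criterion may instead use a tolerance on the pose parameters.
    """
    hits = sum(
        1
        for retrieved, truth in zip(retrieved_lists, true_labels)
        if truth in retrieved[:top_n]
    )
    return hits / len(true_labels)

# Hypothetical ranked retrieval results for three queries.
retrieved = [
    ["A", "B", "C"],
    ["C", "A", "B"],
    ["B", "C", "D"],
]
truth = ["A", "B", "A"]

rate_top1 = hit_rate(retrieved, truth, top_n=1)
rate_top3 = hit_rate(retrieved, truth, top_n=3)
print(rate_top1, rate_top3)
```

Increasing N can only keep or raise the hit rate under this rule, so comparisons between algorithms are meaningful only at the same N, such as the top 40 matches used above.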

Figure 2. The ISOSOM retrieval results. (The name of each image indicates the hand configuration. The number for the query image is its index in the query dataset. The number for each retrieved image is its index in the ISOSOM neuron graph.)


Haiying Guan and Matthew Turk, "3D hand pose reconstruction with ISOSOM," to appear, International Symposium on Visual Computing, Lake Tahoe, NV, December 5-7, 2005.

Haiying Guan and Matthew Turk, "3D hand pose reconstruction with ISOSOM," Technical Report 2005-15, Department of Computer Science, University of California, Santa Barbara.