Computer Vision Based Two Hand Gesture Recognition System

Ryan Garver


Hand gestures provide a unique and powerful method of controlling computer software. This project began the development of a gesture-based interface system. The system, built around the TLib image processing library, set out to provide a simple and efficient interface environment that allows mouse control through pointing gestures and executes commands through simple arm movements.


Gestures are heavily used to convey meaning from person to person. Capitalizing on this natural source of control could be the next step in introducing simple, intuitive, and minimally intrusive computer control systems into everyday life. Previous gesture recognition systems show that the gestures normally used fall into two categories: pointing and command. In command-oriented systems, the user performs a gesture and the system translates it into a command or a series of commands. Pointing-based systems instead use frame-by-frame object tracking as the source of the command. The fundamental difference between these techniques lies in their approach to sampling: a command gesture is interpreted as a whole, while a pointing gesture is interpreted one frame at a time. To simplify hand segmentation, we use a stereo camera.

This system handles both classes of gestures. It enters pointing mode when it recognises a single hand in the image and switches to command mode when it sees two hands. In pointing mode, the system treats each change in hand position as a command to move the mouse. In command mode, the gesture is divided into a beginning phase, a middle phase, and an end phase; the recognition system uses the middle phase as its input.
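
The mode switch described above can be illustrated with a minimal Python sketch. It is not the project's TLib-based code. The function names, the "idle" fallback for frames with no hands or more than two, and the fixed-length trimming used to isolate the middle phase are all assumptions made for illustration; the text does not say how the phases are actually delimited.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def select_mode(hand_count: int) -> str:
    """Pick the interaction mode from the number of hands seen in a frame."""
    if hand_count == 1:
        return "pointing"
    if hand_count == 2:
        return "command"
    # Assumption: the text does not cover zero or more than two hands.
    return "idle"


def pointer_deltas(positions: List[Point]) -> List[Point]:
    """Pointing mode: each change in hand position becomes a mouse move."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def middle_phase(frames: list, lead: int = 2, tail: int = 2) -> list:
    """Command mode: keep only the middle phase of a gesture.

    Assumption: the beginning and end phases are approximated by dropping
    a fixed number of frames from each end (hypothetical lead/tail values).
    """
    return frames[lead:len(frames) - tail]
```

For example, a hand tracked at (10, 10), (12, 9), and (15, 9) over three frames would produce two relative mouse moves, one per change in position.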

The recognition of command gestures is the more interesting portion. We use a Hidden Markov Model (HMM) technique to train and then test the gestures. An HMM can be trained to model a particular sequence of data probabilistically. Once trained, our system will take a segmented gesture (the middle phase), which is represented as a series of integer tuples (three points for each hand), and run it through each HMM. Whichever HMM assigns the highest probability to the data sequence identifies the gesture that was performed.
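
The selection step can be sketched as follows, under some loudly stated assumptions: each frame's integer tuple has already been quantized to a single discrete symbol, each gesture has one discrete HMM given as (start, transition, emission) probabilities, and the two toy models named "rise" and "fall" are hypothetical. The sketch scores a sequence with the standard scaled forward algorithm and picks the best-scoring model. It does not cover training.

```python
import math
from typing import Dict, List, Tuple

# (start probabilities, transition matrix, emission matrix[state][symbol])
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def log_likelihood(obs: List[int], start: List[float],
                   trans: List[List[float]], emit: List[List[float]]) -> float:
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    # alpha[i] = P(first observation, state i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs: List[int], models: Dict[str, Model]) -> str:
    """Return the name of the model with the highest likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state, left-to-right models over symbols {0, 1}.
MODELS: Dict[str, Model] = {
    "rise": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "fall": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 1, 1], MODELS))
```

Working with log-likelihoods and rescaling at every step keeps long gesture sequences from underflowing, which plain products of probabilities would do quickly.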