Placement of Object Annotations in Video-Based Augmented Reality

Vineet Thanedar and Tobias Hollerer


Augmented reality supplements our view of the real world with virtual, computer-generated information. In many applications, annotating objects in the scene helps the viewer understand what is in view. Poorly placed annotations, however, can occlude other important objects, clutter the scene, or be arranged so incoherently that the scene becomes harder to understand. In this project, we address the placement of annotations in a video-based augmented reality setup so as to avoid these problems.


Video-based augmented reality is the real-time augmentation of a camera's video feed with virtual information. As a first step, we are currently working with offline video data, annotating real objects in the video.

Augmenting an object with an annotation involves identifying the object in one frame and tracking it through subsequent frames of the video sequence. To identify the object, a user pauses the video at an arbitrary frame and traces an outline around the outer edge of the object. The outline is a closed contour, marked by clicking at successive points near the boundary. We then employ a two-step approach to find the object boundary. First, we use active contours (snakes) to move the marked points closer to the object boundary. Active contours are energy-minimizing splines that are attracted to object edges under the constraint of internal snake forces and external image forces. Of the many active contour formulations available, we use the simple and fast algorithm proposed by Williams and Shah [2]. Active contours may not always find the object boundary, especially if the snake is initialized far from the object or if the image background is noisy. They are also sensitive to the shape of the object. To further improve the accuracy of edge identification, we then apply a simple intensity-gradient technique: each marked point moves toward the centroid of the object and stops when it reaches the object boundary. This step completes the object identification process. Points that do not converge on the object boundary can be eliminated.
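The second refinement step can be illustrated with a minimal sketch. It is not the project's implementation. It assumes a grayscale image stored as a list of rows, approximates the object centroid by the mean of the marked points, and treats the first pixel whose gradient magnitude exceeds an arbitrary threshold as the boundary. The names `gradient_magnitude` and `snap_to_boundary`, and the `threshold` and `max_steps` parameters, are hypothetical.

```python
from math import hypot


def gradient_magnitude(image, x, y):
    """Central-difference intensity gradient magnitude at pixel (x, y)."""
    h, w = len(image), len(image[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = image[y][x1] - image[y][x0]
    gy = image[y1][x] - image[y0][x]
    return hypot(gx, gy)


def snap_to_boundary(image, points, threshold=50.0, max_steps=200):
    """Walk each (x, y) point toward the centroid; stop at the first strong gradient.

    Assumption: the centroid of the marked points stands in for the object centroid.
    Returns (snapped, converged). Points that never meet a strong gradient keep
    their original position and are flagged False so the caller can discard them.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    snapped, converged = [], []
    for px, py in points:
        dist = hypot(cx - px, cy - py)
        steps = min(int(dist), max_steps)  # roughly one pixel per step
        pos, found = (px, py), False
        for s in range(steps + 1):
            t = s / dist if dist > 0 else 0.0
            x = int(round(px + t * (cx - px)))
            y = int(round(py + t * (cy - py)))
            if gradient_magnitude(image, x, y) >= threshold:
                pos, found = (x, y), True
                break
        snapped.append(pos)
        converged.append(found)
    return snapped, converged
```

The `converged` flags correspond to the provision for eliminating points that do not reach the boundary. A suitable threshold depends on image contrast and noise and would need tuning.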

Once the object has been identified in the initial frame, it needs to be tracked in subsequent frames. To track the object, we use the Kanade-Lucas-Tomasi (KLT) feature tracking approach [3]. The points on the object boundary available at the end of the identification process form the initial features. In addition to these edge features, we also introduce new features inside the object. The KLT algorithm then tracks the features from frame to frame. Features that are successfully tracked into the next frame are termed good features; the rest are termed lost features. We reintroduce lost features into the tracking by employing a simple technique based on the motion of the good features. The tracked features tell us the location and size of the object in the frame, which helps determine where the object's annotation should be placed.
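The text does not specify how lost features are reintroduced. One plausible reading, sketched below purely as an assumption, is to shift each lost feature by the mean displacement of the good features. The bounding box of the resulting points then gives a rough location and size for the object. The function names `reintroduce_lost` and `bounding_box` are hypothetical, and a lost feature is represented by `None`.

```python
def reintroduce_lost(prev_pts, curr_pts):
    """Fill in lost features using the mean motion of the good features.

    prev_pts: list of (x, y) positions in the previous frame.
    curr_pts: list of (x, y) positions in the current frame, or None if lost.
    Assumption: the object moves roughly as a whole, so mean translation suffices.
    """
    moves = [(c[0] - p[0], c[1] - p[1])
             for p, c in zip(prev_pts, curr_pts) if c is not None]
    if not moves:
        return list(curr_pts)  # no good features left to estimate motion from
    dx = sum(m[0] for m in moves) / len(moves)
    dy = sum(m[1] for m in moves) / len(moves)
    return [c if c is not None else (p[0] + dx, p[1] + dy)
            for p, c in zip(prev_pts, curr_pts)]


def bounding_box(points):
    """Return (min_x, min_y, width, height) of the tracked feature points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# Usage: two features tracked, one lost.
prev = [(0, 0), (10, 0), (5, 5)]
curr = [(2, 1), (12, 1), None]
filled = reintroduce_lost(prev, curr)
box = bounding_box(filled)
```

A mean translation ignores rotation and scale changes. A fitted affine motion model would handle those cases, at the cost of needing more good features.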

We are currently working on determining where to place the annotation. We are investigating techniques that analyze the current scene image to find appropriate locations. To this end, we are presently focusing on identifying homogeneous regions in the image. For example, if the scene has sky in the background, that region may be a good place for annotations.
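As one illustration of what identifying homogeneous regions could look like, the sketch below splits a grayscale image into square blocks and picks the block with the lowest intensity variance. This is an assumed approach, not the method the project settled on. The function names and the block size are hypothetical.

```python
def block_variance(image, x0, y0, size):
    """Intensity variance of the size-by-size block whose top-left corner is (x0, y0)."""
    vals = [image[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def most_homogeneous_block(image, size):
    """Scan non-overlapping blocks and return (variance, x0, y0) of the flattest one.

    Returns None if the image is smaller than one block.
    Assumption: low variance is a usable proxy for "homogeneous" (e.g. clear sky).
    """
    h, w = len(image), len(image[0])
    best = None
    for y0 in range(0, h - size + 1, size):
        for x0 in range(0, w - size + 1, size):
            var = block_variance(image, x0, y0, size)
            if best is None or var < best[0]:
                best = (var, x0, y0)
    return best
```

In practice, a candidate block would also need to be checked against the tracked object's bounding box, so that the annotation lands near the object without covering it.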


[1] Ronald T. Azuma. A survey of augmented reality. Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pages 355-385, August 1997.

[2] D. Williams and M. Shah. A fast algorithm for active contours and curvature estimation. Computer Vision, Graphics and Image Processing, vol. 55, no. 1, 1992.

[3] Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, 1994.

