Evaluating and Improving Visual Tracking


Steffen Gauglitz  •  Tobias Höllerer  •  Matthew Turk

Visual tracking is a core component of a variety of applications, such as visual navigation for autonomous vehicles and Augmented Reality. We are working on evaluating and improving the different algorithms needed for visual tracking.

Datasets & Ground Truth

To methodically evaluate tracking, we need large datasets of relevant video data together with ground truth information against which the algorithms under test can be evaluated.

While many image/video datasets exist, most of them have limited validity for visual tracking (e.g., single images instead of videos, no motion blur). The first part of our work was therefore to design an appropriate setup to collect this data. Our setup was first presented at ISMAR 2009 [1].

This image sequence illustrates our process of deriving ground truth: each frame gets warped into a canonical “reference frame”. After our final alignment, the difference between the warped images (right) is very small, indicating accurate alignment and hence accurate ground truth.
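Concretely, the relation between a frame and the canonical reference frame can be expressed as a 3×3 homography: warping a point is a matrix multiplication followed by dehomogenization. A minimal numpy sketch of this mapping (the helper name `warp_points` and the example matrix are illustrative, not taken from the paper):

```python
import numpy as np

def warp_points(H, pts):
    """Map 2D points of shape (N, 2) through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # dehomogenize

# Toy homography mapping a frame into the reference frame:
# a slight anisotropic scale plus a translation.
H = np.array([[1.1, 0.0,  5.0],
              [0.0, 0.9, -3.0],
              [0.0, 0.0,  1.0]])
corners = np.array([[0, 0], [640, 0], [640, 480], [0, 480]], dtype=float)
print(warp_points(H, corners))  # where the frame's corners land in the reference frame
```

Warping the whole image with such a homography (as in the figure above) applies the same mapping to every pixel coordinate, typically with interpolation.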

Currently, our dataset consists of 96 video streams featuring 6 different textures and 16 different camera paths, with a total of 6889 frames, including ground truth information for each frame. We separated the motions (e.g., rotation only, then zoom only) to allow for a more detailed analysis: this lets us answer not only how often tracking breaks, but also why and under which conditions it breaks. Our dataset has been published as part of our IJCV article [2] and is available for download.

Six exemplary frames from our dataset (top row) and how they are warped into the reference frame (bottom row).

Testing Existing Algorithms

With the dataset described above, we can evaluate a variety of existing tracking algorithms as well as individual components of these algorithms. In particular, we extensively evaluated and analyzed keypoint detectors and feature descriptors.

We evaluated 6 popular keypoint detectors, 5 popular feature descriptors/classifiers, and finally, all 30 combinations. This work has been published in the International Journal of Computer Vision [2].

The results can be used to analyze strengths & weaknesses of these algorithms, choose an appropriate algorithm for a particular application, or stimulate ideas for improvements (cf. next section).
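With ground-truth homographies available, one natural way to score a detector/descriptor combination is match precision: a match counts as correct if the keypoint from frame A, warped by the ground-truth homography, lands within a few pixels of its matched keypoint in frame B. A sketch of this idea in numpy (the function name `match_precision` and the 3-pixel tolerance are our illustration, not necessarily the exact protocol of [2]):

```python
import numpy as np

def match_precision(pts_a, pts_b, matches, H_ab, tol=3.0):
    """Fraction of matches (i, j) for which warping pts_a[i] by the
    ground-truth homography H_ab lands within `tol` pixels of pts_b[j]."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    if not matches:
        return 0.0
    correct = 0
    for i, j in matches:
        p = H_ab @ np.append(pts_a[i], 1.0)  # warp keypoint into frame B
        p = p[:2] / p[2]                     # dehomogenize
        if np.linalg.norm(p - pts_b[j]) <= tol:
            correct += 1
    return correct / len(matches)

# Example: identity motion, one good and one bad match.
pts = [[10.0, 10.0], [100.0, 50.0]]
print(match_precision(pts, pts, [(0, 0), (1, 0)], np.eye(3)))  # -> 0.5
```

Running this score per frame of the dataset, separately for each motion type, is what allows statements like "descriptor X degrades under zoom but not under rotation".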

Improving Spatial Distribution of Keypoints

We show that improving the spatial distribution of keypoints increases the robustness of visual tracking, and we propose a novel algorithm that can compute such a distribution significantly faster than previous methods.

Keypoints have to fulfill two conflicting criteria: they have to be “strong” features (e.g., corners with high contrast), but they should also be well distributed across the image, since this improves the robustness of various algorithms that make use of them. Our algorithm efficiently selects a subset of points that fulfills both criteria from a larger set of detected points, and it does so significantly faster than existing methods – in O(n log n) instead of O(n^2) time.
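For intuition, here is a deliberately simplified stand-in for spatially-aware selection: keep only the strongest keypoint per cell of a coarse grid. This is not the SDC algorithm from the paper (which covers the image with approximated circles and searches for the right circle size); the grid version merely illustrates the strength-vs-coverage trade-off in linear time:

```python
import numpy as np

def select_spread(points, strengths, img_w, img_h, n_cells=8):
    """Keep the strongest keypoint in each cell of an n_cells x n_cells grid.
    Simplified illustration only; the paper's SDC algorithm uses adaptive
    circle covers rather than a fixed grid."""
    best = {}
    for (x, y), s in zip(points, strengths):
        cell = (min(int(x * n_cells / img_w), n_cells - 1),
                min(int(y * n_cells / img_h), n_cells - 1))
        if cell not in best or s > best[cell][1]:
            best[cell] = ((x, y), s)
    return [p for p, _ in best.values()]

# Example: 500 random detections in a 640x480 image; at most 8*8 = 64 survive,
# and by construction they are spread over the whole image.
rng = np.random.default_rng(0)
pts = rng.uniform([0, 0], [640, 480], size=(500, 2))
kept = select_spread(pts, rng.uniform(size=500), 640, 480)
print(len(kept))
```

A fixed grid forces the user to choose the cell count; the difficulty SDC addresses is adapting the "circle size" automatically so that exactly the desired number of well-spread keypoints survives.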

This work has been accepted for presentation at ICIP 2011 [3].

Step-by-step illustration of our algorithm: we successively “cover” the image with approximated circles. For each circle, one keypoint “survives”. The difficulty lies in finding the right circle size.
Our algorithm “SDC” is significantly faster than the popular selection algorithm ANMS (left graph), while offering the same improvements in tracking robustness over selecting points by strength alone (“top k”) (right graph).

Improving Keypoint Orientation Assignments

Keypoint detection, description and matching has proven to be a powerful paradigm for a variety of applications in computer vision. In many frameworks, this paradigm includes an orientation assignment step to make the overall process invariant to in-plane rotation. While this approach seems to work well and is widely accepted, the orientation assignment is frequently presented as a mere “add-on” to a descriptor, and little work has been devoted to the orientation assignment algorithms themselves.

In a paper presented at BMVC 2011 [4], we proposed two novel, very efficient algorithms for orientation assignment (one for the case where a single dominant orientation is sought, the other capable of detecting multiple dominant orientations), and presented a detailed quantitative evaluation and analysis of our two algorithms as well as four competing algorithms. Our results include observations about the orientation assignment problem in general as well as about the individual algorithms.

Exemplary results illustrating the performance of our two algorithms, CoM and HoI. CoM is very fast to compute and performs favorably among all single-orientation algorithms on corner detectors. HoI can extract multiple orientations and performs similarly to SIFT's orientation assignment, but is significantly faster to compute. Please refer to the paper [4] for details.
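As a rough illustration of a center-of-mass style orientation (the general idea behind CoM; the paper's exact algorithm differs in details we do not reproduce here), a dominant orientation can be taken as the angle from the patch center to the patch's intensity centroid:

```python
import numpy as np

def centroid_orientation(patch):
    """Angle (radians, image coordinates: y points down) from the patch
    center to its intensity center of mass. Illustrates the idea behind
    center-of-mass orientation assignment; not the paper's exact method."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    total = patch.sum()
    cx = (xs * patch).sum() / total   # x of intensity centroid
    cy = (ys * patch).sum() / total   # y of intensity centroid
    return np.arctan2(cy - (h - 1) / 2.0, cx - (w - 1) / 2.0)

# A patch whose mass sits to the right of center yields angle ~0.
patch = np.zeros((15, 15))
patch[7, 12] = 1.0
print(centroid_orientation(patch))  # -> 0.0
```

Such a measure needs no histogram and only one pass over the patch, which is why centroid-style orientations can be computed very quickly.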


[1]  A Setup for Evaluating Detectors and Descriptors for Visual Tracking.
S. Gauglitz, T. Höllerer, P. Krahwinkler, J. Roßmann. International Symposium on Mixed and Augmented Reality (ISMAR'09).
[2]  Evaluation of Interest Point Detectors and Feature Descriptors for Visual Tracking.
S. Gauglitz, T. Höllerer, M. Turk. International Journal of Computer Vision (IJCV) 2011.
[3]  Efficiently Selecting Spatially Distributed Keypoints for Visual Tracking.
S. Gauglitz, L. Foschini, M. Turk, T. Höllerer. IEEE International Conference on Image Processing (ICIP) 2011.
[4]  Improving Keypoint Orientation Assignment.
S. Gauglitz, M. Turk, T. Höllerer. British Machine Vision Conference (BMVC) 2011.