Wide-area Mobile Localization from Panoramic Imagery

Jonathan Ventura and Tobias Höllerer

We describe an end-to-end system for mobile, vision-based localization and tracking in urban environments.  Our system processes and indexes panoramic imagery to provide localization coverage over a large area from few capture points.  We use a client-server model that allows remote computation and data storage while maintaining real-time tracking performance.  The mobile client caches and re-uses previous search results to minimize communication overhead.  We evaluate the system for flexible real-time camera tracking in large outdoor spaces.

Our system produces a model from panoramas captured in the outdoor environment to be tracked.  To prepare input for our modeling procedure, we first extract perspective views from the source panoramas.  We use four orthogonal views to capture the entire horizontal field of view.  Unlike a traditional cube map, we use an "extended cube map" which has increased horizontal field of view for each face.  We also perform vertical and horizontal vanishing point alignment to maximize image resolution of the building facades.
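As a rough illustration of the view extraction step, the sketch below maps a pixel of one perspective face back to the source panorama. It assumes an equirectangular panorama and a 120 degree horizontal field of view per face; both are assumptions made for this example, as are the function name face_pixel_to_pano and the yaw values. Vanishing point alignment is not covered.

```python
import math

# Hypothetical yaw angles (degrees) for the four orthogonal faces.
FACE_YAWS_DEG = (0.0, 90.0, 180.0, 270.0)


def face_pixel_to_pano(u, v, face_w, face_h, hfov_deg, yaw_deg, pano_w, pano_h):
    """Map pixel (u, v) of a perspective face to equirectangular coordinates.

    Assumes an equirectangular source panorama, which is an assumption of
    this sketch rather than a detail stated in the text.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f = (face_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

    # Ray through the pixel in the face frame: x right, y down, z forward.
    x = u - face_w / 2.0
    y = v - face_h / 2.0
    z = f

    # Rotate the ray about the vertical axis by the yaw of the face.
    yaw = math.radians(yaw_deg)
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)

    # Convert the ray to longitude and latitude, then to panorama pixels.
    lon = math.atan2(xr, zr)
    lat = math.atan2(-y, math.hypot(xr, zr))
    px = ((lon / (2.0 * math.pi) + 0.5) * pano_w) % pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py
```

With a field of view wider than 90 degrees, adjacent faces overlap, so a facade that straddles a cube-map edge still appears whole in at least one view. Rendering a face amounts to evaluating this mapping for every face pixel and sampling the panorama at the returned coordinates.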

Structure from motion proceeds in a linear fashion, since we assume that the panoramas were taken by moving along a path.  The panoramas are first organized into triplets, independently reconstructed, and then merged into a common scale.  We use the upright constraint to improve pose estimation robustness (1).  After complete reconstruction using SIFT features, we re-triangulate points and perform bundle adjustment using the pyramidal search methods of the PTAM system (2).  We then extract SIFT descriptors for the triangulated FAST corners, to be used for localization.
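The triplet organization and scale merging can be sketched as follows. This example assumes that consecutive triplets overlap by two panoramas and that the relative scale is taken from the baseline between a shared pair of camera centers; the text does not specify either detail, and the names make_triplets and scale_to_reference are hypothetical.

```python
import math


def make_triplets(num_panoramas):
    """Group panoramas captured along a path into overlapping triplets.

    Overlapping by two panoramas is an assumption of this sketch.
    """
    return [(i, i + 1, i + 2) for i in range(num_panoramas - 2)]


def scale_to_reference(ref_centers, new_centers, shared_pair):
    """Scale factor that brings a new triplet into the reference scale.

    Each argument maps a panorama index to its reconstructed camera center
    (x, y, z). The factor is the ratio of the baseline lengths between the
    two shared panoramas in the two independent reconstructions.
    """
    i, j = shared_pair
    ref_baseline = math.dist(ref_centers[i], ref_centers[j])
    new_baseline = math.dist(new_centers[i], new_centers[j])
    return ref_baseline / new_baseline
```

For five panoramas this yields the triplets (0, 1, 2), (1, 2, 3) and (2, 3, 4); multiplying the second reconstruction by the factor computed from panoramas 1 and 2 puts it at the scale of the first.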

For localization, the system uses a two-stage approach.  The system first queries a local image cache containing recently seen images together with their known poses.  The image-based search is fast, but it is not very robust to changes in rotation or scale.  If the image-based localization fails, the system enqueues the image to be processed on a server using feature-based localization (with a vocabulary tree).  If the server localization succeeds, the localized query image is added to the tracker’s image cache.
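The control flow of the two-stage approach might look like the minimal sketch below. The image comparison here is a mean absolute difference between small thumbnails, used only as a stand-in for the image-based search; the class name, cache size, and threshold are all hypothetical, and the server side is represented only by a callback.

```python
from collections import deque


class TwoStageLocalizer:
    """Sketch of cache-first localization with a server fallback."""

    def __init__(self, cache_size=50, max_distance=10.0):
        # Recently localized images with their poses; oldest entries drop out.
        self.cache = deque(maxlen=cache_size)
        # Images waiting for feature-based localization on the server.
        self.pending = deque()
        # Hypothetical acceptance threshold for the image-based match.
        self.max_distance = max_distance

    @staticmethod
    def _distance(a, b):
        # Stand-in image distance: mean absolute pixel difference.
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    def localize(self, thumbnail):
        """Return a cached pose, or None after enqueueing for the server."""
        best = min(
            self.cache,
            key=lambda entry: self._distance(thumbnail, entry[0]),
            default=None,
        )
        if best is not None and self._distance(thumbnail, best[0]) <= self.max_distance:
            return best[1]
        self.pending.append(thumbnail)
        return None

    def on_server_result(self, thumbnail, pose):
        """Add a successfully localized query image to the cache."""
        if pose is not None:
            self.cache.append((thumbnail, pose))
```

A bounded cache keeps the image-based search fast on the mobile client, and every successful server result makes later cache hits in the same area more likely.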

Our results show that even with 500 ms of localization latency, good tracking performance is possible, with more than 60% of frames tracked in all test videos.  With reasonable camera movement, our results suggest it is sufficient to optimize the localization algorithm to return a result within half a second.

Related Publications
J. Ventura and T. Höllerer. Outdoor mobile localization from panoramic imagery. International Symposium on Mixed and Augmented Reality, 2011.
(1) J. Ventura and T. Höllerer. Structure and motion in urban environments using upright panoramas. Virtual Reality (under review), 2011.
(2) G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. International Symposium on Mixed and Augmented Reality, 2007.