Outdoor Modeling and Localization for Mobile Augmented Reality

Jonathan Ventura and Tobias Höllerer

We propose a system for modeling an outdoor environment with an omnidirectional camera and then continuously estimating a camera phone's position with respect to the model.  Panoramas captured with the omnidirectional camera from several viewpoints are combined into a point cloud model.  After this offline modeling step, live camera pose tracking is initialized by feature point matching and then continuously updated by aligning the point cloud model to the camera image.  Our evaluation shows that minimal user effort is required to initialize a camera tracking session in an unprepared urban environment.  In contrast to camera-based simultaneous localization and mapping (SLAM) systems, our method is suited to handheld use in large outdoor spaces.  Our vision-based system enables mobile augmented reality applications with better position accuracy and a higher update rate than satellite-based positioning (GPS) can provide.

We map an outdoor environment by simply walking through it while holding up a camera fitted with an omnidirectional lens.  We then reconstruct a 3D point cloud from a subset of the video frames.  This processing runs on a server machine and takes a few minutes.
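At the core of the reconstruction is triangulating scene points from matched features in two panorama viewpoints with known poses.  A minimal sketch of linear (DLT) triangulation, assuming standard pinhole projection matrices; the function name and calibration values in the example are illustrative, not the system's actual code:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one scene point from two views.

    P1, P2: 3x4 projection matrices of the two viewpoints.
    x1, x2: 2D pixel observations of the same point in each view.
    Returns the 3D point in world coordinates.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous point X: x * (P[2]·X) - (P[0]·X) = 0, and likewise for y.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the null vector of A, i.e. the last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With noisy matches, a real pipeline would triangulate many points this way and refine them jointly with the camera poses via bundle adjustment.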

The point cloud model and imagery from the video are loaded onto a mobile device -- in our case, the Apple iPad 2.  To initialize tracking, the device sends an image to the server, which determines the camera position by scale-invariant feature matching.  Once the response is received, the device tracks the camera pose by iteratively updating the correspondence between image and model.
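The initialization step amounts to estimating the camera pose from 2D-3D feature matches.  A minimal numpy sketch using linear resection (DLT) on noise-free matches; a real system would wrap this in RANSAC to reject outliers and follow it with nonlinear refinement.  The function name and the test values are illustrative assumptions:

```python
import numpy as np

def estimate_pose_dlt(points_3d, points_2d, K):
    """Estimate camera pose from 2D-3D matches via linear resection (DLT).

    points_3d: (N, 3) model points; points_2d: (N, 2) pixels; needs N >= 6.
    Returns (R, t) mapping world coordinates to camera coordinates.
    """
    # Normalize pixels to camera rays using the intrinsics.
    pts = np.column_stack([points_2d, np.ones(len(points_2d))])
    rays = (np.linalg.inv(K) @ pts.T).T  # rows are (x, y, 1)

    # Build the 2N x 12 homogeneous system for the entries of [R|t].
    A = []
    for (Xw, Yw, Zw), (x, y, _) in zip(points_3d, rays):
        row = np.array([Xw, Yw, Zw, 1.0])
        A.append(np.concatenate([row, np.zeros(4), -x * row]))
        A.append(np.concatenate([np.zeros(4), row, -y * row]))
    _, _, Vt = np.linalg.svd(np.array(A))
    P = Vt[-1].reshape(3, 4)  # solution up to an unknown scale (and sign)

    # Project the 3x3 part onto a proper rotation and fix the sign.
    U, s, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    if np.linalg.det(R) < 0:
        R, P = -R, -P
    t = P[:, 3] / s.mean()  # undo the scale on the translation
    return R, t
```

In practice the on-device tracker would use such a pose only as a starting point, then keep it updated frame to frame by re-aligning the model to the image.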

The iPad 2 achieves real-time (25 fps) tracking performance.  Direct alignment to the point cloud built from panoramic imagery leads to pose jitter, because of poor feature correspondence.  To reduce jitter, we collect images from the live video stream and, when possible, match against them for more accurate pose estimation.
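One simple way to organize such an image collection is a keyframe store that adds a frame only when the camera has moved away from every stored viewpoint, and otherwise returns the nearest stored frame to match against.  A hypothetical sketch; the class name and distance threshold are assumptions, not the system's actual implementation:

```python
import numpy as np

class KeyframeStore:
    """Collect frames from the live stream and find the nearest stored
    viewpoint, so the tracker can match against a close-by keyframe
    instead of the panoramic model alone."""

    def __init__(self, min_distance=0.5):
        self.min_distance = min_distance  # spacing between stored viewpoints
        self.positions = []               # camera centers of stored frames
        self.frames = []                  # associated image data / features

    def nearest(self, position):
        """Return (index, distance) of the closest keyframe, or (None, inf)."""
        if not self.positions:
            return None, float("inf")
        d = np.linalg.norm(np.array(self.positions) - position, axis=1)
        i = int(np.argmin(d))
        return i, float(d[i])

    def update(self, position, frame):
        """Store the frame as a new keyframe if no stored viewpoint is nearby."""
        _, dist = self.nearest(position)
        if dist > self.min_distance:
            self.positions.append(np.asarray(position, dtype=float))
            self.frames.append(frame)
```

Matching against a nearby live keyframe helps because its appearance and viewpoint are much closer to the current image than the original panoramas, yielding more stable correspondences.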

In the video, the 3D point cloud is displayed as red dots.  Note that the camera is free to translate as well as rotate.

Virtual content can then be rendered on top of the video, using full 3D knowledge of the scene.
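Rendering anchored virtual content reduces to projecting 3D points through the tracked pose and the camera intrinsics.  A minimal sketch, assuming (R, t) is the world-to-camera pose and K the intrinsic matrix; the function name and example values are illustrative:

```python
import numpy as np

def project_to_image(point_world, R, t, K):
    """Project a virtual 3D point into pixel coordinates using the
    tracked camera pose (R, t) and intrinsics K.

    Returns the 2D pixel position, or None if the point is behind
    the camera and should not be drawn.
    """
    p_cam = R @ point_world + t     # transform into camera coordinates
    if p_cam[2] <= 0:
        return None                 # behind the camera: nothing to draw
    p = K @ p_cam                   # apply intrinsics
    return p[:2] / p[2]             # perspective divide
```

A renderer would apply this per frame to the anchor points of each virtual object, so the content stays fixed to the scene as the camera translates and rotates.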

The following video shows an AR landscaping design application, where virtual trees are placed on the grass.